Feature group partitioning: an approach for depression severity prediction with class balancing using machine learning algorithms
Artículo
Materias > Ingeniería
Universidad Europea del Atlántico > Investigación > Producción Científica
Universidad Internacional Iberoamericana México > Investigación > Producción Científica
Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica
Universidad Internacional do Cuanza > Investigación > Artículos y libros
Universidad de La Romana > Investigación > Producción Científica
Abierto
Inglés
In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous research applied machine learning methods to forecast signs of depression. Nevertheless, only a limited number of research have taken into account the severity level as a multiclass variable. Besides, maintaining the equality of data distribution among all the classes rarely happens in practical communities. So, the inevitable class imbalance for multiple variables is considered a substantial challenge in this domain. Furthermore, this research emphasizes the significance of addressing class imbalance issues in the context of multiple classes. We introduced a new approach Feature group partitioning (FGP) in the data preprocessing phase which effectively reduces the dimensionality of features to a minimum. This study utilized synthetic oversampling techniques, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burn Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning stacking, homogeneous ensemble bagging, and five distinct supervised machine learning algorithms. The issue of overfitting was mitigated by evaluating the accuracy of the training, validation, and testing datasets. To justify the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and f1-score indices are used. Overall, comprehensive analysis demonstrates the discrimination between the Conventional Depression Screening (CDS) and FGP approach. In summary, the results show that the stacking classifier for FGP with SMOTE approach yields the highest balanced accuracy, with a rate of 92.81%. The empirical evidence has demonstrated that the FGP approach, when combined with the SMOTE, able to produce better performance in predicting the severity of depression. Most importantly the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.
metadata
Shaha, Tumpa Rani; Begum, Momotaz; Uddin, Jia; Yélamos Torres, Vanessa; Alemany Iturriaga, Josep; Ashraf, Imran y Samad, Md. Abdus
mail
SIN ESPECIFICAR, SIN ESPECIFICAR, SIN ESPECIFICAR, vanessa.yelamos@funiber.org, josep.alemany@uneatlantico.es, SIN ESPECIFICAR, SIN ESPECIFICAR
(2024)
Feature group partitioning: an approach for depression severity prediction with class balancing using machine learning algorithms.
BMC Medical Research Methodology, 24 (1).
ISSN 1471-2288
Texto
s12874-024-02249-8.pdf Available under License Creative Commons Attribution. Descargar (2MB) |
Resumen
In contemporary society, depression has emerged as a prominent mental disorder that exhibits exponential growth and exerts a substantial influence on premature mortality. Although numerous research applied machine learning methods to forecast signs of depression. Nevertheless, only a limited number of research have taken into account the severity level as a multiclass variable. Besides, maintaining the equality of data distribution among all the classes rarely happens in practical communities. So, the inevitable class imbalance for multiple variables is considered a substantial challenge in this domain. Furthermore, this research emphasizes the significance of addressing class imbalance issues in the context of multiple classes. We introduced a new approach Feature group partitioning (FGP) in the data preprocessing phase which effectively reduces the dimensionality of features to a minimum. This study utilized synthetic oversampling techniques, specifically Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN), for class balancing. The dataset used in this research was collected from university students by administering the Burn Depression Checklist (BDC). For methodological modifications, we implemented heterogeneous ensemble learning stacking, homogeneous ensemble bagging, and five distinct supervised machine learning algorithms. The issue of overfitting was mitigated by evaluating the accuracy of the training, validation, and testing datasets. To justify the effectiveness of the prediction models, balanced accuracy, sensitivity, specificity, precision, and f1-score indices are used. Overall, comprehensive analysis demonstrates the discrimination between the Conventional Depression Screening (CDS) and FGP approach. In summary, the results show that the stacking classifier for FGP with SMOTE approach yields the highest balanced accuracy, with a rate of 92.81%. The empirical evidence has demonstrated that the FGP approach, when combined with the SMOTE, able to produce better performance in predicting the severity of depression. Most importantly the optimization of the training time of the FGP approach for all of the classifiers is a significant achievement of this research.
Tipo de Documento: | Artículo |
---|---|
Palabras Clave: | Machine learning; Depression prediction; Class balancing; Oversampling; SMOTE; ADASYN; Stratified cross validation; Burn depression checklist; Feature group partitioning |
Clasificación temática: | Materias > Ingeniería |
Divisiones: | Universidad Europea del Atlántico > Investigación > Producción Científica Universidad Internacional Iberoamericana México > Investigación > Producción Científica Universidad Internacional Iberoamericana Puerto Rico > Investigación > Producción Científica Universidad Internacional do Cuanza > Investigación > Artículos y libros Universidad de La Romana > Investigación > Producción Científica |
Depositado: | 17 Jun 2024 23:30 |
Ultima Modificación: | 17 Jun 2024 23:30 |
URI: | https://repositorio.unic.co.ao/id/eprint/12751 |
Acciones (logins necesarios)
Ver Objeto |
<a href="/10290/1/Influence%20of%20E-learning%20training%20on%20the%20acquisition%20of%20competences%20in%20basketball%20coaches%20in%20Cantabria.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
The main aim of this study was to analyse the influence of e-learning training on the acquisition of competences in basketball coaches in Cantabria. The current landscape of basketball coach training shows an increasing demand for innovative training models and emerging pedagogies, including e-learning-based methodologies. The study sample consisted of fifty students from these courses, all above 16 years of age (36 males, 14 females). Among them, 16% resided outside the autonomous community of Cantabria, 10% resided more than 50 km from the city of Santander, 36% between 10 and 50 km, 14% less than 10 km, and 24% resided within Santander city. Data were collected through a Google Forms survey distributed by the Cantabrian Basketball Federation to training course students. Participation was voluntary and anonymous. The survey, consisting of 56 questions, was validated by two sports and health doctors and two senior basketball coaches. The collected data were processed and analysed using Microsoft® Excel version 16.74, and the results were expressed in percentages. The analysis revealed that 24.60% of the students trained through the e-learning methodology considered themselves fully qualified as basketball coaches, contrasting with 10.98% of those trained via traditional face-to-face methodology. The results of the study provide insights into important characteristics that can be adjusted and improved within the investigated educational process. Moreover, the study concludes that e-learning training effectively qualifies basketball coaches in Cantabria.
Josep Alemany Iturriaga mail josep.alemany@uneatlantico.es, Álvaro Velarde-Sotres mail alvaro.velarde@uneatlantico.es, Javier Jorge mail , Kamil Giglio mail ,
Alemany Iturriaga
<a href="/15198/1/nutrients-16-03859.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Carotenoids Intake and Cardiovascular Prevention: A Systematic Review
Background: Cardiovascular diseases (CVDs) encompass a variety of conditions that affect the heart and blood vessels. Carotenoids, a group of fat-soluble organic pigments synthesized by plants, fungi, algae, and some bacteria, may have a beneficial effect in reducing cardiovascular disease (CVD) risk. This study aims to examine and synthesize current research on the relationship between carotenoids and CVDs. Methods: A systematic review was conducted using MEDLINE and the Cochrane Library to identify relevant studies on the efficacy of carotenoid supplementation for CVD prevention. Interventional analytical studies (randomized and non-randomized clinical trials) published in English from January 2011 to February 2024 were included. Results: A total of 38 studies were included in the qualitative analysis. Of these, 17 epidemiological studies assessed the relationship between carotenoids and CVDs, 9 examined the effect of carotenoid supplementation, and 12 evaluated dietary interventions. Conclusions: Elevated serum carotenoid levels are associated with reduced CVD risk factors and inflammatory markers. Increasing the consumption of carotenoid-rich foods appears to be more effective than supplementation, though the specific effects of individual carotenoids on CVD risk remain uncertain.
Sandra Sumalla Cano mail sandra.sumalla@uneatlantico.es, Imanol Eguren García mail imanol.eguren@uneatlantico.es, Álvaro Lasarte García mail , Thomas Prola mail thomas.prola@uneatlantico.es, Raquel Martínez Díaz mail raquel.martinez@uneatlantico.es, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es,
Sumalla Cano
<a class="ep_document_link" href="/14584/1/s41598-024-73664-6.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
The evolution of the COVID-19 pandemic has been associated with variations in clinical presentation and severity. Similarly, prediction scores may suffer changes in their diagnostic accuracy. The aim of this study was to test the 30-day mortality predictive validity of the 4C and SEIMC scores during the sixth wave of the pandemic and to compare them with those of validation studies. This was a longitudinal retrospective observational study. COVID-19 patients who were admitted to the Emergency Department of a Spanish hospital from December 15, 2021, to January 31, 2022, were selected. A side-by-side comparison with the pivotal validation studies was subsequently performed. The main measures were 30-day mortality and the 4C and SEIMC scores. A total of 27,614 patients were considered in the study, including 22,361 from the 4C, 4,627 from the SEIMC and 626 from our hospital. The 30-day mortality rate was significantly lower than that reported in the validation studies. The AUCs were 0.931 (95% CI: 0.90–0.95) for 4C and 0.903 (95% CI: 086–0.93) for SEIMC, which were significantly greater than those obtained in the first wave. Despite the changes that have occurred during the coronavirus disease 2019 (COVID-19) pandemic, with a reduction in lethality, scorecard systems are currently still useful tools for detecting patients with poor disease risk, with better prognostic capacity.
Pedro Ángel de Santos Castro mail , Carlos del Pozo Vegas mail , Leyre Teresa Pinilla Arribas mail , Daniel Zalama Sánchez mail , Ancor Sanz-García mail , Tony Giancarlo Vásquez del Águila mail , Pablo González Izquierdo mail , Sara de Santos Sánchez mail , Cristina Mazas Pérez-Oleaga mail cristina.mazas@uneatlantico.es, Irma Dominguez Azpíroz mail irma.dominguez@unini.edu.mx, Iñaki Elío Pascual mail inaki.elio@uneatlantico.es, Francisco Martín-Rodríguez mail ,
de Santos Castro
<a class="ep_document_link" href="/14915/1/s41598-024-74357-w.pdf"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
Diabetes is a persistent health condition led by insufficient use or inappropriate use of insulin in the body. If left undetected, it can lead to further complications involving organ damage such as heart, lungs, and eyes. Timely detection of diabetes helps obtain the right medication, diet, and exercise plan to lead a healthy life. ML approach has been utilized to obtain rapid and reliable diabetes detection, however, existing approaches suffer from the use of limited datasets, lack of generalizability, and lower accuracy. This study proposes a novel feature extraction approach to overcome these limitations by using an ensemble of convolutional neural network (CNN) and long short-term memory (LSTM) models. Multiple datasets are combined to make a larger dataset for experiments and multiple features are utilized for investigating the efficacy of the proposed approach. Features from the extra tree classifier, CNN, and LSTM are also considered for comparison. Experimental results reveal the superb performance of CNN-LSTM-based features with random forest model obtaining a 0.99 accuracy score. This performance is further validated by comparison with existing approaches and k-fold cross-validation which shows the proposed approach provides robust results.
Furqan Rustam mail , Ahmad Sami Al-Shamayleh mail , Rahman Shafique mail , Silvia Aparicio Obregón mail silvia.aparicio@uneatlantico.es, Rubén Calderón Iglesias mail ruben.calderon@uneatlantico.es, J. Pablo Miramontes Gonzalez mail , Imran Ashraf mail ,
Rustam
<a href="/14934/1/s41598-024-77427-1.pdf" class="ep_document_link"><img class="ep_doc_icon" alt="[img]" src="/style/images/fileicons/text.png" border="0"/></a>
en
open
A deep learning approach to optimize remaining useful life prediction for Li-ion batteries
Accurately predicting the remaining useful life (RUL) of lithium-ion (Li-ion) batteries is vital for improving battery performance and safety in applications such as consumer electronics and electric vehicles. While the prediction of RUL for these batteries is a well-established field, the current research refines RUL prediction methodologies by leveraging deep learning techniques, advancing prediction accuracy. This study proposes AccuCell Prodigy, a deep learning model that integrates auto-encoders and long short-term memory (LSTM) layers to enhance RUL prediction accuracy and efficiency. The model’s name reflects its precision (“AccuCell”) and predictive strength (“Prodigy”). The proposed methodology involves preparing a dataset of battery operational features, split using an 80–20 ratio for training and testing. Leveraging 22 variations of current (critical parameter) across three Li-ion cells, AccuCell Prodigy significantly reduces prediction errors, achieving a mean square error of 0.1305%, mean absolute error of 2.484%, and root mean square error of 3.613%, with a high R-squared value of 0.9849. These results highlight its robustness and potential for advancing battery health management.
Mahrukh Iftikhar mail , Muhammad Shoaib mail , Ayesha Altaf mail , Faiza Iqbal mail , Santos Gracia Villar mail santos.gracia@uneatlantico.es, Luis Alonso Dzul López mail luis.dzul@uneatlantico.es, Imran Ashraf mail ,
Iftikhar