Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data

Open access · English

Rodríguez Velasco, Carmen Lilí; García Villena, Eduardo; Brito Ballester, Julién; Durántez Prados, Frigdiano Álvaro; Silva Alvarado, Eduardo René; and Crespo Álvarez, Jorge (2023). Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data. International Journal of Emerging Technologies in Learning (iJET), 18 (04), pp. 120-155. ISSN 1863-0383.

Email: carmen.rodriguez@uneatlantico.es, eduardo.garcia@uneatlantico.es, julien.brito@uneatlantico.es, durantez@uneatlantico.es, eduardo.silva@funiber.org, jorge.crespo@uneatlantico.es

Full text (PDF, 1MB): document.pdf, available under a Creative Commons Attribution license.

Abstract

The purpose of this research article was to compare the benefits of the optimal probability threshold adjustment technique with those of other techniques for handling imbalanced data, as applied to predicting post-graduate students’ late dropout from distance learning courses at two universities in the Ibero-American space. In this context, optimizing the Logistic Regression, Random Forest, and Neural Network classifiers in combination with different techniques, attributes, and algorithms (hyperparameters, SMOTE, SMOTE_SVM, and ADASYN) produced a set of decision-making metrics that prioritize reducing false negatives. The best model was the Neural Network combined with SMOTE_SVM, which obtained a recall of 0.75 and an F1-score of 0.60. Likewise, the Random Forest classifier proved robust to imbalanced data: with an optimal threshold of 0.427, it achieved metrics very similar to those obtained by the consensus of the three best models found. This demonstrates that, for Random Forest, the optimal prediction probability threshold is an excellent alternative to resampling techniques with different optimal thresholds. Finally, the authors hope that this paper will help promote the use of this simple but powerful technique, which is greatly underrated compared with data resampling techniques for imbalanced data.
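The threshold adjustment technique keeps a trained classifier unchanged and only moves the cut-off applied to its predicted probability of dropout, instead of resampling the training data. The Python sketch below illustrates the idea under stated assumptions: the labels and probabilities are hypothetical toy values, the function names `precision_recall_f1` and `optimal_threshold` are illustrative rather than taken from the paper, and F1 (optionally restricted by a hypothetical minimum recall, `min_recall`) is assumed as the selection criterion, because the abstract does not state exactly how the 0.427 threshold was chosen.

```python
from typing import List, Optional, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 when score >= threshold is predicted as dropout (label 1)."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1  # false negative: a dropout the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def optimal_threshold(
    y_true: Sequence[int], scores: Sequence[float], min_recall: float = 0.0
) -> Tuple[float, float]:
    """Return (threshold, F1) maximising F1 over the observed scores.

    Only thresholds whose recall is at least min_recall are kept (a
    hypothetical constraint that favours fewer false negatives).
    Candidates are scanned in ascending order and a tie keeps the first
    hit, so ties resolve to the lower, higher-recall threshold.
    """
    best: Optional[Tuple[float, float]] = None
    for threshold in sorted(set(scores)):
        _, recall, f1 = precision_recall_f1(y_true, scores, threshold)
        if recall < min_recall:
            continue
        if best is None or f1 > best[1]:
            best = (threshold, f1)
    if best is None:
        raise ValueError("no threshold satisfies min_recall")
    return best


if __name__ == "__main__":
    # Hypothetical validation data: 3 dropouts (label 1) among 10 students.
    y_val: List[int] = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
    p_val: List[float] = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.55, 0.80]

    _, recall_default, f1_default = precision_recall_f1(y_val, p_val, 0.5)
    best_t, best_f1 = optimal_threshold(y_val, p_val, min_recall=0.75)
    print(f"default 0.50: recall={recall_default:.3f}, F1={f1_default:.3f}")
    # prints: default 0.50: recall=0.667, F1=0.667
    print(f"tuned {best_t:.2f}: F1={best_f1:.3f}")
    # prints: tuned 0.40: F1=0.750
```

In this toy example, lowering the cut-off from 0.5 to 0.40 raises recall from 0.667 to 1.0 at the cost of some precision (0.667 to 0.6), which matches the abstract's priority on reducing false negatives. The threshold should be tuned on validation data rather than on the final test set. As a side note on the reported figures, F1 = 2PR/(P + R) implies P = F1·R/(2R − F1), so a recall of 0.75 with an F1-score of 0.60 corresponds to a precision of about 0.45/0.90 = 0.50.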

Document type: Article
Keywords: optimal likelihood threshold, imbalanced data, student dropout prediction, resample techniques, distance learning courses
Subject classification: Subjects > Engineering
Subjects > Education
Divisions: Universidad Europea del Atlántico > Research > Scientific Production
Universidad Internacional Iberoamericana México > Research > Scientific Production
Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Articles and books
Deposited: 27 Feb 2023 23:30
Last modified: 21 Oct 2024 23:30
URI: https://repositorio.unic.co.ao/id/eprint/6067


English · Open access

Influence of E-learning training on the acquisition of competences in basketball coaches in Cantabria

The main aim of this study was to analyse the influence of e-learning training on the acquisition of competences in basketball coaches in Cantabria. The current landscape of basketball coach training shows an increasing demand for innovative training models and emerging pedagogies, including e-learning-based methodologies. The study sample consisted of fifty students from these courses, all above 16 years of age (36 males, 14 females). Among them, 16% resided outside the autonomous community of Cantabria, 10% resided more than 50 km from the city of Santander, 36% between 10 and 50 km, 14% less than 10 km, and 24% resided within Santander city. Data were collected through a Google Forms survey distributed by the Cantabrian Basketball Federation to training course students. Participation was voluntary and anonymous. The survey, consisting of 56 questions, was validated by two sports and health doctors and two senior basketball coaches. The collected data were processed and analysed using Microsoft® Excel version 16.74, and the results were expressed in percentages. The analysis revealed that 24.60% of the students trained through the e-learning methodology considered themselves fully qualified as basketball coaches, contrasting with 10.98% of those trained via traditional face-to-face methodology. The results of the study provide insights into important characteristics that can be adjusted and improved within the investigated educational process. Moreover, the study concludes that e-learning training effectively qualifies basketball coaches in Cantabria.

Scientific Production

Josep Alemany Iturriaga (josep.alemany@uneatlantico.es), Álvaro Velarde-Sotres (alvaro.velarde@uneatlantico.es), Javier Jorge, Kamil Giglio

English · Open access

Smart agriculture: utilizing machine learning and deep learning for drought stress identification in crops

Plant stress reduction research has advanced significantly with the use of Artificial Intelligence (AI) techniques such as machine learning and deep learning, a significant step toward sustainable agriculture. Complex algorithms such as gradient boosting, support vector machines (SVM), recurrent neural networks (RNN), and long short-term memory (LSTM), combined with a thorough examination of the TYRKC and RBR-E3 domains in stress-associated signaling proteins across a range of crop species, have revealed innovative insights into the physiological responses of plants, mostly crops, to drought stress. The study used modern resources, including the UniProt protein database for crop physicochemical properties associated with specific signaling domains and the SMART database for signaling protein domains. After careful data processing, these insights were applied to deep learning and machine learning techniques. The rigorous metric evaluations and ablation analysis that characterized the study’s approach highlighted the algorithms’ effectiveness and dependability in recognizing and classifying stress events. SVM reached an accuracy of 82%, gradient boosting and RNN reached 96% and 94%, respectively, and LSTM reached 97%; gradient boosting and LSTM thus outperformed the others, demonstrating their potential for accurate stress categorization. While noting these successes, the study also highlights the ongoing obstacles to AI adoption in agriculture, emphasizing the need for creative thinking and interdisciplinary cooperation. Beyond its scholarly value, the collected data has significant implications for improving resource efficiency, directing precision agricultural methods, and supporting global food security programs. This work highlights the revolutionary potential of AI to completely disrupt the agricultural industry while simultaneously advancing our understanding of plant stress responses.

Scientific Production

Tariq Ali, Saif Ur Rehman, Shamshair Ali, Khalid Mahmood, Silvia Aparicio Obregón (silvia.aparicio@uneatlantico.es), Rubén Calderón Iglesias (ruben.calderon@uneatlantico.es), Tahir Khurshaid, Imran Ashraf

English · Open access

Carotenoids Intake and Cardiovascular Prevention: A Systematic Review

Background: Cardiovascular diseases (CVDs) encompass a variety of conditions that affect the heart and blood vessels. Carotenoids, a group of fat-soluble organic pigments synthesized by plants, fungi, algae, and some bacteria, may have a beneficial effect in reducing cardiovascular disease (CVD) risk. This study aims to examine and synthesize current research on the relationship between carotenoids and CVDs. Methods: A systematic review was conducted using MEDLINE and the Cochrane Library to identify relevant studies on the efficacy of carotenoid supplementation for CVD prevention. Interventional analytical studies (randomized and non-randomized clinical trials) published in English from January 2011 to February 2024 were included. Results: A total of 38 studies were included in the qualitative analysis. Of these, 17 epidemiological studies assessed the relationship between carotenoids and CVDs, 9 examined the effect of carotenoid supplementation, and 12 evaluated dietary interventions. Conclusions: Elevated serum carotenoid levels are associated with reduced CVD risk factors and inflammatory markers. Increasing the consumption of carotenoid-rich foods appears to be more effective than supplementation, though the specific effects of individual carotenoids on CVD risk remain uncertain.

Scientific Production

Sandra Sumalla Cano (sandra.sumalla@uneatlantico.es), Imanol Eguren García (imanol.eguren@uneatlantico.es), Álvaro Lasarte García, Thomas Prola (thomas.prola@uneatlantico.es), Raquel Martínez Díaz (raquel.martinez@uneatlantico.es), Iñaki Elío Pascual (inaki.elio@uneatlantico.es)

English · Open access

Roman urdu hate speech detection using hybrid machine learning models and hyperparameter optimization

With the rapid increase in social media users, cyberbullying and hate speech problems have arisen in recent years. Automatic hate speech detection (HSD) from text is an emerging research problem in natural language processing (NLP). Researchers have developed various approaches to automatic hate speech detection using different corpora in various languages; however, research on the Urdu language is rather scarce. This study aims to address the HSD task on Twitter using Roman Urdu text. The contribution of this research is the development of a hybrid model for Roman Urdu HSD, which has not been previously explored. The novel hybrid model integrates deep learning (DL) and transformer models for automatic feature extraction with machine learning algorithms (MLAs) for classification. To further enhance model performance, we employ several hyperparameter optimization (HPO) techniques, including Grid Search (GS), Randomized Search (RS), and Bayesian Optimization with Gaussian Processes (BOGP). Evaluation is carried out on two publicly available benchmark Roman Urdu corpora: the HS-RU-20 corpus and the RUHSOLD hate speech corpus. Results demonstrate that the Multilingual BERT (MBERT) feature learner, paired with a Support Vector Machine (SVM) classifier and optimized using RS, achieves state-of-the-art performance. On the HS-RU-20 corpus, this model attained an accuracy of 0.93 and an F1 score of 0.95 for the Neutral-Hostile classification task, and an accuracy of 0.89 with an F1 score of 0.88 for the Hate Speech-Offensive task. On the RUHSOLD corpus, the same model achieved an accuracy of 0.95 and an F1 score of 0.94 for the coarse-grained task, and an accuracy of 0.87 and an F1 score of 0.84 for the fine-grained task. These results demonstrate the effectiveness of our hybrid approach for Roman Urdu hate speech detection.

Scientific Production

Waqar Ashiq, Samra Kanwal, Adnan Rafique, Muhammad Waqas, Tahir Khurshaid, Elizabeth Caro Montero (elizabeth.caro@uneatlantico.es), Alicia Bustamante Alonso (alicia.bustamante@uneatlantico.es), Imran Ashraf

English · Open access

Virtual histopathology methods in medical imaging - a systematic review

Virtual histopathology is an emerging technology in medical imaging that uses advanced computational methods to analyze tissue images for more precise disease diagnosis. Traditionally, histopathology relies on manual techniques and expertise, often resulting in time-consuming processes and variability in diagnoses. Virtual histopathology offers a more consistent and automated approach, employing techniques such as machine learning, deep learning, and image processing to simulate staining and enhance tissue analysis. This review explores the strengths, limitations, and clinical applications of these methods, highlighting recent advances in virtual histopathological approaches. In addition, it identifies important areas for future research to improve diagnostic accuracy and efficiency in clinical settings.

Scientific Production

Muhammad Talha Imran, Imran Shafi, Jamil Ahmad, Muhammad Fasih Uddin Butt, Santos Gracia Villar (santos.gracia@uneatlantico.es), Eduardo García Villena (eduardo.garcia@uneatlantico.es), Tahir Khurshaid, Imran Ashraf