Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data

Article | Open access | Language: English

Rodríguez Velasco, Carmen Lilí; García Villena, Eduardo; Brito Ballester, Julién; Durántez Prados, Frigdiano Álvaro; Silva Alvarado, Eduardo René and Crespo Álvarez, Jorge (carmen.rodriguez@uneatlantico.es, eduardo.garcia@uneatlantico.es, julien.brito@uneatlantico.es, durantez@uneatlantico.es, eduardo.silva@funiber.org, jorge.crespo@uneatlantico.es) (2023) Forecasting of Post-Graduate Students’ Late Dropout Based on the Optimal Probability Threshold Adjustment Technique for Imbalanced Data. International Journal of Emerging Technologies in Learning (iJET), 18 (04). pp. 120-155. ISSN 1863-0383

Full text: document.pdf (1MB)
Available under License Creative Commons Attribution.

Official URL: http://doi.org/10.3991/ijet.v18i04.34825

Abstract

The purpose of this research article was to contrast the benefits of the optimal probability threshold adjustment technique with other imbalanced data processing techniques, as applied to the prediction of post-graduate students’ late dropout from distance learning courses in two universities in the Ibero-American region. In this context, the optimization of the Logistic Regression, Random Forest, and Neural Network classifiers, together with different techniques, attributes, and algorithms (Hyperparameters, SMOTE, SMOTE_SVM, and ADASYN) resulted in a set of metrics for decision-making, prioritizing the reduction of false negatives. The best model was the Neural Network model in combination with SMOTE_SVM, obtaining a recall of 0.75 and an F1-score of 0.60. Likewise, the robustness of the Random Forest classifier for imbalanced data was demonstrated by achieving, with an optimal threshold of 0.427, metrics very similar to those obtained by the consensus of the three best models found. This demonstrates that, for Random Forest, the optimal prediction probability threshold is an excellent alternative to resampling techniques with different optimal thresholds. Finally, it is hoped that this research paper will help boost the application of this simple but powerful technique, which is highly underrated compared with data resampling techniques for imbalanced data.
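The idea of threshold adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not state which criterion the authors used to pick the optimal threshold, so the code below assumes one common choice: scanning candidate thresholds and keeping the one that maximizes the F-beta score of the positive (dropout) class. The function names `f_beta` and `best_threshold` and the toy labels and probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """F-beta score for the positive class (1 = dropout). beta > 1 favors recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_threshold(
    y_true: Sequence[int], probs: Sequence[float], beta: float = 1.0
) -> Tuple[float, float]:
    """Try each distinct predicted probability as a cut-off.

    Returns the (threshold, score) pair with the highest F-beta score.
    A sample is labeled positive when its probability is >= threshold.
    """
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(probs)):
        y_pred = [1 if p >= t else 0 for p in probs]
        score = f_beta(y_true, y_pred, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical imbalanced toy data: 2 dropouts among 10 students.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
probs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.60]

default_pred = [1 if p >= 0.5 else 0 for p in probs]
default_score = f_beta(y_true, default_pred)          # misses one dropout
threshold, score = best_threshold(y_true, probs)      # lower cut-off recovers it
print(f"default F1={default_score:.3f}, tuned threshold={threshold}, F1={score:.3f}")
```

On this toy data, lowering the cut-off below the default 0.5 recovers the dropout that the default threshold misses, at the cost of one false positive. Setting `beta` above 1 would weight recall more heavily, which matches the abstract's stated priority of reducing false negatives. In practice the threshold should be selected on validation data, not on the test set.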

Document Type:	Article
Keywords:	optimal likelihood threshold, imbalanced data, student dropout prediction, resample techniques, distance learning courses
Subjects:	Subjects > Engineering; Subjects > Education
Divisions:	Universidad Europea del Atlántico > Research > Scientific Production; Universidad Internacional Iberoamericana México > Research > Scientific Production; Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production; Universidad Internacional do Cuanza > Research > Articles and books
Deposited:	27 Feb 2023 23:30
Last Modified:	21 Oct 2024 23:30
URI:	https://repositorio.unic.co.ao/id/eprint/6067


open

Benchmarking multiple instance learning architectures from patches to pathology for prostate cancer detection and grading using attention-based weak supervision

Histopathological evaluation is necessary for the diagnosis and grading of prostate cancer, which is still one of the most common cancers in men globally. Traditional evaluation is time-consuming, prone to inter-observer variability, and challenging to scale. The clinical usefulness of current AI systems is limited by the need for comprehensive pixel-level annotations. The objective of this research is to conduct a large-scale benchmarking study of a weakly supervised deep learning framework that minimizes the need for annotation and ensures interpretability for automated prostate cancer diagnosis and International Society of Urological Pathology (ISUP) grading using whole slide images (WSIs). This study rigorously tested six cutting-edge multiple instance learning (MIL) architectures (CLAM-MB, CLAM-SB, ILRA-MIL, AC-MIL, AMD-MIL, WiKG-MIL), three feature encoders (ResNet50, CTransPath, UNI2), and four patch extraction techniques (varying sizes and overlap) using the PANDA dataset (10,616 WSIs), yielding 72 experimental configurations. The methodology used distributed cloud computing to process over 31 million tissue patches, implementing advanced attention mechanisms to ensure clinical interpretability through Grad-CAM visualizations. The optimum configuration (UNI2 encoder with ILRA-MIL, 256 × 256 patches, 50% overlap) achieved 78.75% accuracy and 90.12% quadratic weighted kappa (QWK), outperforming traditional methods and approaching expert pathologist-level diagnostic capability. Overlapping smaller patches offered the best balance of spatial resolution and contextual information, while domain-specific foundation models performed noticeably better than generic encoders. This work is the first large-scale, comprehensive comparison of weakly supervised MIL methods for prostate cancer diagnosis and grading. The proposed approach has excellent clinical diagnostic performance, scalability, practical feasibility through cloud computing, and interpretability using visualization tools.

Scientific Production

Naveed Anwer Butt, Dilawaiz Sarwat, Irene Delgado Noya (irene.delgado@uneatlantico.es), Kilian Tutusaus (kilian.tutusaus@uneatlantico.es), Nagwan Abdel Samee, Imran Ashraf

open

A Systematic Literature Review on Integrated Deep Learning and Multi-Agent Vision-Language Frameworks for Pathology Image Analysis and Report Generation

This systematic literature review (SLR) investigates the integration of deep learning (DL), vision-language models (VLMs), and multi-agent systems in the analysis of pathology images and automated report generation. The rapid advancement of whole-slide imaging (WSI) technologies has posed new challenges in pathology, especially due to the scale and complexity of the data. DL techniques in general and convolutional neural networks (CNNs) and transformers in particular have significantly enhanced image analysis tasks including segmentation, classification, and detection. However, these models often lack generalizability to generate coherent, clinically relevant text, thus necessitating the integration of VLMs and large language models (LLMs). This review examines the effectiveness of VLMs and LLMs in bridging the gap between visual data and clinical text, focusing on their potential for automating the generation of pathology reports. Additionally, multi-agent systems, which leverage specialized artificial intelligence (AI) agents to collaboratively perform diagnostic tasks, are explored for their contributions to improving diagnostic accuracy and scalability. Through a synthesis of recent studies, this review highlights the successes, challenges, and future directions of these AI technologies in pathology diagnostics, offering a comprehensive foundation for the development of integrated, AI-driven diagnostic workflows.

Scientific Production

Usama Ali, Imran Shafi, Jamil Ahmad, Arlette Zárate Cáceres, Thania Chio Montero, Hafiz Muhammad Raza ur Rehman, Imran Ashraf

open

Fish consumption and cognitive function in aging: a systematic review of observational studies

Epidemiological studies consistently link higher fish intake with slower rates of cognitive decline and lower dementia incidence. The aim of the present study was to systematically review existing observational studies investigating the association between fish consumption and cognitive function in older adults. A total of 25 studies (8 cross-sectional and 17 prospective) were reviewed, including mainly healthy older adults; in prospective studies, participants' ages ranged from 18 to 30 years at baseline up to 65 to 91 years at the upper limit of the age spectrum. The cognitive functions investigated in most published studies spanned various domains, such as global cognition, memory (episodic, working), executive function (planning, inhibition, flexibility), attention, and processing speed. Existing studies vary greatly in terms of design (cross-sectional and prospective), geographical area, number of participants involved, and tools used to assess the outcomes of interest. The main findings across studies are not unanimous, with some studies reporting stronger evidence of association between fish consumption and various cognitive domains, while others reported largely null findings. The most consistently responsive domains were processing speed, executive functioning, semantic memory, and global cognitive ability among individuals consuming fish at least weekly, which are highly relevant to both neurodegenerative and vascular forms of cognitive impairment. Positive associations were also observed for verbal memory and general memory, though these were less uniform and often attenuated after multivariable adjustment. In contrast, associations with reaction time, verbal-numerical reasoning, and broad composite scores were inconsistent, and several fully adjusted models showed null results. In conclusion, the evidence suggests that regular fish intake (typically ≥1–2 servings per week) is linked to preserved cognitive performance, although some inconsistent findings require further investigation.

Scientific Production

Justyna Godos, Giuseppe Caruso, Agnieszka Micek, Alberto Dolci, Carmen Lilí Rodríguez Velasco (carmen.rodriguez@uneatlantico.es), Evelyn Frias-Toral, Jason Di Giorgio, Nicola Veronese, Andrea Lehoczki, Mario Siervo, Zoltan Ungvari, Giuseppe Grosso

open

Inflammatory potential of the diet and self-rated quality of life in Italian adults

Background: Dietary quality is widely acknowledged as a key factor in maintaining good health. Recommendations that promote plant-based eating patterns are largely grounded in evidence showing that dietary choices can modulate the immune function. In line with such a hypothesis, diet may be considered as a potential driver of persistent low-grade inflammation. Quality of life (QoL), on the other hand, serves as a broad indicator that encompasses both physical and psychological wellbeing. Aim: The purpose of this cross-sectional study was to examine the relationship between the inflammatory potential of the diet and QoL in a population sample of Italian adults. Design: A total of 1,936 participants completed a 110-item food frequency questionnaire to assess eating habits. The inflammatory potential of their diet was calculated using the dietary inflammatory score (DIS). Quality of life was measured with the Manchester Short Appraisal (MANSA). Results: Higher DIS values, reflecting a more pro-inflammatory diet, were linked to reduced likelihood of reporting high QoL (OR = 0.56; 95% CI: 0.40–0.78). Several specific domains of QoL, including general life satisfaction, social relationships, personal safety, satisfaction with cohabitation, physical health, and mental health, also showed significant associations with DIS. Conclusion: The findings suggest an association between the inflammatory potential of the diet and QoL.

Scientific Production

Francesca Giampieri (francesca.giampieri@uneatlantico.es), Justyna Godos, Giuseppe Caruso, Marco Antonio Olvera-Moreira, Fabrizio Furnari, Andrea Di Mauro, Irma Dominguez Azpíroz (irma.dominguez@unini.edu.mx), Raynier Zambrano-Villacres, Evelyn Frias-Toral, Fabio Galvano, Giuseppe Grosso

open

Human Activity Recognition in Domestic Settings Based on Optical Techniques and Ensemble Models

Human activity recognition (HAR) is essential in many applications, such as smart homes, assisted living, healthcare monitoring, rehabilitation, physiotherapy, and geriatric care. Conventional methods of HAR use wearable sensors, e.g., acceleration sensors and gyroscopes. However, they are limited by issues such as sensitivity to position, user inconvenience, and potential health risks with long-term use. Vision-based optical camera systems provide a non-intrusive alternative; however, they are susceptible to variations in lighting, intrusions, and privacy issues. The paper uses an optical method of recognizing human domestic activities based on pose estimation and deep learning ensemble models. In the proposed methodology, skeletal keypoint features are extracted from video data using PoseNet to generate a privacy-preserving representation that captures key motion dynamics without being sensitive to changes in appearance. The dataset comprises 2734 activity samples from 30 subjects (15 male and 15 female), covering nine daily domestic activities. Six deep learning architectures were evaluated: Transformer, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Multilayer Perceptron (MLP), One-Dimensional Convolutional Neural Network (1D CNN), and a hybrid Convolutional Neural Network–Long Short-Term Memory (CNN–LSTM) architecture. The results on the hold-out test set show that the CNN–LSTM architecture achieves an accuracy of 98.78% within our experimental setting. Leave-One-Subject-Out cross-validation further confirms robust generalization across unseen individuals, with CNN–LSTM achieving a mean accuracy of 97.21% ± 1.84% across 30 subjects. The results demonstrate that vision-based pose estimation with deep learning is a useful, precise, and non-intrusive approach to HAR in smart healthcare and home automation systems.

Scientific Production

Muhammad Amjad Raza, Nasir Mehmood, Hafeez Ur Rehman Siddiqui, Adil Ali Saleem, Roberto Marcelo Álvarez (roberto.alvarez@uneatlantico.es), Yini Airet Miró Vera (yini.miro@uneatlantico.es), Isabel de la Torre Díez
