Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble

Open access. Language: English.
Rizwan, Muhammad; Mushtaq, Muhammad Faheem; Rafiq, Maryam; Mehmood, Arif; Diez, Isabel de la Torre; Gracia Villar, Mónica; Garay, Helena; Ashraf, Imran (2024) Depression Intensity Classification from Tweets Using FastText Based Weighted Soft Voting Ensemble. Computers, Materials & Continua, 78 (2). pp. 2047-2066. ISSN 1546-2226
Contact: monica.gracia@uneatlantico.es, helena.garay@uneatlantico.es

Text
TSP_CMC_37347.pdf
Available under License Creative Commons Attribution.

Download (861kB)

Abstract

Predicting depression intensity from microblogs and social media posts has numerous benefits and applications, including predicting early psychological disorders and stress in individuals or the general public. A major challenge is that existing studies do not predict the intensity of depression in social media texts but only perform binary classification of depression; moreover, noisy data makes it difficult to identify true depression in such texts. This study begins by collecting relevant tweets and generating a corpus of 210,000 public tweets using Twitter's public application programming interfaces (APIs). To reduce noise in the corpus, a strategy is devised to retain only depression-related tweets by creating a list of relevant hashtags. Furthermore, an algorithm is developed to annotate the data into three depression classes, 'Mild,' 'Moderate,' and 'Severe,' based on the International Classification of Diseases-10 (ICD-10) depression diagnostic criteria. Different baseline classifiers are applied to the annotated dataset to get a preliminary idea of classification performance on the corpus. Next, a FastText-based model is applied and tuned with different preprocessing techniques and hyperparameter settings; the tuned model significantly increases depression classification performance compared to the baselines, reaching an 84% F1 score and 90% accuracy. Finally, a FastText-based weighted soft voting ensemble (WSVE) is proposed to boost performance by combining FastText with several other classifiers and weighting each model according to its individual performance. The proposed WSVE outperformed all baselines as well as FastText alone, with an F1 score of 89% (5 percentage points higher than FastText alone) and an accuracy of 93% (3 percentage points higher than FastText alone). The proposed model better captures the contextual features of the relatively small sample class and aids the early detection of depression intensity from tweets.
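As a rough illustration of the weighted soft voting idea described in the abstract, the sketch below averages per-model class probabilities using weights derived from each model's validation score. It is not the authors' implementation: the model names `logreg` and `svm`, their scores, the probability vectors, and the choice to make weights proportional to validation F1 are all assumptions made for the example. Only the three class labels and the 0.84 FastText F1 come from the abstract.

```python
from typing import Dict, List, Sequence

# Class labels as named in the abstract.
CLASSES = ["Mild", "Moderate", "Severe"]


def normalize_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-model validation scores into weights that sum to 1.

    Assumption: weights are proportional to each model's score.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one model needs a positive score")
    return {name: score / total for name, score in scores.items()}


def weighted_soft_vote(
    probs: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Return the weighted average of the models' class-probability vectors."""
    combined = [0.0] * len(CLASSES)
    for name, model_probs in probs.items():
        for i, value in enumerate(model_probs):
            combined[i] += weights[name] * value
    return combined


def predict(probs: Dict[str, Sequence[float]], weights: Dict[str, float]) -> str:
    """Pick the class with the highest combined probability."""
    combined = weighted_soft_vote(probs, weights)
    best = max(range(len(CLASSES)), key=combined.__getitem__)
    return CLASSES[best]


# Hypothetical validation F1 scores (only the FastText value mirrors the abstract).
val_f1 = {"fasttext": 0.84, "logreg": 0.70, "svm": 0.66}
weights = normalize_weights(val_f1)

# Hypothetical per-model probabilities for a single tweet, in CLASSES order.
tweet_probs = {
    "fasttext": [0.10, 0.25, 0.65],
    "logreg": [0.40, 0.35, 0.25],
    "svm": [0.35, 0.35, 0.30],
}
label = predict(tweet_probs, weights)
```

Because the weights sum to 1 and each model emits a probability vector, the combined vector is itself a probability distribution, so the better-scoring model pulls the decision toward its own prediction without fully overriding the others.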

Item Type: Article
Uncontrolled Keywords: Depression classification; deep learning; FastText; machine learning
Subjects: Subjects > Engineering
Subjects > Psychology
Divisions: Europe University of Atlantic > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Articles and books
Date Deposited: 14 Mar 2024 23:30
Last Modified: 14 Mar 2024 23:30
URI: https://repositorio.unic.co.ao/id/eprint/11264

Single-cell omics for nutrition research: an emerging opportunity for human-centric investigations

Understanding how dietary compounds affect human health is challenged by their molecular complexity and cell-type–specific effects. Conventional multi-cell type (bulk) analyses obscure cellular heterogeneity, while animal and standard in vitro models often fail to replicate human physiology. Single-cell omics technologies—such as single-cell RNA sequencing, as well as single-cell–resolved proteomic and metabolomic approaches—enable high-resolution investigation of nutrient–cell interactions and reveal mechanisms at a single-cell resolution. When combined with advanced human-derived in vitro systems like organoids and organ-on-chip platforms, they support mechanistic studies in physiologically relevant contexts. This review outlines emerging applications of single-cell omics in nutrition research, emphasizing their potential to uncover cell-specific dietary responses, identify nutrient-sensitive pathways, and capture interindividual variability. It also discusses key challenges—including technical limitations, model selection, and institutional biases—and identifies strategic directions to facilitate broader adoption in the field. Collectively, single-cell omics offer a transformative framework to advance human-centric nutrition research.

Scientific Production

Manuela Cassotta (manucassotta@gmail.com), Yasmany Armas Diaz, Danila Cianciosi, Bei Yang, Zexiu Qi, Ge Chen, Santos Gracia Villar (santos.gracia@uneatlantico.es), Luis Alonso Dzul López (luis.dzul@uneatlantico.es), Giuseppe Grosso, José L. Quiles, Jianbo Xiao, Maurizio Battino (maurizio.battino@uneatlantico.es), Francesca Giampieri (francesca.giampieri@uneatlantico.es)

Edge-Based Autonomous Fire and Smoke Detection Using MobileNetV2

Forest fires pose significant threats to ecosystems, human life, and the global climate, necessitating rapid and reliable detection systems. Traditional fire detection approaches, including sensor networks, satellite monitoring, and centralized image analysis, often suffer from delayed response, high false positives, and limited deployment in remote areas. Recent deep learning-based methods offer high classification accuracy but are typically computationally intensive and unsuitable for low-power, real-time edge devices. This study presents an autonomous, edge-based forest fire and smoke detection system using a lightweight MobileNetV2 convolutional neural network. The model is trained on a balanced dataset of fire, smoke, and non-fire images and optimized for deployment on resource-constrained edge devices. The system performs near real-time inference, achieving a test accuracy of 97.98% with an average end-to-end prediction latency of 0.77 s per frame (approximately 1.3 FPS) on the Raspberry Pi 5 edge device. Predictions include the class label, confidence score, and timestamp, all generated locally without reliance on cloud connectivity, thereby enhancing security and robustness against potential cyber threats. Experimental results demonstrate that the proposed solution maintains high predictive performance comparable to state-of-the-art methods while providing efficient, offline operation suitable for real-world environmental monitoring and early wildfire mitigation. This approach enables cost-effective, scalable deployment in remote forest regions, combining accuracy, speed, and autonomous edge processing for timely fire and smoke detection.
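The per-frame output described above (class label, confidence score, timestamp) and the latency-to-throughput conversion can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the model's output layer, so the softmax step, the function names, and the example logits are hypothetical. Only the three class names and the 0.77 s latency come from the abstract.

```python
import math
from datetime import datetime, timezone

# Class names as listed in the abstract; their order here is an assumption.
CLASSES = ["fire", "smoke", "non-fire"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_prediction(logits, now=None):
    """Build a local prediction record: label, confidence, and timestamp."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return {"label": CLASSES[best], "confidence": probs[best], "timestamp": stamp}


def frames_per_second(latency_s):
    """Throughput implied by an average end-to-end latency per frame."""
    return 1.0 / latency_s


# Hypothetical logits for one frame.
record = make_prediction([2.0, 0.5, -1.0])

# 0.77 s per frame is the latency reported in the abstract.
fps = frames_per_second(0.77)
```

The reciprocal of 0.77 s is about 1.3, which matches the approximate frame rate quoted in the abstract.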

Scientific Production

Dilshod Sharobiddinov, Hafeez Ur Rehman Siddiqui, Adil Ali Saleem, Gerardo Méndez Mezquita, Debora L. Ramírez-Vargas (debora.ramirez@unini.edu.mx), Isabel de la Torre Díez

Divulging Patterns: An Analytical Review for Machine Learning Methodologies for Breast Cancer Detection

Breast cancer is a lethal carcinoma affecting a considerable number of women across the globe. While preventive measures are limited, early detection remains the most effective strategy. Accurate classification of breast tumors into benign and malignant categories is important and may help physicians diagnose the disease faster. This survey investigates emerging trends and approaches in machine learning (ML) for the diagnosis of breast cancer, highlighting classification techniques based on both segmentation and feature selection. Datasets such as the Wisconsin Diagnostic Breast Cancer Dataset (WDBC), Wisconsin Breast Cancer Dataset Original (WBCD), Wisconsin Prognostic Breast Cancer Dataset (WPBC), BreakHis, and others are evaluated to demonstrate their influence on the performance of diagnostic tools and on the accuracy of models such as Support Vector Machines, Convolutional Neural Networks (CNNs), and ensemble approaches. The main shortcomings and research gaps, such as dataset bias, limited generalizability, and interpretation challenges, are highlighted. This research emphasizes the importance of hybrid methodologies, cross-dataset validation, and the engineering of explainable AI to narrow these gaps and enhance the overall clinical acceptance of ML-based detection tools.

Scientific Production

Alveena Saleem, Muhammad Umair, Muhammad Tahir Naseem, Muhammad Zubair, Silvia Aparicio Obregón (silvia.aparicio@uneatlantico.es), Rubén Calderón Iglesias (ruben.calderon@uneatlantico.es), Shoaib Hassan, Imran Ashraf

Unhealthy Ultra-Processed Food Consumption in Children and Adolescents Living in the Mediterranean Area: The DELICIOUS Project

Objectives: This study addressed the consumption of ultra-processed foods (UPFs) formulated with an excess of energy/fats/sugars (hence deemed unhealthy) and the factors associated with it in children and adolescents living in 5 Mediterranean countries participating in the DELICIOUS (UnDErstanding consumer food choices & promotion of healthy and sustainable Mediterranean diet and LIfestyle in Children and adolescents through behavIOUral change actionS) project. Methods: A total of 2011 parents of children and adolescents (6–17 years) participated in a survey exploring their children's frequency of consumption of unhealthy UPFs and their demographic, eating, and lifestyle habits. Results: Most children consumed unhealthy UPFs daily; higher intake was associated with being older and with obesity, as well as higher parental education and younger age. Children who ate out of home more frequently and had a higher number of meals were also more likely to consume unhealthy UPFs. Moreover, more screen time and a lower healthy lifestyle score were associated with higher unhealthy UPF consumption. Conclusion: Consumption of unhealthy UPFs seems to be preeminent in children and adolescents living in the Mediterranean area and is associated with an overall unhealthy lifestyle.

Scientific Production

Alice Rosi, Francesca Giampieri (francesca.giampieri@uneatlantico.es), Osama Abdelkarim, Mohamed Aly, Achraf Ammar, Evelyn Frias-Toral, Juancho Pons, Laura Vázquez-Araújo, Alessandro Scuderi, Nunzia Decembrino, Alice Leonardi, Fernando Maniega Legarda, Lorenzo Monasta, Ana Mata, Adrián Chacón, Pablo Busó, Giuseppe Grosso

Ultra Wideband radar-based gait analysis for gender classification using artificial intelligence

Gender classification plays a vital role in various applications, particularly in security and healthcare. While several biometric methods such as facial recognition, voice analysis, activity monitoring, and gait recognition are commonly used, their accuracy and reliability often suffer from challenges such as body part occlusion, high computational costs, and recognition errors. This study investigates gender classification using gait data captured by Ultra-Wideband radar, offering a non-intrusive and occlusion-resilient alternative to traditional biometric methods. A dataset comprising 163 participants was collected, and the radar signals underwent preprocessing, including clutter suppression and peak detection, to isolate meaningful gait cycles. Spectral features extracted from these cycles were transformed using a novel integration of Feedforward Artificial Neural Networks and Random Forests, enhancing discriminative power. Among the models evaluated, the Random Forest classifier demonstrated superior performance, achieving 94.68% accuracy and a cross-validation score of 0.93. The study highlights the effectiveness of Ultra-Wideband radar and the proposed transformation framework in advancing robust gender classification.

Scientific Production

Adil Ali Saleem, Hafeez Ur Rehman Siddiqui, Muhammad Amjad Raza, Sandra Dudley, Julio César Martínez Espinosa (ulio.martinez@unini.edu.mx), Luis Alonso Dzul López (luis.dzul@uneatlantico.es), Isabel de la Torre Díez