eprintid: 4194
rev_number: 11
eprint_status: archive
userid: 2
dir: disk0/00/00/41/94
datestamp: 2022-10-26 23:30:04
lastmod: 2023-07-18 23:30:12
status_changed: 2022-10-26 23:30:04
type: article
metadata_visibility: show
creators_name: Mehmood, Aneela
creators_name: Farooq, Muhammad Shoaib
creators_name: Naseem, Ansar
creators_name: Rustam, Furqan
creators_name: Gracia Villar, Mónica
creators_name: Rodríguez Velasco, Carmen Lilí
creators_name: Ashraf, Imran
creators_id: 
creators_id: 
creators_id: 
creators_id: 
creators_id: monica.gracia@uneatlantico.es
creators_id: carmen.rodriguez@uneatlantico.es
creators_id: 
title: Threatening URDU Language Detection from Tweets Using Machine Learning
ispublished: pub
subjects: uneat_eng
divisions: uneatlantico_produccion_cientifica
divisions: unincol_produccion_cientifica
divisions: uninimx_produccion_cientifica
divisions: uninipr_produccion_cientifica
divisions: unic_produccion_cientifica
full_text_status: public
keywords: threatening language detection; Urdu text classification; machine learning; stacking
abstract: The expansion of technology has contributed to the rising popularity of social media platforms. Twitter is one of the leading social media platforms that people use to share their opinions. Such opinions may sometimes contain threatening text, whether deliberate or not, which can be disturbing for other users. Consequently, the detection of threatening content on social media is an important task. Unlike high-resource languages such as English and Dutch, for which several such approaches exist, the low-resource Urdu language has few. Therefore, this study presents an intelligent threatening language detection approach for the Urdu language. A stacking model is proposed that uses an extra tree (ET) classifier and Bayes theorem-based Bernoulli Naive Bayes (BNB) as the base learners, while logistic regression (LR) is employed as the meta learner. A performance analysis is carried out by deploying a support vector classifier, ET, LR, BNB, fully connected network, convolutional neural network, long short-term memory, and gated recurrent unit. Experimental results indicate that the stacked model performs better than both the machine learning and the deep learning models. With 74.01% accuracy, 70.84% precision, 75.65% recall, and 73.99% F1 score, the model outperforms the existing benchmark study.
date: 2022-10
publication: Applied Sciences
volume: 12
number: 20
pagerange: 10342
id_number: doi:10.3390/app122010342
refereed: TRUE
issn: 2076-3417
official_url: http://doi.org/10.3390/app122010342
access: open
language: en
citation:   Article Subjects > Engineering <http://repositorio.unic.co.ao/view/subjects/uneat=5Feng.html> Universidad Europea del Atlántico > Research > Scientific Production <http://repositorio.unic.co.ao/view/divisions/uneatlantico=5Fproduccion=5Fcientifica.html>
Fundación Universitaria Internacional de Colombia > Research > Scientific Production <http://repositorio.unic.co.ao/view/divisions/unincol=5Fproduccion=5Fcientifica.html>
Universidad Internacional Iberoamericana México > Research > Scientific Production <http://repositorio.unic.co.ao/view/divisions/uninimx=5Fproduccion=5Fcientifica.html>
Universidad Internacional Iberoamericana Puerto Rico > Research > Scientific Production <http://repositorio.unic.co.ao/view/divisions/uninipr=5Fproduccion=5Fcientifica.html>
Universidad Internacional do Cuanza > Research > Scientific Production <http://repositorio.unic.co.ao/view/divisions/unic=5Fproduccion=5Fcientifica.html> Open English metadata Mehmood, Aneela; Farooq, Muhammad Shoaib; Naseem, Ansar; Rustam, Furqan; Gracia Villar, Mónica; Rodríguez Velasco, Carmen Lilí and Ashraf, Imran mail UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, UNSPECIFIED, monica.gracia@uneatlantico.es, carmen.rodriguez@uneatlantico.es, UNSPECIFIED <http://repositorio.unic.co.ao/id/eprint/4194/1/applsci-12-10342-v3.pdf> (2022) Threatening URDU Language Detection from Tweets Using Machine Learning. Applied Sciences, 12 (20). p. 10342. ISSN 2076-3417
document_url: http://repositorio.unic.co.ao/id/eprint/4194/1/applsci-12-10342-v3.pdf
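
The stacking arrangement named in the abstract (ET and BNB as base learners, LR as the meta learner) can be sketched with scikit-learn as shown below. This is only an illustrative sketch, not the authors' implementation: the record does not state the feature extraction, hyperparameters, or cross-validation scheme, so the binary bag-of-words features (`CountVectorizer(binary=True)`), `n_estimators=100`, the `cv` value, the helper name `build_stacking_model`, and the English placeholder sentences are all assumptions made for the example.

```python
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline


def build_stacking_model(cv=5, random_state=0):
    """Hypothetical helper: ET + BNB base learners stacked under an LR meta learner."""
    base_learners = [
        # Hyperparameters here are assumed, not taken from the paper.
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=random_state)),
        ("bnb", BernoulliNB()),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # meta learner
        cv=cv,  # out-of-fold base predictions are used to train the meta learner
    )
    # Binary bag-of-words features are an assumption; the record does not
    # say which text representation the study used.
    return make_pipeline(CountVectorizer(binary=True), stack)


# Synthetic placeholder sentences (NOT from the study's Urdu dataset).
texts = [
    "i will hurt you",
    "you will regret this soon",
    "i am going to hurt them",
    "they will regret crossing me",
    "have a nice day",
    "the weather is lovely today",
    "what a nice match today",
    "lovely to see you",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = threatening, 0 = not threatening

model = build_stacking_model(cv=2)  # small cv only because the toy set is tiny
model.fit(texts, labels)
predictions = model.predict(texts)
```

On a real corpus the model would be fitted on a training split and scored on held-out tweets with accuracy, precision, recall, and F1, the four metrics reported in the abstract.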