Real Word Spelling Error Detection and Correction for Urdu Language

Aziz, Romila; Anwar, Muhammad Waqas; Jamal, Muhammad Hasan; Bajwa, Usama Ijaz; Kuc Castilla, Ángel Gabriel; Uc-Rios, Carlos (carlos.uc@unini.edu.mx); Bautista Thompson, Ernesto (ernesto.bautista@unini.edu.mx); Ashraf, Imran (2023) Real Word Spelling Error Detection and Correction for Urdu Language. IEEE Access. p. 1. ISSN 2169-3536

Text: Real_Word_Spelling_Error_Detection_and_Correction_for_Urdu_Language.pdf
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Download (3MB)

Abstract

Spelling errors generally fall into two types: non-word and real-word errors. Non-word errors are misspelled words that do not exist in the lexicon, while real-word errors are misspelled words that exist in the lexicon but are used out of context in a sentence. The lexicon-based lookup approach is widely used for non-word errors, but it cannot handle real-word errors, which require contextual information. In contrast to the English language, real-word error detection and correction for low-resourced languages like Urdu is an unexplored area. This paper presents a real-word spelling error detection and correction approach for the Urdu language. We develop an extensive lexicon of 593,738 words and use it to build a real-word error dataset comprising 125,562 sentences and 2,552,735 words. Based on the developed lexicon and dataset, we then build a contextual spell checker that detects and corrects real-word errors. For the real-word error detection phase, word-gram features are used with five machine learning classifiers, achieving a precision, recall, and F1 score of 0.84, 0.79, and 0.81, respectively. We also test the proposed approach at a 40% error density. For real-word error correction, the Damerau-Levenshtein distance is used along with an n-gram model to rank the suggested candidate words, achieving an accuracy of up to 83.67%.
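The correction step described above — generating candidate words within a small Damerau-Levenshtein distance of a suspect word and then ranking them — can be sketched as follows. This is a minimal illustration using an English toy lexicon in place of the paper's 593,738-word Urdu lexicon; the `candidates` helper is a hypothetical stand-in for the paper's candidate-generation and n-gram ranking pipeline, not the authors' code.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def candidates(word, lexicon, max_dist=1):
    """Toy candidate generator: lexicon words within max_dist edits of `word`.
    A real spell checker would rank these further with n-gram context scores."""
    return sorted(w for w in lexicon
                  if 0 < damerau_levenshtein(word, w) <= max_dist)
```

For a real-word error such as "form" typed instead of "from", the transposition rule makes the intended word a distance-1 candidate, which contextual n-gram ranking would then promote.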

Item Type: Article
Uncontrolled Keywords: Real-word errors, spelling correction, spelling detection, spell checker
Subjects: Subjects > Engineering
Divisions: Europe University of Atlantic > Research > Scientific Production
Fundación Universitaria Internacional de Colombia > Research > Scientific Production
Ibero-american International University > Research > Scientific Production
Universidad Internacional do Cuanza > Research > Scientific Production
Date Deposited: 14 Sep 2023 23:30
Last Modified: 14 Sep 2023 23:30
URI: https://repositorio.unic.co.ao/id/eprint/8800


Full text: /8725/1/diagnostics-13-02871.pdf

Voxel Extraction and Multiclass Classification of Identified Brain Regions across Various Stages of Alzheimer’s Disease Using Machine Learning Approaches

This study investigated how different brain regions are affected by Alzheimer’s disease (AD) at various phases of the disease, using independent component analysis (ICA). The study examines six regions in the mild cognitive impairment (MCI) stage, four in the early stage of AD, six in the moderate stage, and six in the severe stage. The precuneus, cuneus, middle frontal gyri, calcarine cortex, superior medial frontal gyri, and superior frontal gyri were the areas impacted at all phases. A general linear model (GLM) is used to extract the voxels of these regions. The resting-state fMRI data for 18 AD patients who had advanced from MCI to stage 3 of the disease were obtained from the public ADNI database. The subjects include eight women and ten men. The voxel dataset is used to train and test ten machine learning algorithms to categorize the MCI, mild, moderate, and severe stages of Alzheimer’s disease. Accuracy, recall, precision, and F1 score were used as conventional scoring measures to evaluate the classification outcomes. AdaBoost fared better than the other algorithms, obtaining a remarkable accuracy of 98.61%, precision of 99.00%, and recall and F1 scores of 98.00% each.
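The classification stage can be illustrated with a minimal scikit-learn sketch. The synthetic features below merely stand in for the extracted voxel data (the actual study uses ADNI resting-state fMRI); the sample counts, feature dimensions, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for voxel features: 4 classes (MCI, mild,
# moderate, severe), 200 samples, 50 features per sample.
X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Boosted ensemble of shallow trees for the 4-way stage classification.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
```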

Scientific Production

Samra Shahzadi, Naveed Anwer Butt, Muhammad Usman Sana, Iñaki Elío Pascual (inaki.elio@uneatlantico.es), Mercedes Briones Urbano (mercedes.briones@uneatlantico.es), Isabel de la Torre Díez, Imran Ashraf

Full text: /8726/1/sensors-23-07710-v2.pdf

Adaptive Filtering: Issues, Challenges, and Best-Fit Solutions Using Particle Swarm Optimization Variants

Adaptive equalization is crucial in mitigating distortions and compensating for frequency response variations in communication systems. It aims to enhance signal quality by adjusting the characteristics of the received signal. Particle swarm optimization (PSO) algorithms have shown promise in optimizing the tap weights of the equalizer. However, the optimization capabilities of PSO need to be enhanced further to improve equalization performance. This paper provides a comprehensive study of the issues and challenges of adaptive filtering, comparing different variants of PSO and analyzing the performance obtained by combining PSO with other optimization algorithms to achieve better convergence, accuracy, and adaptability. Traditional PSO algorithms often suffer from high computational complexity and slow convergence rates, limiting their effectiveness in solving complex optimization problems. To address these limitations, this paper proposes a set of techniques aimed at reducing the complexity and accelerating the convergence of PSO.
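As an illustration of the baseline the paper starts from, a plain global-best PSO can fit the tap weights of a short FIR equalizer by minimizing the mean squared equalization error. The sketch below is a generic textbook PSO on a toy 4-tap channel, not the variants or hybrid techniques the paper proposes; the signal, tap values, and hyperparameters are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy channel: the "true" 4-tap filter the equalizer should recover.
true_taps = np.array([1.0, -0.5, 0.25, 0.1])
x = rng.standard_normal(200)                # transmitted signal
d = np.convolve(x, true_taps, mode="same")  # desired (received) signal

def mse(taps):
    """Equalization error for a candidate tap vector."""
    return np.mean((np.convolve(x, taps, mode="same") - d) ** 2)

# Plain global-best PSO over the 4 tap weights.
n_particles, dim, iters = 30, 4, 200
pos = rng.uniform(-1, 1, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([mse(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, and social weights
for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([mse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
```

On this convex toy problem the swarm converges close to `true_taps`; the paper's concern is precisely the cases where this vanilla update is too slow or too costly.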

Scientific Production

Arooj Khan, Imran Shafi, Sajid Gul Khawaja, Isabel de la Torre Díez, Miguel Ángel López Flores (miguelangel.lopez@uneatlantico.es), Juan Castanedo Galán (juan.castanedo@uneatlantico.es), Imran Ashraf

Full text: /8760/1/diagnostics-13-02881.pdf

Empowering Lower Limb Disorder Identification through PoseNet and Artificial Intelligence

A novel approach is presented in this study for the classification of lower limb disorders, with a specific emphasis on the knee, hip, and ankle. The research employs gait analysis and the extraction of PoseNet features from video data to effectively identify and categorize these disorders. The PoseNet algorithm extracts key body joint movements and positions from videos in a non-invasive and user-friendly manner, offering a comprehensive representation of lower limb movements. The extracted features are standardized and used as inputs for a range of machine learning algorithms, including Random Forest, Extra Tree Classifier, Multilayer Perceptron, Artificial Neural Networks, and Convolutional Neural Networks. The models are trained and tested on a dataset of 174 subjects, both patients and healthy individuals, collected at the Tehsil Headquarter Hospital, Sadiq Abad. Their performance is evaluated using K-fold cross-validation. The findings show a notable level of accuracy and precision in the classification of various lower limb disorders, with the Artificial Neural Networks model achieving the highest accuracy rate of 98.84%. The proposed methodology shows potential for enhancing the diagnosis and treatment planning of lower limb disorders, presenting a non-invasive and efficient way to analyze gait patterns and identify particular conditions.
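The evaluation pipeline — standardizing pose features and scoring a classifier with K-fold cross-validation — can be sketched with scikit-learn. The synthetic features below stand in for the PoseNet keypoint data (17 keypoints × 2 coordinates is an assumed layout); the network size and fold count are illustrative, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for PoseNet joint features: 174 subjects,
# 34 features (17 keypoints x 2 coordinates), 3 classes (knee/hip/ankle).
X, y = make_classification(n_samples=174, n_features=34, n_informative=12,
                           n_classes=3, random_state=0)

# Standardization inside the pipeline so each CV fold scales on its
# own training split, avoiding leakage into the held-out fold.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0))

scores = cross_val_score(
    model, X, y,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
```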

Scientific Production

Hafeez Ur Rehman Siddiqui, Adil Ali Saleem, Muhammad Amjad Raza, Santos Gracia Villar (santos.gracia@uneatlantico.es), Luis Dzul Lopez (luis.dzul@unini.edu.mx), Isabel de la Torre Diez, Furqan Rustam, Sandra Dudley


Full text: /8801/1/Software_Cost_and_Effort_Estimation_Current_Approaches_and_Future_Trends.pdf

Software Cost and Effort Estimation: Current Approaches and Future Trends

Software cost and effort estimation is one of the most significant tasks in the area of software engineering. Research in this field continues to evolve with new techniques that necessitate periodic comparative analyses. Software project success largely depends on accurate cost estimation, as it gives an idea of the challenges and risks involved in development. The great diversity of machine learning (ML) and non-ML techniques has prompted comparisons and, more recently, the integration of these techniques. Given their varying advantages, it has become important to identify preferred estimation techniques to improve the project development process. This study presents a systematic literature review (SLR) investigating the trends of articles published over the past decade and a half and proposing a way forward. The review follows a three-stage approach to plan (Tollgate approach), conduct (Likert-type scale), and report the results from five renowned digital libraries. Across the 52 selected articles, artificial neural network (ANN) and constructive cost model (COCOMO) based approaches have been the favored techniques, the mean magnitude of relative error (MMRE) has been the preferred accuracy metric, software engineering and project management are the most relevant fields, and the PROMISE repository has been identified as the most widely accessed database. This review is likely to be of value for development cost and effort estimation.
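The MMRE metric named above is simple to compute: for each project, take the absolute difference between actual and estimated effort relative to the actual value, then average over all projects. A minimal sketch with made-up effort figures (the numbers are illustrative, not from any study in the review):

```python
def mmre(actual, estimated):
    """Mean Magnitude of Relative Error: mean of |actual - est| / actual."""
    return sum(abs(a - e) / a for a, e in zip(actual, estimated)) / len(actual)

# Hypothetical actual vs. estimated effort, e.g. in person-months.
actual = [100.0, 250.0, 40.0]
estimated = [110.0, 200.0, 50.0]
err = mmre(actual, estimated)  # mean of 0.10, 0.20, and 0.25
```

Lower MMRE indicates a more accurate estimation model; a common (though debated) benchmark in the literature is MMRE ≤ 0.25.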

Scientific Production

Chaudhary Hamza Rashid, Imran Shafi, Jamil Ahmad, Ernesto Bautista Thompson (ernesto.bautista@unini.edu.mx), Manuel Masías Vergara (manuel.masias@uneatlantico.es), Isabel De La Torre Diez, Imran Ashraf