Title :

Automatic Parts-of-Speech Tagger Based on BIS Tagset in Assamese

Area of research :

Computer Sciences and Information Technology

Focus area :

Computational Linguistics, Artificial Intelligence

Principal Investigator :

Dr. Nomi Baruah, Dibrugarh University, Assam

Timeline Start Year :

2023

Timeline End Year :

2026

Executive Summary :

Parts-of-speech (POS) tagging is a challenging task in Natural Language Processing (NLP) because it requires deep knowledge of the specific language, and that difficulty grows with large volumes of data. Although a growing body of work addresses POS tagging for Indian languages such as Hindi and Bengali, resources remain scarce for Assamese, one of India's scheduled languages, with 15.3 million speakers worldwide. As NLP research on Assamese expands, a high-accuracy automatic POS tagger becomes necessary. A dataset annotated with the BIS (Bureau of Indian Standards) tagset will be developed from Assamese novels, news articles, and sports texts, making it one of the pioneering efforts of its kind for Assamese and among Indian languages. The POS tagger will be implemented using recurrent neural network (RNN) based deep learning methods and a newly designed hybrid method. The outputs and performance of these methods will be critically analyzed to assess their effectiveness.

Total Budget (INR):

13,69,500

Organizations involved