Title :	Development of Spoken Language Corpora for Under Resourced Languages
Area of research :	Engineering Sciences
Principal Investigator :	Dr. Tanmay Bhowmik, Pandit Deendayal Energy University, Gandhinagar, Gujarat
Timeline Start Year :	2024
Timeline End Year :	2027
Contact info :	tanmaybhowmik@gmail.com

Details

Executive Summary :

Under-resourced languages are those for which few resources, such as speech data, language models, or text corpora, are available. They are often spoken by smaller communities and are less well studied than more widely spoken languages. Supporting them requires speech corpora built from spoken language, which contains prosodic words. Such corpora can improve the uniformity and robustness of current automatic speech recognition (ASR) systems. A spoken language corpus is a collection of recorded speech that is transcribed and annotated with linguistic information, such as phonetic and prosodic features, and that can be used for developing and evaluating speech recognition and language processing systems. These corpora are needed for training speech recognition systems, developing language models, conducting linguistic research, and preserving cultural heritage. Speech recognition systems typically use machine learning algorithms, which require large amounts of annotated speech data; a spoken language corpus provides a foundation for training these systems and improving their accuracy and performance. Linguistic research on spoken language corpora can focus on phonetics, prosody, and syntax to deepen our understanding of a language and its structure. In conclusion, spoken language corpora are crucial resources for speech and language technology, linguistic research, and cultural heritage preservation.
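
As an illustration of what "recorded speech that is transcribed and annotated" can mean in practice, the following is a minimal sketch of one possible in-memory layout for a corpus entry. It is not the project's actual schema: the class names (Segment, Utterance), the field names, and the two annotation tiers are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """A time-aligned label on one annotation tier (hypothetical layout)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    label: str    # e.g. a phone symbol or a prosodic-word label


@dataclass
class Utterance:
    """One corpus entry: a recording, its transcript, and its annotations."""
    audio_path: str   # placeholder path to the audio file
    language: str     # placeholder language identifier
    transcript: str   # orthographic transcription of the recording
    phones: List[Segment] = field(default_factory=list)          # phonetic tier
    prosodic_words: List[Segment] = field(default_factory=list)  # prosodic tier

    def duration(self) -> float:
        """Span covered by the phonetic tier; 0.0 if nothing is annotated."""
        if not self.phones:
            return 0.0
        return self.phones[-1].end - self.phones[0].start


def total_annotated_seconds(corpus: List[Utterance]) -> float:
    """Sum of annotated durations, a rough measure of usable training data."""
    return sum(u.duration() for u in corpus)
```

Keeping the phonetic and prosodic labels as separate time-aligned tiers mirrors the way the summary distinguishes phonetic from prosodic features, and a simple total such as total_annotated_seconds gives a rough measure of how much annotated speech is available for training.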

Total Budget (INR):

18,30,000

Organizations involved

Implementing Agency :	Pandit Deendayal Energy University, Gandhinagar, Gujarat
Funding Agency :	Anusandhan National Research Foundation (ANRF)/Science and Engineering Research Board (SERB)
Source :	Anusandhan National Research Foundation/Science and Engineering Research Board (SERB), DST 2023-24