Title :

A cross-lingual study of neuron-level explainability of deep natural language processing models and its application in framework building for cross-lingual natural language processing systems

Area of research :

Computer Sciences and Information Technology

Focus area :

Artificial Intelligence, Natural Language Processing

Principal Investigator :

Dr. Ayan Das, Indian Institute of Technology (Indian School of Mines) Dhanbad, Jharkhand

Timeline Start Year :

2024

Timeline End Year :

2026

Executive Summary :

Natural language processing (NLP) systems are traditionally trained on annotated data, but such data is not available for most languages because annotation is costly and time-consuming. Cross-lingual approaches are therefore adopted to develop NLP systems for low-resource languages. Transfer-learning-based cross-lingual approaches rely on contextual word representations from large pre-trained language models trained on raw text in multiple languages. However, the quality of these representations can degrade if the volume of target-language text is small or if the other languages are syntactically different from the target language. Recent studies have attempted to explain the predictions of NLP systems by associating each prediction category with a subset of neurons in the representations. These studies have shown that the activations of a subset of neurons are predominantly responsible for encoding the knowledge needed to predict a particular category, and that some NLP systems can be controlled by altering the activation values of such a subset. This project aims to extend this idea to cross-lingual settings by conducting a neuron-level analysis of the cross-lingual performance of deep multilingual models for resource-deficient languages. The goal is to identify the subsets of neurons that encode the majority of the information corresponding to different prediction classes in different languages for a given NLP task. The information obtained will be used to develop a framework for building cross-lingual systems for low-resource languages, particularly low-resource Indian languages.
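
The idea of associating a prediction category with a subset of neurons, and of changing behaviour by altering those neurons, can be illustrated with a small sketch. It is not the project's method: the activations are synthetic, and the function names (`rank_neurons`, `ablate`, `centroid_accuracy`, `make_synthetic`), the mean-gap scoring rule, and the nearest-centroid classifier are all assumptions chosen for brevity. In practice the activations would come from a multilingual pre-trained model, and a more careful attribution method would likely be used.

```python
import numpy as np

def rank_neurons(acts, labels, target):
    """Rank neurons by how strongly their mean activation separates
    `target` from all other classes (illustrative scoring rule)."""
    in_cls = acts[labels == target].mean(axis=0)
    out_cls = acts[labels != target].mean(axis=0)
    score = np.abs(in_cls - out_cls) / (acts.std(axis=0) + 1e-8)
    return np.argsort(-score)  # most class-specific neurons first

def ablate(acts, neurons, fill):
    """Return a copy of `acts` with the chosen neurons overwritten by `fill`."""
    out = acts.copy()
    out[:, neurons] = fill[neurons]
    return out

def centroid_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a nearest-centroid classifier (a stand-in for a probe)."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == test_y).mean())

def make_synthetic(n=600, dim=32, signal=(3, 7, 19), shift=3.0, seed=0):
    """Synthetic 'activations': only the `signal` neurons carry class information."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n) % 2          # balanced binary labels
    acts = rng.normal(size=(n, dim))
    acts[np.ix_(labels == 1, signal)] += shift
    return acts, labels

acts, labels = make_synthetic()
train_x, train_y = acts[:400], labels[:400]
test_x, test_y = acts[400:], labels[400:]

# Identify the three neurons most associated with class 1.
top = rank_neurons(train_x, train_y, target=1)[:3]

# Compare accuracy before and after replacing those neurons with their mean.
acc_full = centroid_accuracy(train_x, train_y, test_x, test_y)
acc_ablated = centroid_accuracy(
    train_x, train_y, ablate(test_x, top, train_x.mean(axis=0)), test_y
)
print(sorted(top.tolist()), acc_full, acc_ablated)
```

In a cross-lingual setting, the same ranking could be computed separately per language and the resulting neuron subsets compared for overlap. That comparison is not shown here.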

Total Budget (INR):

30,05,790

Organizations involved