Title :

Intelligent Speech-to-Speech Translation with Lip-Syncing for Educational Domain

Area of research :

Computer Sciences and Information Technology

Principal Investigator :

Dr. Partha Pakray, National Institute of Technology (NIT) Silchar, Assam

Timeline Start Year :

2024

Timeline End Year :

2027

Executive Summary :

English is the preferred language of education globally, but in multilingual countries like India, English-based lectures and tutorials need to be translated into local Indian languages so that students can follow the material. As digital communication becomes more visual, there is a need for systems that can automatically translate a video of an educational expert speaking in English into a target local language with realistic lip synchronization. The motivation is the growing volume of audiovisual content in educational information streams, such as YouTube and the NPTEL videos produced by government institutes. Existing systems translate audiovisual content only at the speech-to-speech level, which leaves lip movements out of sync with the translated audio and results in a poor user experience. This project aims to build upon face-to-face translation systems for the educational domain by proposing a pipeline that takes a video of a person speaking in a source language and outputs a video of the same speaker speaking in a target language, with the voice style preserved and the lip movements matching the target-language speech.

Co-PI:

Prof. Sivaji Bandyopadhyay, Jadavpur University, Kolkata, West Bengal-700032

Total Budget (INR):

23,91,297
