Title :

Information Access from Document Images of Indian languages

Area of research :

Computer Sciences and Information Technology

Focus area :

Multimodal, Multilingual and Cross-lingual Interfaces

Principal Investigator :

Prof. Prabir Kumar Biswas, Professor and Head, Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology (IIT), Kharagpur

Executive Summary :

We propose to develop content-aware image processing algorithms for robust and efficient recognition and retrieval from Indian-language document images. Our image processing algorithms aim to improve the quality of document images by removing noise and low-resolution artifacts with content-aware, shape-based morphological filters. A set of recognizers will be built with state-of-the-art machine learning techniques, such as deep learning, for handwritten, typewritten and low-resolution document images, where existing technologies are insufficient. For difficult, noisy handwritten documents, we propose holistic keyword spotting techniques that reduce the search space and complement recognition-based approaches. We will also build and demonstrate information access and retrieval schemes over a joint space of image features and noisy text, enabling a set of immediate practical applications. The methods will be validated on two different focused collections during the project.
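The morphological clean-up step mentioned in the summary can be illustrated with a generic binary opening (erosion followed by dilation). The Python sketch below is a hypothetical illustration, not the project's method: it uses a fixed 3x3 square structuring element instead of the content-aware, shape-based filters the summary proposes, and the names `erode`, `dilate`, `open_image` and `SQUARE_3X3` are introduced here for illustration only.

```python
"""Hypothetical illustration of binary morphological opening on a binarized
document image. This is a generic textbook filter with a fixed square
structuring element, NOT the project's content-aware, shape-based filter."""

from typing import List, Tuple

Image = List[List[int]]  # 1 = ink (foreground), 0 = paper (background)
Offsets = List[Tuple[int, int]]

# Assumed 3x3 square structuring element, as (row, col) offsets from its centre.
SQUARE_3X3: Offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(img: Image, se: Offsets) -> Image:
    """Keep a pixel as ink only if every offset in `se` lands on ink.
    Pixels outside the image are treated as paper."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(all(
                0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc] == 1
                for dr, dc in se
            ))
    return out


def dilate(img: Image, se: Offsets) -> Image:
    """Mark a pixel as ink if the reflected element hits any ink pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(any(
                0 <= r - dr < h and 0 <= c - dc < w and img[r - dr][c - dc] == 1
                for dr, dc in se
            ))
    return out


def open_image(img: Image, se: Offsets = SQUARE_3X3) -> Image:
    """Opening: erosion removes blobs that cannot contain `se`,
    then dilation restores the shapes that survived."""
    return dilate(erode(img, se), se)


if __name__ == "__main__":
    # Toy 8x8 "page": a 3x3 ink blob plus one isolated noise speck.
    page = [[0] * 8 for _ in range(8)]
    for r in range(1, 4):
        for c in range(1, 4):
            page[r][c] = 1
    page[6][6] = 1  # isolated speck that opening should remove
    for row in open_image(page):
        print("".join("#" if v else "." for v in row))
```

Opening removes ink blobs that cannot contain the structuring element, such as the isolated speck in the usage example, while restoring larger shapes. The same property also erases strokes thinner than the element, which is one reason a fixed, content-agnostic element suits thin script strokes poorly and why adapting the filter to stroke shape is attractive.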

Co-PIs :

Prof. Jayanta Mukhopadhyay, Professor, Department of Computer Science and Engineering, Indian Institute of Technology (IIT), Kharagpur
Prof. Santanu Chaudhury, CEERI Pilani
Prof. Bhabatosh Chanda, Professor, Electronics and Communication Sciences Unit, Indian Statistical Institute (ISI), Kolkata
Prof. Shamik Sural, Department of Computer Science and Engineering, IIT Kharagpur
Dr. C. V. Jawahar, IIIT Hyderabad

Total Budget (INR):

4,00,00,000 (4 crore)

Organizations involved