Title :

Pushing the boundaries of cross-modal retrieval

Area of research :

Engineering Sciences

Focus area :

Artificial Intelligence

Principal Investigator :

Prof. Soma Biswas, Indian Institute of Science, Bangalore, Karnataka

Timeline Start Year :

2019

Timeline End Year :

2022

Executive Summary :

Due to the availability of large amounts of multimedia data, cross-modal retrieval is gaining immense importance. It has many applications, ranging from text-image matching in search engines to sketch-image face matching in forensics. Though several algorithms have been proposed, real-world scenarios with huge amounts of data pose several challenges, and in this proposal we aim to address a few of them. First, the majority of current algorithms are designed for single-label data and are not directly applicable to multi-label data. For example, representing an image that contains several concepts with a single feature vector may lose fine-grained information about the image. Since most data has to be described using multiple tags, we propose to develop cross-modal hashing algorithms for multi-label data. For this problem, we will explore image-to-tag generation approaches, since they can implicitly relate the tags to the corresponding image regions. We also plan to explore multi-instance learning techniques (which deal with bags of instances/tags) for this task. Second, the current cross-modal retrieval evaluation criteria consider only exact tag matches, which we feel is restrictive. We want to revisit the evaluation criteria so that the semantic relevance of the retrieved data is also considered. Third, in the real world, new categories are continuously being discovered, but cross-modal techniques are usually evaluated on data from previously seen classes. Their performance degrades considerably when they are tested on unseen classes. We want to develop algorithms that generalize to unseen categories by utilizing attribute information, which links the seen and unseen classes. We will incorporate this information in the form of a triplet or quadruplet loss in the cross-modal hashing approaches to generalize them to unseen categories.
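As a rough illustration of the triplet loss mentioned above, the sketch below applies the standard hinge-style triplet loss to relaxed (real-valued) hash codes from two modalities and then binarizes them by sign. The function names, the margin value, and the example vectors are assumptions made for illustration only; the proposal does not specify the exact formulation, and the attribute-based selection of positives and negatives is not shown.

```python
def sign_code(vec):
    """Binarize a real-valued embedding into a +1/-1 hash code."""
    return [1 if v >= 0 else -1 for v in vec]


def hamming(a, b):
    """Hamming distance between two codes of equal length."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def sq_dist(u, v):
    """Squared Euclidean distance between two real-valued vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge triplet loss: the anchor should be closer to the
    positive than to the negative by at least `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


# Hypothetical relaxed codes: an image (anchor), a text sharing its tags
# (positive), and a text with unrelated tags (negative).
image_anchor = [0.9, -0.8, 0.7, -0.6]
text_positive = [0.8, -0.9, 0.6, -0.7]
text_negative = [-0.7, 0.9, 0.8, -0.5]

# Zero loss when the ordering already satisfies the margin.
loss_ok = triplet_loss(image_anchor, text_positive, text_negative)
# Positive loss when the roles are swapped, i.e. the ordering is violated.
loss_violated = triplet_loss(image_anchor, text_negative, text_positive)

# After binarization, the matching pair should be close in Hamming space.
dist_pos = hamming(sign_code(image_anchor), sign_code(text_positive))
dist_neg = hamming(sign_code(image_anchor), sign_code(text_negative))
```

In a trained system the relaxed codes would come from modality-specific networks and the loss would be minimized over many triplets; here the vectors are fixed by hand to show how the loss behaves.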
In the real world, the amount of data being captured is ever increasing, and most current algorithms need to be retrained to handle additional data or tag information. We want to develop algorithms that can handle increasing data in an online manner. We will use the training data in mini-batches, while keeping one fixed subset to maintain the semantic relations between all the mini-batches. If the number of tags increases, we want to develop algorithms that can increase the number of hash bits for better representation without having to relearn everything. We propose to compute the additional bits in such a way that they encode the semantic relations not captured by the initial bits. Our goal is to develop efficient algorithms for these difficult and relatively unexplored problems, which will be useful for different kinds of applications. We will publish the results in top-tier conferences and journals and also make the code publicly available to help researchers in this area.
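The idea of adding hash bits without relearning the existing ones can be sketched as follows. This is only an illustration under stated assumptions: the stored codes and the extra bits are hand-picked, whereas in the proposed work the additional bits would be learned so that they capture semantic relations the initial bits miss. The helper names (`extend_code`, `rank`) are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two binary codes of equal length."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def extend_code(code, extra_bits):
    """Append new bits to an existing code; the original bits are untouched."""
    return list(code) + list(extra_bits)


def rank(query, database):
    """Return item keys sorted by Hamming distance to the query (ties by key)."""
    return sorted(database, key=lambda k: (hamming(query, database[k]), k))


# Initial 4-bit codes (hypothetical). img_1 and img_2 collide exactly.
database = {
    "img_1": [1, 0, 1, 1],
    "img_2": [1, 0, 1, 1],
    "img_3": [0, 1, 0, 0],
}
query = [1, 0, 1, 1]
dist_before = {k: hamming(query, c) for k, c in database.items()}

# Hypothetical extra bits, assumed to come from a later learning stage
# triggered by newly added tags.
extra = {"img_1": [1, 0], "img_2": [0, 1], "img_3": [0, 1]}
extended_db = {k: extend_code(c, extra[k]) for k, c in database.items()}
extended_query = extend_code(query, [1, 0])

dist_after = {k: hamming(extended_query, c) for k, c in extended_db.items()}
ranking_after = rank(extended_query, extended_db)
```

Because Hamming distance is additive over bit positions, the distance on the extended codes equals the old distance plus the distance on the new bits, so previously stored codes stay valid and only the appended bits need to be computed.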

Co-PI:

Dr. Raj Sankar Cheriyedath, CSIR-National Institute for Interdisciplinary Science and Technology (NIIST), Industrial Estate Post Office, Pappanamcode, Thiruvananthapuram, Kerala 695019

Total Budget (INR):

18,59,411

Publications :

 
1
