
Title :

Bayesian Variable Selection in High-dimensional Settings with Grouped Covariates

Area of research :

Mathematical Sciences

Focus area :

Bayesian Inference, Variable Selection

Principal Investigator :

Dr. Minerva Mukhopadhyay, Indian Institute of Technology Kanpur (IITK), Uttar Pradesh

Timeline Start Year :

2024

Timeline End Year :

2027


Executive Summary :

Consider the normal linear regression setup in which the number of covariates, p, is much larger than the sample size, n, and the covariates form highly correlated groups. Unlike group LASSO and related methods, we do not assume that the response is related to an entire group; rather, sparsity is assumed within groups as well. We extend a popular shrinkage prior, Zellner's g-prior, to this framework. The variable selection consistency of the proposed method will be investigated under fairly general conditions, treating the covariates as random and allowing the size of the true model to grow with n and p.

Because the number of candidate models grows exponentially with p, and multicollinearity is unavoidable in high-dimensional data, implementing any variable selection method is challenging. To address this, two alternative procedures are proposed. The first is a two-stage procedure that uses group-level screening, rather than marginal screening, as a pre-processing step. The second modifies the reversible jump Markov chain Monte Carlo (RJMCMC) algorithm usually implemented for stochastic search variable selection (SSVS) by feeding it the group structure and group importance, so that the lengthy search for a new covariate is shortened substantially. The complexity of such an algorithm is expected to depend only on the maximum group size, and not on p. We will also investigate, theoretically, how the mixing rate of the proposed algorithm depends on the group size and on the number of groups containing the active covariates.

Apart from the theoretical investigation, the performance of both proposals will be assessed on simulated and real data sets.
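For reference, the standard (ungrouped) g-prior setup that the project builds on is summarised below, together with its closed-form Bayes factor against the intercept-only model. This is the textbook Zellner g-prior, not the grouped extension proposed in this project, whose form is not stated here. It assumes centred covariates, a flat prior on the intercept, p(sigma^2) proportional to 1/sigma^2, and a fixed g.

```latex
\begin{aligned}
y &= \alpha \mathbf{1}_n + X_\gamma \beta_\gamma + \varepsilon,
  \qquad \varepsilon \sim N_n(0, \sigma^2 I_n), \\
p(\alpha, \sigma^2) &\propto \sigma^{-2}, \qquad
  \beta_\gamma \mid \sigma^2, \gamma \sim N_{|\gamma|}\bigl(0,\; g\,\sigma^2 (X_\gamma^\top X_\gamma)^{-1}\bigr), \\
\mathrm{BF}(\gamma : \emptyset)
  &= \frac{m_\gamma(y)}{m_\emptyset(y)}
   = \frac{(1+g)^{(n-1-|\gamma|)/2}}{\bigl[1 + g\,(1 - R_\gamma^2)\bigr]^{(n-1)/2}}.
\end{aligned}
```

In this notation, gamma indexes the included covariates and X_gamma holds the corresponding centred columns. |gamma| is the model size and R_gamma^2 is the usual coefficient of determination. The expression requires X_gamma^T X_gamma to be invertible, so when p is much larger than n only models with at most n - 1 covariates can be scored. Combining the Bayes factor with a model prior p(gamma) gives posterior model probabilities proportional to BF(gamma : empty) p(gamma).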
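The first proposal, group-level screening as a pre-processing step, could look roughly like the sketch below. The summary does not specify a screening statistic. The adjusted R^2 used here, obtained by regressing the centred response on all columns of a group, is a hypothetical choice made for illustration, as is the function name group_screening. The sketch assumes each group has fewer than n - 1 columns.

```python
import numpy as np


def group_screening(y, X, groups, n_keep):
    """Rank groups by the adjusted R^2 of regressing y on all columns of the
    group and return (indices of the n_keep best groups, all group scores).

    Hypothetical screening statistic for illustration only; assumes every
    group has fewer than n - 1 columns so the adjustment is defined.
    """
    n = len(y)
    yc = y - y.mean()                      # centre the response
    tss = yc @ yc                          # total sum of squares
    scores = np.empty(len(groups))
    for gi, cols in enumerate(groups):
        Xg = X[:, cols] - X[:, cols].mean(axis=0)   # centred group block
        beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
        rss = np.sum((yc - Xg @ beta) ** 2)
        k = len(cols)
        # adjusted R^2 penalises larger groups for their extra columns
        scores[gi] = 1.0 - (rss / tss) * (n - 1) / (n - k - 1)
    keep = np.argsort(scores)[::-1][:n_keep]        # highest scores first
    return keep, scores
```

The retained groups would then be passed to a second-stage selection method. Adjusted R^2 is used instead of raw R^2 so that larger groups are not favoured simply because they contain more columns.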
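The second proposal restricts the search for a new covariate to one group at a time. A minimal sketch of that idea, assuming Python with NumPy, is a Metropolis-Hastings sampler over models that scores each model with the closed-form g-prior Bayes factor. In this sketch beta and sigma^2 are integrated out, g is fixed, each covariate has an independent Bernoulli(prior_incl) inclusion prior, and models are capped at n - 2 covariates. This plain model-space sampler is swapped in for illustration; it is not the project's RJMCMC algorithm. The functions log_bf_gprior and group_mh_sampler are hypothetical, as is the use of fixed per-group weights to encode group importance.

```python
import numpy as np


def log_bf_gprior(y, X, model, g):
    """Log Bayes factor of `model` (list of column indices) against the
    intercept-only model under Zellner's g-prior with p(alpha, sigma^2) ~ 1/sigma^2."""
    n, k = len(y), len(model)
    if k == 0:
        return 0.0
    yc = y - y.mean()
    Xg = X[:, model] - X[:, model].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
    r2 = 1.0 - np.sum((yc - Xg @ beta) ** 2) / (yc @ yc)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))


def group_mh_sampler(y, X, groups, weights, n_iter, g, prior_incl, seed=0):
    """Metropolis-Hastings over models in which every move touches one group.

    Each step picks a group with probability proportional to `weights`, then
    either flips one covariate in it (add/delete) or swaps an active and an
    inactive covariate inside it. With fixed weights both moves are symmetric,
    so the acceptance ratio is just the posterior ratio.
    Returns the fraction of iterations in which each covariate was included.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    log_odds = np.log(prior_incl) - np.log1p(-prior_incl)  # per-covariate prior odds
    active = set()
    cur = 0.0                      # log BF + log prior (up to a constant) of the empty model
    counts = np.zeros(p)
    for _ in range(n_iter):
        cols = groups[rng.choice(len(groups), p=w)]
        prop = set(active)
        if rng.random() < 0.5:
            prop ^= {cols[rng.integers(len(cols))]}         # flip one covariate
        else:
            on = [j for j in cols if j in active]
            off = [j for j in cols if j not in active]
            if on and off:                                   # within-group swap
                prop.remove(on[rng.integers(len(on))])
                prop.add(off[rng.integers(len(off))])
            # otherwise prop == active and the chain stays put
        if prop != active and len(prop) <= n - 2:            # size cap acts as zero prior
            model = sorted(prop)
            new = log_bf_gprior(y, X, model, g) + len(model) * log_odds
            if np.log(rng.random()) < new - cur:
                active, cur = prop, new
        for j in active:
            counts[j] += 1
    return counts / n_iter
```

Under fixed group weights the forward and reverse proposal probabilities are equal, so no Hastings correction is needed. Each candidate covariate is drawn from a single group, so the search for a new covariate scans at most the maximum group size rather than all p columns. The weights could come from a nonnegative group-importance score, but that pairing is an assumption made here.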

Total Budget (INR):

6,60,000

Organizations involved