Executive Summary : | In silico protein design methods aim to predict an amino acid sequence compatible with a target structure or function. This challenging task has significant applications in fields such as therapeutics, catalysis, sensors, and molecular machines. Machine learning techniques have revolutionized this field, alongside significant progress made using Monte Carlo simulation, molecular dynamics simulation, and mean field theory. However, protein design still faces challenges: most success stories are specific to a particular target structure. The high failure rate of protein design methods may stem from their reliance on positive design alone, which stabilizes the target structure while searching sequence space. Negative design, which destabilizes competing non-native structures, is crucial for success, because a sequence that is stable in the target structure may be even more stable in another structure. In this project, a generalized method based on machine learning and Monte Carlo simulation will be developed to design sequences using an explicit negative design approach. A machine learning model, validated by k-fold cross-validation, will be trained on backbone structural features extracted from a high-quality structural dataset and used to design sequences. Negative design will then be incorporated through Monte Carlo simulation starting from the sequence designed by this model. The final designed sequence will be validated using several methods, including molecular dynamics simulations. Successful implementation of this project will yield a generalizable method for designing protein sequences with higher success rates and provide a deeper understanding of the physics of protein folding and misfolding. |