Executive Summary : | Microorganisms play a crucial role in global biogeochemical cycles and human health, and their function is governed by environmental factors, hosts, and interactions among microbes. Understanding the structure of microbial communities can reveal patterns of biodiversity, stability, and functionality. However, the heterogeneity in community organization, and the mechanisms by which communities assemble and are structured in different health states, remain largely unknown. Co-occurrence-based network analysis, which relies on statistical methods and probabilistic graphical models, can serve as a proxy for microbiome interactions. The major challenges in decoding co-occurrence relationships are the compositional nature and sparsity of microbiome data and the difficulty of distinguishing direct from indirect associations. Although several algorithms are available, challenges remain in preprocessing, evaluation, handling of confounding factors, and interpretation of the resulting networks. This proposal aims to create a pipeline for robust and accurate network construction that takes into account important network properties such as modularity. The pipeline will also quantify phylogenetic distance information, predicted protein profiles, community metabolite potential, and the proportions of assembly processes. Statistical and machine learning approaches will be implemented for comparative evaluation and analysis, identifying global and local features of microbial community distribution in terms of composition, interaction, functionality, and phylogeny. This will help explain how assembly processes and stochastic events shape the structural and functional diversity of microbial communities in health and disease states. The pipeline will also be applied to reveal community features of the human oral microbiota in different oral health states, using automated data mining to create a comprehensive oral metagenomic dataset. |
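
To make the compositionality issue concrete, the following is a minimal sketch of one common baseline, not the pipeline proposed here: counts are centered log-ratio (CLR) transformed with a pseudocount to handle zeros, and taxon pairs whose Pearson correlation exceeds a cutoff become network edges. The function names, the pseudocount of 0.5, and the threshold of 0.6 are illustrative assumptions.

```python
import numpy as np


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount (an assumed value) keeps zero counts finite under log.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log abundance, so every row sums to zero.
    return log_x - log_x.mean(axis=1, keepdims=True)


def cooccurrence_edges(counts, threshold=0.6, pseudocount=0.5):
    """Return (i, j, r) for taxon pairs with |Pearson r| >= threshold.

    Correlations are computed between taxa (columns) on CLR-transformed data.
    Taxa with constant CLR values yield NaN correlations and are skipped.
    """
    z = clr_transform(counts, pseudocount)
    corr = np.corrcoef(z, rowvar=False)
    n_taxa = corr.shape[0]
    edges = []
    for i in range(n_taxa):
        for j in range(i + 1, n_taxa):
            r = corr[i, j]
            if abs(r) >= threshold:  # False for NaN, so NaN pairs are dropped
                edges.append((i, j, float(r)))
    return edges


if __name__ == "__main__":
    # Toy table: 4 samples (rows) x 3 taxa (columns), made-up counts.
    toy_counts = [[10, 20, 5], [3, 8, 40], [25, 2, 9], [7, 7, 7]]
    for i, j, r in cooccurrence_edges(toy_counts, threshold=0.6):
        print(f"taxon {i} -- taxon {j}: r = {r:.2f}")
```

A thresholded correlation network of this kind addresses compositionality only partially and does not separate direct from indirect associations, which is one reason more robust construction methods are needed.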