BugSigDB: Accelerating Human Microbiome Research By Systematic Comparison To Published Microbial Signatures
Author(s): Ludwig Geistlinger, Rimsha Azhar, Fatima Zohra, Shaimaa Mohammed Elsafoury, Chloe Anya Mirzayi, J Wokaty, Samuel David Gamboa-Tuz, Heidi E Jones, Sean Davis, Nicola Segata, Curtis Huttenhower, Levi Waldron
Affiliation(s): Harvard Medical School
Background: Variations in the human microbiome are implicated in a wide range of health outcomes, but large gaps remain in their interpretation, reproducibility, and use to develop effective public health interventions. Differential microbial abundance analysis can result in long lists of microbial clades at multiple taxonomic levels. The properties shared by these clades are often not obvious, but could include common environmental exposures, ecological requirements, or physiological characteristics. Although an equivalent of gene set enrichment analysis (GSEA) is a natural way to help interpret such results, all GSEA methods rely on comprehensive databases of signatures, equivalents of which do not yet exist for microbiota.
Results: We present BugSigDB, a manually curated database of microbial signatures from published differential abundance studies, providing standardized data on geography, health outcomes, host body sites, and experimental, epidemiological, and statistical methods using controlled vocabulary. To date, BugSigDB provides more than 2,000 signatures from over 500 published studies, allowing systematic assessment of microbiome abundance changes within and across experimental conditions and body sites. Analysis of curated metadata for studies, experiments, and signatures in BugSigDB revealed common practices, but also extensive heterogeneity and unique challenges in reporting results of human microbiome research. Exploration of microbe co-occurrence and signature similarity demonstrated recurrent compositional patterns within signatures of differential abundance, driven by the taxa most frequently associated with disease and by their ecological co-occurrence and mutual exclusivity. Bug set enrichment analysis of BugSigDB signatures on 10 metagenomic datasets investigating fecal microbiomes from colorectal cancer patients (N = 663) revealed significant links to a range of diseases, driven by a common etiologic factor or confounding by a common cause or treatment, and demonstrated the applicability of established gene set enrichment methods and new taxonomy-aware enrichment methods for the interpretation of differential microbial abundance.
Conclusion: BugSigDB allows researchers to better interpret the results of microbiome studies by comparing observed microbiome changes to previous results that are annotated for evidence quality according to study design and sample size. BugSigDB is a publicly editable semantic MediaWiki customized for reporting the methods and results of published human microbiome studies. Its dynamic contents are exported weekly by a GitHub Action to text files, and to Zenodo in sync with the Bioconductor release cycle, proposing a novel, efficient, and FAIR approach to versioning of rapidly changing data for the Bioconductor ecosystem.
Availability: Web-based programmatic access to BugSigDB is available at https://bugsigdb.org. The companion bugsigdbr R/Bioconductor package allows download in a tidy data format from which more application-centric formats can be extracted (https://bioconductor.org/packages/bugsigdbr).
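Illustration: The enrichment idea described in the Results can be sketched with a simple over-representation test, one of the established gene set enrichment approaches. The Python sketch below is not the analysis pipeline used in this work and does not call the bugsigdbr package. The function name `overrepresentation_pvalue` and all taxon labels are hypothetical placeholders, and the toy sets are invented for illustration rather than drawn from BugSigDB. It asks whether a published signature overlaps a list of differentially abundant taxa more than expected by chance, using an upper-tail hypergeometric probability.

```python
from math import comb


def overrepresentation_pvalue(signature, hits, universe):
    """Return (overlap size, upper-tail hypergeometric p-value).

    signature: taxa in a published signature (hypothetical input)
    hits:      taxa found differentially abundant in a new study
    universe:  all taxa that were tested in the new study
    """
    universe = set(universe)
    # Restrict both sets to taxa that were actually tested.
    signature = set(signature) & universe
    hits = set(hits) & universe

    n_universe = len(universe)
    n_signature = len(signature)
    n_hits = len(hits)
    overlap = len(signature & hits)

    # P(X >= overlap) when n_hits taxa are drawn at random from the universe.
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_signature, i) * comb(n_universe - n_signature, n_hits - i)
        for i in range(overlap, min(n_signature, n_hits) + 1)
    )
    return overlap, tail / total


# Toy data: placeholder labels only, not real BugSigDB content.
universe = [f"taxon_{i:02d}" for i in range(1, 11)]
signature = ["taxon_01", "taxon_02", "taxon_03", "taxon_04"]
hits = ["taxon_01", "taxon_02", "taxon_03", "taxon_09"]

overlap, p_value = overrepresentation_pvalue(signature, hits, universe)
print(f"overlap={overlap}, p={p_value:.4f}")
```

This test treats taxa as independent and ignores taxonomic relationships between clades, which is the limitation that the taxonomy-aware enrichment methods mentioned in the Results are meant to address.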