Testing For Associations Between Risk Factors And Mutational Signatures Via Bayesian Dirichlet-Multinomial Hierarchical Model



Author(s): Ji-Eun Park, Markia Smith, Sarah Van Alsten, Di Wu, Katherine Hoadley, Melissa Troester, Michael I Love

Affiliation(s): University of North Carolina at Chapel Hill



Somatic mutations occur throughout human life as a result of various mutagenic processes, and these processes leave distinct patterns in the genome called mutational signatures. Current research on mutational signatures often focuses on detecting de novo signatures. There is as yet no method that analyzes the relationship between multiple patient-level covariates and signatures. Examples include associations of signatures with patient risk factors or exposures (e.g., smoking), disease subtypes, and/or germline mutation status. We present a Bayesian hierarchical model based on the Dirichlet-Multinomial distribution that tests for associations between risk factors and mutational signatures across tumor samples, using mutation counts. The proposed model accommodates risk factors of any form, including continuous and categorical variables. In addition, the model accounts for per-sample uncertainty in the presence of signatures. It therefore delivers correct inference even with low mutation counts, or with a heterogeneous mixture of samples with varying total counts. We evaluate our method on data simulated from the generative model, as well as on breast cancer data from TCGA with reference to COSMIC signatures. The method is implemented as an R package, which we plan to submit to Bioconductor.
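To make the distributional building block concrete, here is a minimal Python sketch of a Dirichlet-Multinomial (DM) likelihood whose concentration depends on a sample-level covariate. It is an illustration under stated assumptions, not the authors' R package or its model. The softmax link from covariate to mean proportions, the overdispersion parameter `kappa`, and all function names are hypothetical. The sketch also skips the step of attributing the 96-channel mutation counts to COSMIC signatures and works directly on a toy vector of per-signature counts. Only the DM probability mass function itself is standard: a Dirichlet draw of proportions followed by multinomial counts, integrated over the proportions.

```python
import math
import random
from collections import Counter


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dm_alpha(x, beta0, beta1, kappa):
    """Hypothetical link: Dirichlet concentration for one sample with covariate x.

    Mean signature proportions are softmax(beta0 + beta1 * x); kappa > 0
    scales them, so larger kappa means less overdispersion (closer to multinomial).
    """
    mean = softmax([b0 + b1 * x for b0, b1 in zip(beta0, beta1)])
    return [kappa * m for m in mean]


def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under Dirichlet-Multinomial(N, alpha)."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x_k, a_k in zip(counts, alpha):
        out += math.lgamma(x_k + a_k) - math.lgamma(x_k + 1) - math.lgamma(a_k)
    return out


def dm_sample(n, alpha, rng):
    """Draw proportions ~ Dirichlet(alpha) via gamma variates, then counts ~ Multinomial(n, p)."""
    g = [rng.gammavariate(a_k, 1.0) for a_k in alpha]
    total = sum(g)
    probs = [v / total for v in g]
    draws = Counter(rng.choices(range(len(alpha)), weights=probs, k=n))
    return [draws[k] for k in range(len(alpha))]


if __name__ == "__main__":
    rng = random.Random(1)
    beta0 = [0.0, 0.0, 0.0]
    beta1 = [1.0, 0.0, -1.0]  # hypothetical effect of a binary risk factor on 3 toy signatures
    for x in (0, 1):
        alpha = dm_alpha(x, beta0, beta1, kappa=20.0)
        counts = dm_sample(50, alpha, rng)
        print(x, counts, round(dm_logpmf(counts, alpha), 2))
```

The DM likelihood is used here, rather than a plain multinomial, because the Dirichlet layer absorbs between-sample variability. A sample with few mutations then contributes a flatter likelihood and correspondingly less information about the covariate effects (`beta1` in this sketch). This is one way a model can remain well-calibrated when total counts vary across samples.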
