General Non-Parametric Tests Of Differential Gene Expression For Single Cell Genomics



Author(s): Alan Aw, Xurui Chen, Dan Erdmann-Pham, Jonathan Fischer, Yun Song

Affiliation(s): University of California, Berkeley



Differential expression analysis is a key component of single cell genomics, enabling the discovery of groups of genes implicated in important biological processes such as cell differentiation and aging. Recent work by Schaum et al. (Nature, 2020) investigated changes in gene regulation owing to senescence and found that variability indices are more effective than conventional tests for mean shifts at detecting differentially expressed genes in single cell populations. On the other hand, Li et al. (Genome Biology, 2022) found that many existing methods for identifying differentially expressed genes in population-level RNA-seq data have inflated false positive rates, and suggested the non-parametric Mann-Whitney test as a reasonable alternative. Here, we present open-source software (in both R and Python) that performs flexible and highly efficient non-parametric one-sample and two-sample tests that generalize the Mann-Whitney test. These tests leverage flexibility in the choice of test statistic to achieve high statistical power across multiple scenarios, even with small sample sizes. These scenarios include shifts in scale, which frequently characterize age-related changes in gene regulation, as well as more general distributional changes. We apply our tests to single cell RNA-seq data from Tabula Muris Senis to detect differentially expressed genes (DEGs) across the mouse lifespan in 22 tissues. We detect many tissue-specific transcripts that are not significant under the Mann-Whitney test (at 0.05 FDR control) but are significant for changes in dispersion. We also identify a few “persistently DEGs,” which are tissue-specific genes for which significant changes in expression are detected between every pair of age groups. 
Our work contributes to the growing suite of bioinformatics tools developed for single cell genomics, and may also be useful in broader contexts that require the use of flexible non-parametric tests. Theoretical work underpinning our software: https://arxiv.org/abs/2008.06664. Code and analysis vignettes: https://github.com/songlab-cal/MOCHIS/tree/development
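The abstract's central point, that a Mann-Whitney test can miss genes whose expression changes in dispersion rather than location, can be illustrated with a minimal sketch. This is not the MOCHIS method or its API: it swaps in two classical rank statistics as stand-ins, the Wilcoxon rank sum (equivalent to the Mann-Whitney U) and the Ansari-Bradley scale statistic, and computes exact permutation p-values. The data are a tiny hypothetical example in which both groups are centered at zero but one is ten times more spread out; the function names (`exact_permutation_pvalue`, `wilcoxon_score`, `ansari_bradley_score`) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def ranks(values):
    """Return 1-based ranks of values (this sketch assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def wilcoxon_score(rank, n_total):
    # Identity score: summing it over one group gives the Wilcoxon rank sum,
    # a location statistic equivalent to the Mann-Whitney U.
    return rank


def ansari_bradley_score(rank, n_total):
    # Distance-to-edge score: extreme ranks (either tail) get small values,
    # so a more dispersed group has a small score sum.
    return min(rank, n_total + 1 - rank)


def exact_permutation_pvalue(x, y, score, alternative="two-sided"):
    """Exact permutation p-value for the sum of scores in group y.

    Enumerates every way of assigning len(y) pooled observations to
    group y, which is only feasible for small samples.
    """
    pooled = list(x) + list(y)
    n_total, m = len(pooled), len(y)
    s = [score(r, n_total) for r in ranks(pooled)]
    observed = sum(s[len(x):])
    null_stats = [sum(s[i] for i in idx)
                  for idx in combinations(range(n_total), m)]
    if alternative == "less":
        extreme = sum(t <= observed for t in null_stats)
    elif alternative == "two-sided":
        # Distance from the exact null mean of the statistic.
        center = Fraction(sum(null_stats), len(null_stats))
        extreme = sum(abs(t - center) >= abs(observed - center)
                      for t in null_stats)
    else:
        raise ValueError("alternative must be 'two-sided' or 'less'")
    return Fraction(extreme, len(null_stats))


# Hypothetical data: same center (0), very different spread.
x = [-2.0, -1.0, 1.0, 2.0]
y = [-20.0, -10.0, 10.0, 20.0]

p_location = exact_permutation_pvalue(x, y, wilcoxon_score, "two-sided")
p_scale = exact_permutation_pvalue(x, y, ansari_bradley_score, "less")
print(p_location, p_scale)  # 1 1/70
```

The rank-sum (Mann-Whitney) test returns p = 1 because the dispersed group's ranks sit symmetrically around the null mean, so there is no location signal to find. Changing only the score function turns the same permutation machinery into a scale test, which here attains the smallest possible one-sided p-value, 1/70, among the 70 possible splits. Exact enumeration grows combinatorially with sample size, so this sketch is meant only to show why the choice of statistic matters, not how an efficient implementation would compute p-values.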
