Dreamlet: Scalable Differential Expression Analysis Of Single Cell Transcriptomics Datasets With Complex Study Designs
Author(s): Gabriel E Hoffman, Donghoon Lee, Panos Roussos
Affiliation(s): Icahn School of Medicine at Mount Sinai
Recent advances in single cell/nucleus transcriptomic technology have enabled collection of population-level data sets to study cell type specific gene expression differences associated with disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-friendly, purpose-built analytical framework. We have developed the dreamlet package, which applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample. Dreamlet further enables analysis of massive-scale single cell/nucleus transcriptome datasets by addressing both CPU and memory usage limitations. Dreamlet performs preprocessing and statistical analysis in parallel on multicore machines, and can distribute work across multiple nodes on a compute cluster. Dreamlet also uses the H5AD format for on-disk data storage so that data can be processed in smaller chunks, dramatically reducing memory usage. The dreamlet workflow integrates easily into the Bioconductor ecosystem and uses the SingleCellExperiment class to facilitate compatibility with other analyses. Beyond differential expression testing, dreamlet provides seamless integration of downstream analyses, including quantifying sources of expression variation, gene set analysis using the full spectrum of gene-level t-statistics, testing differences in cell type composition, and visualizing results. We demonstrate the performance of dreamlet on simulated data and on single nucleus RNA-seq data from hundreds of post mortem brains from donors with and without Alzheimer's disease.
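To make the pseudobulk step concrete, the following is a minimal sketch in plain Python of the general idea: per-cell counts are summed into one expression profile per cell cluster and biosample, and the number of contributing cells is recorded. It is an illustration only and is not dreamlet's implementation or API (dreamlet is an R/Bioconductor package); the function name `pseudobulk`, the toy counts, and the donor and cluster labels are all hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids, cluster_ids):
    """Sum per-cell counts into one profile per (cluster, sample) pair.

    counts      : list of per-cell count vectors (one entry per gene)
    sample_ids  : biosample label for each cell
    cluster_ids : cell cluster label for each cell
    Returns (summed counts, number of cells) keyed by (cluster, sample).
    """
    sums = {}
    n_cells = defaultdict(int)
    for cell_counts, sample, cluster in zip(counts, sample_ids, cluster_ids):
        key = (cluster, sample)
        if key not in sums:
            sums[key] = [0] * len(cell_counts)
        # Element-wise addition of this cell's counts to the running total
        sums[key] = [a + b for a, b in zip(sums[key], cell_counts)]
        n_cells[key] += 1
    return sums, dict(n_cells)


# Hypothetical toy data: 4 cells x 3 genes
counts = [[1, 0, 3], [2, 1, 0], [0, 0, 5], [4, 2, 1]]
sample_ids = ["donor1", "donor1", "donor2", "donor1"]
cluster_ids = ["neuron", "neuron", "neuron", "astro"]

pb, n_cells = pseudobulk(counts, sample_ids, cluster_ids)
for key in sorted(pb):
    print(key, pb[key], "cells:", n_cells[key])
```

The per-pair cell counts are kept because the number of observed cells per biosample varies, which is one of the quantities the abstract says the precision weights account for. How those weights are actually computed is not specified here and is not shown in this sketch.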