Author(s): Josh Michael Stolz, Ran Tao, Andrew E. Jaffe, Leonardo Collado-Torres
Affiliation(s): Lieber Institute for Brain Development
Enormous strides have been made in the last decade of neuropsychiatric, neurodevelopmental, and neurodegenerative research toward understanding the molecular underpinnings of many serious brain disorders. Some of the strongest clues to the etiology of these disorders, particularly neurodevelopmental and neuropsychiatric disorders, have come from recent large-scale genetic studies, which have identified hundreds to thousands of common and rare genetic variants. Unlike genetic association studies, where risk genotypes are established at birth and unaffected by epiphenomena associated with illness, postmortem tissue gene expression levels reflect the cumulative effects of living with a psychiatric disorder such as schizophrenia. In previous analyses of bulk RNA-seq gene expression data from hundreds of postmortem human brain samples, we identified RNA degradation as a damaging and often overlooked confounder in postmortem gene expression data, particularly when comparing patients to controls. To address this, we developed quality surrogate variable analysis (qSVA), a factor-based approach that removes RNA quality confounding (Jaffe et al, PNAS, 2017) and that initially utilized RNA degradation profiles from DLPFC in both polyA+ and RiboZero RNA sequencing libraries (Collado-Torres et al, Neuron, 2019). Our current work expands the scope of qSVA by generating degradation profiles (5 donors at each of 4 degradation time points: 0, 15, 30, and 60 minutes) from six human brain regions (n = 20 samples per region * 6 regions = 120): dorsolateral prefrontal cortex (DLPFC), hippocampus (HPC), medial prefrontal cortex (mPFC), subgenual anterior cingulate cortex (sACC), caudate, and amygdala (AMY). We identified an average of 80,258 transcripts associated (FDR < 5%) with degradation time across the six brain regions. Testing for an interaction between brain region and degradation time identified 45,712 transcripts (FDR < 5%).
A comparison of regions showed a unique pattern of expression changes associated with degradation time, particularly in the DLPFC, implying that this region may not be representative of the effects of degradation on gene expression in other tissues. Furthermore, previous work analyzed expressed regions (Collado-Torres et al, NAR, 2017). While this is an effective method of analysis, expressed regions are not a common output of many pipelines and are computationally expensive to identify, which creates a barrier to the use of any qSVA software. In our most recent work, expression quantification was performed at the transcript level using Salmon (Patro et al, Nat Methods, 2017). The qsvaR package we provide makes generating qSVs accessible to the broader Bioconductor community. qsvaR is modular and works with SummarizedExperiment objects, allowing easy integration with packages such as limma. Once the qSVs are generated, the limma package can be used to remove the effect of degradation by adding the qSVs to the statistical model design. We are optimistic that applying this approach to postmortem brain datasets can increase reproducibility within the field. qsvaR is available at https://github.com/LieberInstitute/qsvaR.
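The idea of adding qSVs to the model design can be illustrated with a minimal, language-agnostic sketch. The Python/NumPy code below is not the qsvaR or limma API: the function names `compute_qsvs` and `fit_linear_model`, the simulated data, and the choice of a single qSV are all assumptions made for illustration. It treats qSVs as the top principal components of degradation-associated transcripts and shows, on simulated data, how including them as covariates can shrink a spurious diagnosis effect driven by degradation.

```python
import numpy as np


def compute_qsvs(deg_expr, k):
    """Return the top-k principal component scores (samples x k).

    deg_expr: transcripts x samples matrix of log-scale expression for
    degradation-associated transcripts (hypothetical input).
    """
    # Center each transcript across samples, then take the SVD of the
    # samples x transcripts matrix; u * s gives the PC scores per sample.
    centered = deg_expr - deg_expr.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :k] * s[:k]


def fit_linear_model(y, design):
    """Ordinary least squares coefficients for y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# --- Simulated example (all values are assumptions, not real data) ---
rng = np.random.default_rng(0)
n_samples, n_deg_tx = 60, 200

# Diagnosis indicator: first half controls (0), second half cases (1).
dx = np.repeat([0.0, 1.0], n_samples // 2)

# Latent degradation is confounded with diagnosis.
degradation = dx + rng.normal(0.0, 0.5, n_samples)

# Degradation-associated transcripts track the latent degradation signal.
loadings = rng.normal(1.0, 0.3, (n_deg_tx, 1))
deg_expr = loadings * degradation + rng.normal(0.0, 0.2, (n_deg_tx, n_samples))

# A gene whose expression depends on degradation only (true dx effect is 0).
y = 2.0 * degradation + rng.normal(0.0, 0.2, n_samples)

# Naive design: intercept + diagnosis.
naive_design = np.column_stack([np.ones(n_samples), dx])
naive_coef = fit_linear_model(y, naive_design)

# Adjusted design: intercept + diagnosis + qSVs.
qsvs = compute_qsvs(deg_expr, k=1)
adjusted_design = np.column_stack([naive_design, qsvs])
adjusted_coef = fit_linear_model(y, adjusted_design)

print("diagnosis effect without qSVs:", naive_coef[1])
print("diagnosis effect with qSVs:   ", adjusted_coef[1])
```

In an actual analysis the degradation-associated transcripts would come from the degradation experiment, and the model would be fit per gene or transcript with a dedicated differential expression package rather than plain least squares.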