A Novel Analysis Framework To Discover Spatially Variable Genes
Author(s): Peiying Cai, Mark Robinson, Simone Tiberi
Affiliation(s): University of Zurich
Background: Spatial omics technologies measure gene expression profiles while retaining information about the spatial organisation of the tissue. Several computational tools (notably SpatialDE, SPARK-X and MERINGUE) can identify spatially variable (SV) genes, i.e., genes whose expression varies across the tissue. Furthermore, specific areas of the tissue can often be labelled; for example, in brain studies, the neocortex can be divided into layers based on cytoarchitecture, while, in cancer, pathologists’ annotations can separate melanoma, stroma and lymphoid tissues. Alternatively, cells can be grouped via spatially resolved clustering algorithms such as BayesSpace or stLearn: these approaches define “spatial clusters” of cells, i.e., spatially neighbouring cells with similar gene expression profiles.

Methodology: We propose an intuitive framework for identifying SV genes based on edgeR, one of the most popular methods for differential expression analysis, originally designed for bulk RNA-sequencing data but also widely used on single-cell RNA-sequencing data. Our approach takes advantage of pre-computed spatial clusters and provides them to edgeR as covariates; SV genes are then identified by testing the significance of the spatial clusters, which act as a proxy for the spatial structure: a gene is called SV when its expression differs significantly across spatial clusters. Our framework therefore relies on spatial clusters being available and summarising the main spatial features of the data. Nonetheless, we argue that, in the vast majority of spatial omics datasets, such spatial structures are available or can easily be computed with spatial clustering algorithms.

Benchmarking: We performed extensive benchmarks comparing our approach with various competitors (SpatialDE, SpatialDE2, SPARK, SPARK-X, MERINGUE and trendsceek). In particular, using three real spatial omics datasets [1-3] as anchor data, we generated various semi-simulated datasets covering a wide variety of SV patterns.
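The cluster-based test described under Methodology can be illustrated, for a single gene, with a minimal sketch. It is not edgeR and not the authors' implementation. It assumes a negative binomial model with a known, common dispersion `phi` and log library sizes as offsets. It fits one log-mean per spatial cluster and compares that model with an intercept-only model via a likelihood ratio test on K-1 degrees of freedom (K = number of clusters). The function names (`nb_loglik_kernel`, `fit_log_mean`, `cluster_lrt`, `chi2_sf`) are hypothetical. edgeR itself additionally estimates gene-wise dispersions with empirical Bayes moderation and normalisation factors, and provides its own GLM tests (e.g. `glmLRT`, `glmQLFTest`).

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Chi-square survival function for integer df >= 1 (closed forms)."""
    if x <= 0:
        return 1.0
    half_x = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half_x / i
            total += term
        return math.exp(-half_x) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    total = math.erfc(math.sqrt(half_x))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half_x)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def nb_loglik_kernel(y, mu, phi):
    """NB2 log-likelihood of count y (mean mu, dispersion phi > 0),
    dropping terms that do not depend on mu."""
    return y * math.log(mu) - (y + 1.0 / phi) * math.log1p(phi * mu)


def fit_log_mean(ys, offsets, phi, max_iter=100, tol=1e-10):
    """MLE of beta in mu_i = exp(beta + offset_i) via Fisher scoring.
    Returns None if all counts are zero (the MLE diverges to -inf)."""
    total = sum(ys)
    if total == 0:
        return None
    # Start from the Poisson MLE: total counts / total library size
    beta = math.log(total / sum(math.exp(o) for o in offsets))
    for _ in range(max_iter):
        mus = [math.exp(beta + o) for o in offsets]
        score = sum((y - m) / (1.0 + phi * m) for y, m in zip(ys, mus))
        info = sum(m / (1.0 + phi * m) for m in mus)  # expected information
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def cluster_lrt(counts, lib_sizes, clusters, phi):
    """LRT of 'one mean per spatial cluster' vs 'one common mean' for a gene.
    Returns (LR statistic, p-value from chi-square with K-1 df)."""
    offsets = [math.log(s) for s in lib_sizes]
    groups = defaultdict(lambda: ([], []))
    for y, o, c in zip(counts, offsets, clusters):
        groups[c][0].append(y)
        groups[c][1].append(o)
    if len(groups) < 2:
        raise ValueError("need at least two spatial clusters")

    # Null model: a single mean shared by all spots
    beta0 = fit_log_mean(counts, offsets, phi)
    if beta0 is None:  # gene never detected: nothing to test
        return 0.0, 1.0
    ll_null = sum(nb_loglik_kernel(y, math.exp(beta0 + o), phi)
                  for y, o in zip(counts, offsets))

    # Full model: one mean per spatial cluster
    ll_full = 0.0
    for ys, offs in groups.values():
        beta_c = fit_log_mean(ys, offs, phi)
        if beta_c is None:
            continue  # all-zero cluster: its kernel tends to 0 as mu -> 0
        ll_full += sum(nb_loglik_kernel(y, math.exp(beta_c + o), phi)
                       for y, o in zip(ys, offs))

    stat = max(0.0, 2.0 * (ll_full - ll_null))
    return stat, chi2_sf(stat, len(groups) - 1)
```

For instance, a gene whose counts are high in one cluster and low in another gets a large statistic and a small p-value, whereas a gene with similar counts in every cluster gets a p-value near 1. This sketch handles a single sample only; jointly modelling several samples would require extra covariates (such as a sample effect), which it does not attempt.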
Our approach displays well-calibrated false discovery rates and higher true positive rates than all competitors considered, and it yields approximately uniform p-values in null semi-simulated datasets (i.e., with uniform spatial expression patterns). Furthermore, when analysing real biological data [1-3], we found that, compared to other SV methods, the genes identified with our approach i) are more coherent across biological and technical replicates, and ii) visually display clearer spatial structures. Moreover, unlike other SV tools, which can only model individual samples, our approach can jointly handle multiple samples, thereby sharing information across biological replicates and targeting SV genes at the “group level”. This multi-sample approach could be particularly beneficial when there is a large degree of sample-to-sample variability (e.g., in cancer). In addition, our method is the only SV approach that can test and identify the individual spatial clusters (e.g., white matter or cancer tissue) affected by spatial variability.

Overall, our approach offers several advantages: it is flexible and can take any type of spatial omics data as input; it displays excellent performance in all our benchmarks; it is more computationally efficient than alternative SV methods (except SPARK-X); when biological replicates are available, it can jointly model multiple samples; and it can test the individual spatial clusters affected by spatial variability.

Availability: we plan to submit our method as a Bioconductor R package in the coming months (around April-June).

References:
[1] Maynard et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nature Neuroscience (2021).
[2] Thrane et al. Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Research (2018).
[3] Stickels et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nature Biotechnology (2021).