Short talk

What's In A GEM? That Which We Call A Single Cell, By Another Method Would Not Smell As Sweet.

Author(s): Wes Wilson
Affiliation(s): University of Pennsylvania
Twitter: @WesleyWilson

As single-cell transcriptome technology advances and aims to capture more and more cells, the question becomes which GEMs contain quality cells and which algorithms and approaches are best for particular cells of interest. In this short talk I will go over Bioconductor and third-party approaches to answering this question so that in your next analysis you won't be leaving data on the table.


wcGeneSummary: Text Mining And Annotating Gene Cluster

Author(s): Noriaki Sato
Affiliation(s): Kyoto University

The functional annotation of gene lists identified by gene clustering or differential expression analysis is one of the central focuses of bioinformatics analysis. Enrichment analysis can address this problem using curated biological pathway databases; however, text mining approaches have the potential to annotate a gene list at finer resolution or to reveal previously unknown mechanisms not listed in those databases.


Using Bioconductor On GPU-Enabled Cloud VMs

Author(s): Nitesh Turaga, Vincent James Carey
Affiliation(s): Dana-Farber Cancer Institute

GPU-enabled virtual machines provide the ability to speed up computation for deep learning libraries such as Keras and TensorFlow. These libraries are usually written in Python, but with the help of R-Python interoperability packages like “reticulate” (CRAN) and “basilisk” (Bioconductor), developers have been able to leverage these deep learning capabilities in Bioconductor.


Testing For Associations Between Risk Factors And Mutational Signatures Via Bayesian Dirichlet-Multinomial Hierarchical Model

Author(s): Ji-Eun Park, Markia Smith, Sarah Van Alsten, Di Wu, Katherine Hoadley, Melissa Troester, Michael I Love
Affiliation(s): University of North Carolina at Chapel Hill

Somatic mutations occur throughout human life due to various mutagenic processes, and these processes leave distinct patterns in the genome called mutational signatures. Current research on mutational signatures often focuses on detecting de novo mutational signatures.


SparseArray Objects: A New Container For Efficient In-Memory Representation Of Multidimensional Sparse Arrays

Author(s): Hervé Pagès
Affiliation(s): Fred Hutchinson Cancer Research Center

SparseArray objects use an innovative internal representation, called Sparse Vector Tree or SVT layout, to store the sparse data in memory. This layout allows compact representation as well as efficient access to the data. SparseArray objects support the traditional array API from base R; that is, the end user can operate on them via standard array operations like [ (subsetting), [<- (subassignment), dim(), dimnames(), t(), etc.


Robust Differential Composition And Variability Analysis For Multisample Cell Omics

Author(s): Stefano Mangiola, Alex Schulze, Marie Trussart, Enrique Zozaya, Mengyao Ma, Zijie Guo, Alan Rubin, Terry Speed, Heejung Shim, Anthony Papenfuss
Affiliation(s): WEHI
Twitter: @steman_research

Cell omics such as single-cell genomics, proteomics and microbiomics allow the characterisation of tissue and microbial community composition, which can be compared between conditions to identify biological drivers. This strategy has been critical to unveiling markers of progression in diseases such as cancer and pathogen infection.


ReUseData: An Open-Source, Open-Development Tool For Reusable And Reproducible Genomic Data Management

Author(s): Qian Liu
Affiliation(s): Roswell Park Comprehensive Cancer Center
Twitter: @QianLiu28878838

The fast-growing volume and complexity of genomic data resources bring exceptional opportunities to the research community, yet pose significant challenges for properly managing data access, curation, annotation and storage. Managing data individually can lead to substantial inefficiencies through repeated work and wasted computing resources, especially for highly reused data curation steps such as indexing a reference genome.


qsvaR

Author(s): Josh Michael Stolz, Ran Tao, Andrew E. Jaffe, Leonardo Collado Torres
Affiliation(s): Lieber Institute for Brain Development

Enormous strides have been made in the last decade of neuropsychiatric, neurodevelopmental, and neurodegenerative research toward better understanding the molecular underpinnings of many serious brain disorders. Some of the strongest clues to the etiology of many of these disorders, particularly neurodevelopmental and neuropsychiatric disorders, have come from recent large-scale genetic studies, which have identified hundreds to thousands of both common and rare genetic variants.


Predictive Modelling Of Dataset-Specific Single-Cell RNA-Seq Pipeline Performance

Author(s): Cindy Fang, Alina Selega, Kieran R. Campbell
Affiliation(s): University of Toronto

The advent of single-cell RNA-sequencing (scRNA-seq) has driven the development of a plethora of computational methods for all analysis stages, including filtering, normalisation, and clustering. With many choices available to practitioners for each step in the analysis pipeline, selecting the optimal workflow can be a difficult task. Considering an unrealistically simplistic example with only 3 analysis steps (e.


nnSVG: Scalable Identification Of Spatially Variable Genes Using Nearest-Neighbor Gaussian Processes

Author(s): Lukas M Weber, Stephanie Hicks
Affiliation(s): Johns Hopkins Bloomberg School of Public Health
Twitter: @lmwebr

Feature selection to identify spatially variable genes (SVGs) is a key step during analyses of spatially resolved transcriptomics data. We introduce 'nnSVG', a scalable new method to identify SVGs based on nearest-neighbor Gaussian processes. Our method can identify SVGs with flexible spatial ranges in expression patterns per gene, can identify SVGs within spatial domains, and scales linearly with the number of spatial locations.


netZooR: A Software Infrastructure For The Inference And Analysis Of Gene Regulatory Networks

Author(s): Marouen Ben Guebila, Tian Wang, John Quackenbush
Affiliation(s): Harvard T.H. Chan School of Public Health
Twitter: @marouenbg

The reconstruction of gene regulatory networks requires the development of software tools to integrate data from various genomic modalities. Our research group has developed several network methods to infer biological networks and compare them by conducting differential analyses in a case versus control setting.


Large-Scale Analysis Of The Molecular Anatomy Of The Dorsolateral Prefrontal Cortex (DLPFC) Through The Use Of Unsupervised Methods With Spatial RNA-seq

Author(s): Abby Spangler, Nicholas J. Eagles, Kelsey Montgomery, Madhavi Tippani, Heena R. Divecha, Stephanie Hicks, Keri Martinowich, Kristen R. Maynard, Leonardo Collado Torres
Affiliation(s): Lieber Institute for Brain Development

The molecular anatomy of the cortex is well known in the context of histologic layers. However, emerging spatially resolved transcriptomic approaches have enabled unbiased classification of novel spatial domains based on molecular signatures.


igvR: Interactive Genome Exploration From R

Author(s): Paul T Shannon
Affiliation(s): Institute for Systems Biology

Genomic and epigenomic assays provide large quantities of many kinds of data, many of which are annotations upon the genome: alignments, variants, methylation, copy number, and transcription factor binding sites, to name a few. An interactive visual interface to these data is an indispensable element of exploratory data analysis. The R/Bioconductor package igvR brings all of the capabilities of the browser-based JavaScript library igv.


General Non-Parametric Tests Of Differential Gene Expression For Single Cell Genomics

Author(s): Alan Aw, Xurui Chen, Dan Erdmann-Pham, Jonathan Fischer, Yun Song
Affiliation(s): University of California, Berkeley

Differential expression analysis is a key component of single-cell genomics, enabling the discovery of groups of genes implicated in important biological processes such as cell differentiation and aging. Recent work by Schaum et al. (Nature, 2020) investigated changes in regulation owing to senescence and found that variability indices, rather than typical approaches to detecting mean shifts, are more effective at picking up differentially expressed genes in single-cell populations.


Evolution Of The DECIPHER Package For Comparative Genomics

Author(s): Erik Scott Wright
Affiliation(s): University of Pittsburgh
Twitter: @digitalwright

The DECIPHER package has been part of Bioconductor for about 11 years and has continued to grow in scope and utility. Although the R package was originally developed for designing oligonucleotide probes and primers, it now includes cutting-edge functionality for multiple sequence alignment, sequence classification, phylogenetics, and many other advanced applications of bioinformatics.


Dreamlet: Scalable Differential Expression Analysis Of Single Cell Transcriptomics Datasets With Complex Study Designs

Author(s): Gabriel E Hoffman, Donghoon Lee, Panos Roussos
Affiliation(s): Icahn School of Medicine at Mount Sinai

Recent advances in single cell/nucleus transcriptomic technology have enabled the collection of population-level data sets to study cell-type-specific gene expression differences associated with disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell-type-specific molecular mechanisms requires a user-friendly, purpose-built analytical framework.


Distance Metric Learning On The L1000 Connectivity Map

Author(s): Ian Smith, Benjamin Haibe-Kains
Affiliation(s): University Health Network

The Next Generation Connectivity Map (L1000) is a massive, high-throughput dataset measuring transcriptional changes in cancer cell lines from chemical and genetic perturbation. Applications of L1000 require computing similarities among signatures to identify similar and dissimilar perturbations. We introduce a method from the field of metric learning to learn a class of similarity functions from the data that maximizes discrimination of replicate signatures.


Detecting SARS-CoV-2 Lineages And Mutational Load In Municipal Wastewater And A Use-Case In The Metropolitan Area Of Thessaloniki, Greece

Author(s): Nikolas Pechlivanis, Maria Tsagiopoulou, Maria Christina Maniou, Anastasis Togkousidis, Stamatia Laidou, Elisavet Vlachonikola, Evangelia Mouchtaropoulou, Taxiarchis Chassalevris, Serafeim Chaintoutis, Chrysostomos Dovas, Maria Petala, Margaritis Kostoglou, Thodoris Karapantsios, Aspasia Orfanou, Styliani Christina Fragkouli, Sofoklis Keisaris, Anastasia Chatzidimitriou, Agis Papadopoulos, Nikolaos Papaioannou, Anagnostis Argiriou, Fotis Psomopoulos
Affiliation(s): Centre of Research and Technology Hellas, Greece
Twitter: @npechl

Nearly two years after the first report of SARS-CoV-2 in Wuhan, China, the virus has caused an unprecedented global crisis.


Detecting And Quantifying Antibody Reactivity In PhIP-Seq Data With BEER

Author(s): Athena Chen, Kai Kammers, H Benjamin Larman, Robert B Scharpf, Ingo Ruczinski
Affiliation(s): Johns Hopkins University
Twitter: @athena_chen

Because of their high abundance, easy accessibility in peripheral blood, and relative stability ex vivo, antibodies serve as excellent records of environmental exposures and immune responses. While several multiplexed methods have been developed to assess antibody binding specificities, the recently developed Phage Immuno-Precipitation Sequencing (PhIP-Seq) is the most efficient technique available for assessing antibody binding to hundreds of thousands of peptides at cohort scale.


decoupleR: Ensemble Of Computational Methods To Infer Biological Activities From Omics Data

Author(s): Pau Badia i Mompel, Jesús Vélez Santiago, Jana Muriel Braunger, Celina Geiss, Daniel Dimitrov, Sophia Müller-Dott, Petr Taus, Aurelien Dugourd, Christian H. Holland, Ricardo Omar Ramirez Flores, Julio Saez-Rodriguez
Affiliation(s): Institute for Computational Biomedicine
Twitter: @PauBadiaM

Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability.


Data-Driven Identification Of Total RNA Expression Genes (TREGs) For Estimation Of RNA Abundance In Heterogeneous Cell Types

Author(s): Louise A. Huuki-Myers, Kelsey D. Montgomery, Sang Ho Kwon, Stephanie C. Page, Stephanie Hicks, Kristen R. Maynard, Leonardo Collado Torres
Affiliation(s): Lieber Institute for Brain Development
Twitter: @lahuuki

Next generation sequencing technologies have facilitated data-driven identification of gene sets with different features, including housekeeping genes, cell-type-specific expression, or spatially variable expression.


Chromophobe: A Framework For Comparative And Contrastive Evaluation Of Chromatin State Models

Author(s): Lauren Marie Harmon, Connie Krawczyk, Tim Triche
Affiliation(s): Van Andel Institute
Twitter: @Lauren_M_Harmon

The ChromHMM model introduced by Ernst and colleagues has become a standard representation for chromatin state in the decade since its original publication. Recent modifications include the introduction of a stacked or per-cell-type model (Vu & Ernst, 2022) along with a repurposing of the original model to capture extremely sparse single-cell data via posterior state probabilities (Zhang & Srivastava et al.


Characterizing Cellular Heterogeneity In Chromatin State With scCUT&Tag-pro And scChromHMM

Author(s): Avi Srivastava
Affiliation(s): New York Genome Center
Twitter: @k3yavi

Technologies that profile chromatin modifications at single-cell resolution offer enormous promise for functional genomic characterization, but the sparsity of the measurements and the integration of multiple binding maps represent substantial challenges. Here we introduce single-cell (sc)CUT&Tag-pro, a multimodal assay for profiling protein–DNA interactions coupled with the abundance of surface proteins in single cells.


BugSigDB: Accelerating Human Microbiome Research By Systematic Comparison To Published Microbial Signatures

Author(s): Ludwig Geistlinger, Rimsha Azhar, Fatima Zohra, Shaimaa Mohammed Elsafoury, Chloe Anya Mirzayi, J Wokaty, Samuel David Gamboa-Tuz, Heidi E Jones, Sean Davis, Nicola Segata, Curtis Huttenhower, Levi Waldron
Affiliation(s): Harvard Medical School

Background: Variations in the human microbiome are implicated in a wide range of health outcomes, but large gaps remain in their interpretation, reproducibility, and use to develop effective public health interventions.


Bisplotti: A One-Stop Shop For All Your Whole-Genome DNA Methylation Sequencing Plotting Needs

Author(s): Jacob Morrison, Benjamin K Johnson, James P Eapen, Ian Beddows, Hui Shen
Affiliation(s): Van Andel Institute

DNA methylation is key to normal cellular differentiation, lineage specification, and the development of diseases including cancer. It serves as one of the most widely studied epigenetic marks, due in part to its stability through most storage conditions and histological preparations.


BioPlex: An Integrated Data Product For The Analysis Of Human Protein-Protein Interactions

Author(s): Ludwig Geistlinger, Roger Vargas, Joshua Pan, Edward Huttlin, Robert Gentleman
Affiliation(s): Center for Computational Biomedicine

Summary: The BioPlex project has created two proteome-scale, cell-line-specific protein-protein interaction (PPI) networks: the first in 293T cells, including 120k interactions among 15k proteins; and the second in HCT116 cells, including 70k interactions among 10k proteins. Here, we describe programmatic access to the BioPlex PPI networks and integration with related resources from within R and Python.


Bioconductor Helm Chart: A Customizable Deployment Stack For Bioconductor RStudio

Author(s): Alexandru Mahmoud, Nuwan Goonasekera, Nitesh Turaga, Enis Afgan
Affiliation(s): Harvard Medical School

The Bioconductor Helm chart abstracts the mechanics of deploying the Bioconductor version of RStudio into a single, highly configurable package. The chart follows recommended standards and is packaged with a sensible default configuration, making Bioconductor RStudio seamlessly deployable on any Kubernetes cluster, from a local machine to commercial and academic clouds.


A Resource Of Microbiome Benchmark Datasets With Biological Ground Truth

Author(s): Samuel David Gamboa-Tuz, Levi Waldron, Marcel Ramos
Affiliation(s): CUNY Graduate School of Public Health and Health Policy
Twitter: @samueldgamboa

Little consensus exists yet on the best methods for differential abundance (DA) analysis of microbiome data, with a range of classical statistical tests, methods adapted from the field of RNA-seq, and methods developed specifically for metagenomic data all in use.


A Novel Method To Identify Differentially Regulated Genes

Author(s): Joel Meili, Mark Robinson, Simone Tiberi
Affiliation(s): University of Zurich
Twitter: @tiberi_simone

Background: Technological developments have led to an explosion of high-throughput single-cell data, which are revealing unprecedented perspectives on cell identity. Recently, significant attention has focused on investigating dynamic cellular processes, such as cell differentiation, cell (de)activation, and gene regulation. In particular, RNA velocity tools (notably velocyto and scVelo), by exploiting the abundance of spliced (mature) mRNA and unspliced (immature) pre-mRNA, have enabled inferring the RNA velocity of individual cells, i.


A Novel Analysis Framework To Discover Spatially Variable Genes

Author(s): Peiying Cai, Mark Robinson, Simone Tiberi
Affiliation(s): University of Zurich
Twitter: @tiberi_simone

Background: Spatial omics technologies allow measuring gene expression profiles while also retaining information on the spatial structure of the tissue. Some computational tools (notably SpatialDE, SPARK-X and MERINGUE) allow identifying spatially variable (SV) genes, i.e., genes whose expression profiles vary across the tissue. Furthermore, it is often possible to label specific areas of the tissue; for example, in brain studies, the neocortex can be divided into layers based on cytoarchitecture [1], while, in cancer, pathologists’ annotations can separate melanoma, stroma and lymphoid tissues [2].
