Chromophobe: A Framework For Comparative And Contrastive Evaluation Of Chromatin State Models
Author(s): Lauren Marie Harmon, Connie Krawczyk, Tim Triche
Affiliation(s): Van Andel Institute
Twitter: @Lauren_M_Harmon
The ChromHMM model introduced by Ernst and colleagues has become a standard representation of chromatin state in the decade since its original publication. Recent modifications include a stacked or per-cell-type model (Vu & Ernst, 2022) and a repurposing of the original model to capture extremely sparse single-cell data via posterior state probabilities (Zhang & Srivastava et al., 2022). Previously, our lab implemented a framework (the `chromophobe` package) for distilling ChromHMM states into Bioconductor data structures, enabling parallel annotation and processing of arbitrary tabix-indexed files, such as paired-end representations of chromatin looping data. We found that contrasts of chromatin state assignments between cellular conditions (a special case of the stacked HMM) and mixtures of uncertain cellular states (a special case of the mixture HMM formalism implemented in, among others, Satu & Satu 2019) offer biologically relevant insight when studying chromatin modifiers, such as histone lysine demethylases, with age-, sex-, and condition-dependent effects.

ChromHMM has evolved in large part by spawning simplified representations as understanding of the underlying processes improves: as simple as possible, but no simpler. The `chromophobe` package automates many such simplifications. It leverages Bioconductor infrastructure for parallel processing, fast selective import, and visualization of condition-dependent divergence in chromatin states, along with associated genomic and epigenomic signals. The GenomicSegmentation superclass adds a TrackLine with sensible defaults to the ubiquitous GRanges class; subclasses implement the validation conditions required by more specialized plot methods, which let users intuitively compare slices of genomic space across cells and conditions. All methods accept any data structure that satisfies the API requirements (e.g. discrete, discordant, or continuous mixture states), and most allow parallel inspection of the underlying data used to generate state representations. Basic statistical tests for divergence within states or regions are supported, primarily through the StackedHMM class, which distills divergent regions by cell type or condition, and the scChromHMM class, which accepts a matrix of state assignment probabilities by cell, condition, or subject. Import methods for BEDPE and arbitrary tabix-indexed input formats support focused investigations of mutations and dosage differences in chromatin-modifying genes, with or without stratification by discrete or continuous conditions.

With the introduction of broadly available single-cell chromatin-state data, alongside experimental and computational methods for 4D genome structure analysis, we expect that the `chromophobe` package will become increasingly useful as a tool for interpreting the relationship between chromatin state and cellular fate across conditions.
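
To make the idea of condition-dependent divergence concrete, the following is a minimal sketch of how per-region state probabilities from two conditions might be contrasted. It is not the `chromophobe` API and is written in plain Python rather than R. The function names, the toy posterior values, the threshold, and the choice of Jensen-Shannon divergence as the comparison statistic are all assumptions made for illustration; the abstract does not specify which tests the package implements.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Illustrative statistic only; the package's actual tests are not
    described in the abstract.
    """
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 here,
        # because b is always the midpoint distribution.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def divergent_regions(cond_a, cond_b, threshold=0.5):
    """Indices of regions whose state posteriors differ by more than threshold."""
    return [
        i
        for i, (p, q) in enumerate(zip(cond_a, cond_b))
        if js_divergence(p, q) > threshold
    ]


# Hypothetical posterior probabilities over three chromatin states
# for three genomic bins, in two conditions.
cond_a = [[0.90, 0.05, 0.05], [0.1, 0.8, 0.1], [1.0, 0.0, 0.0]]
cond_b = [[0.85, 0.10, 0.05], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]

flagged = divergent_regions(cond_a, cond_b)
```

In this toy input only the third bin switches state completely between conditions, so it is the only one flagged at the assumed threshold. A real analysis would also need to account for uncertainty across cells or replicates.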