Bisplotti: A One-Stop Shop For All Your Whole-Genome DNA Methylation Sequencing Plotting Needs
Author(s): Jacob Morrison, Benjamin K Johnson, James P Eapen, Ian Beddows, Hui Shen
Affiliation(s): Van Andel Institute
DNA methylation plays a key role in normal cellular differentiation and lineage specification, as well as in the development of diseases, including cancer. It is one of the most widely studied epigenetic marks, due in part to its stability through most storage conditions and histological preparations. In mammals, DNA methylation generally occurs at the fifth carbon of cytosines in CpG dinucleotides. Several methods exist to probe methylation status, both sequencing- and array-based. Short-read sequencing-based methods often rely on converting unmethylated cytosines to uracils, which are read as thymines after amplification, either through sodium bisulfite conversion (as in whole-genome bisulfite sequencing, or WGBS) or through an enzymatic process (as in NEBNext Enzymatic Methyl-seq, or EM-seq) that achieves the same result. Applying these conversion-based methods across the whole genome yields whole-genome DNA methylation sequencing (WGMS). As the cost of WGMS has decreased, it has become increasingly popular. Aside from its comprehensive coverage, WGMS provides both genetic and epigenetic information in a single assay, saving both time and money. Once sequencing data have been collected, a typical WGMS analysis pipeline aligns the sequenced reads to a reference genome, extracts DNA methylation information from the aligned reads, and then loads that information into R for analysis using tools available on CRAN or, most often, Bioconductor. Several tools exist for aligning WGMS data and extracting methylation information (e.g., BISCUIT and Bismark). Further, many tools exist on Bioconductor to analyze the extracted information, including biscuiteer for summarizing data and DMRcate for calling differentially methylated regions. However, there is currently a lack of packages for creating figures for DNA methylation analyses.
The R package bisplotti was written to address the lack of plotting packages for DNA methylation analyses. Bisplotti readily fits into the BISCUIT/biscuiteer analysis framework, while also supporting Bismark users (through the bsseq package) and, in some instances, methylation array users (through a matrix of beta values). Built on ggplot2 for plotting and offering an easy-to-use API, bisplotti can produce publication-ready figures with little effort. It can produce an assortment of figures, including, but not limited to, methylation averages across many scales (multiscale plots), methylation density plots (both one- and two-dimensional) to compare distributions across samples, and DNA methylation phasing around CTCF sites. Most importantly, bisplotti also provides a framework for analyzing the epiBED format to study epistates and allele-specific methylation. Further, it generates figures that show read- or fragment-based methylation across a region of interest. In addition to the many options already available, each figure is returned as a ggplot2 object, which allows for further user-defined customization. In summary, bisplotti fills a gap in WGMS data visualization by creating publication-ready figures for whole-genome DNA methylation sequencing data. It is an easy-to-use R/Bioconductor package that can create plots from BISCUIT and Bismark output through a BSseq object, with some functionality for methylation array data through a matrix of beta values. While several figures are already implemented, we are looking for community