Using Comparative Genomics To Predict Protein Coevolution Networks With The DECIPHER And SynExtend Packages

Author(s): Aidan Hunter Lakshman, Nicholas Cooley, Erik Scott Wright

Affiliation(s): University of Pittsburgh

Twitter: @ahlakshman

In the past decade, the number of sequenced proteins of unknown function has grown exponentially, while the number of experimentally analyzed proteins has increased at a relatively constant rate. To assign functions to more proteins, many in silico methods have been developed that predict protein function purely from gene sequence data. These methods rely on a ‘guilt-by-association’ analysis to detect genes that are likely involved in common functional pathways. We have developed two Bioconductor packages, DECIPHER and SynExtend, with many functions for comparative genomics. This workshop will focus on one application of comparative genomics: predicting functional associations between proteins to formulate hypotheses about the unknown roles of new proteins. Along the way, we will cover several topics: finding and importing genomes into a sequence database with the DECIPHER package; gene calling and annotation using the DECIPHER package; identification of clusters of orthologous genes using the SynExtend package; construction of alignments and phylogenetic trees using the new TreeLine function in the DECIPHER package; and prediction of functional associations using ProtWeaver in the SynExtend package. The talk will highlight our newest functionality, ProtWeaver, which implements several methods commonly used in the literature to predict protein functional association. We will briefly show how each algorithm works, then apply them to a real set of proteins of unknown function. This workshop will teach participants how to extract useful information from large biological sequence data sets with comparative genomics and how to predict the function of uncharacterized proteins.