Distance Metric Learning On The L1000 Connectivity Map
Author(s): Ian Smith, Benjamin Haibe-Kains
Affiliation(s): University Health Network
The Next Generation Connectivity Map (L1000) is a massive, high-throughput dataset measuring transcriptional changes in cancer cell lines in response to chemical and genetic perturbations. Applications of L1000 require computing similarities among signatures to identify similar and dissimilar perturbations. We apply a method from the field of metric learning to learn, from the data, a class of similarity functions that maximizes discrimination of replicate signatures. The learned similarity function, rectified cosine, shows improved performance in identifying known biological relationships from the data. Furthermore, this approach can be generalized to other perturbational datasets. Finally, we introduce a package to simplify analysis and querying of the L1000 dataset.
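As a point of reference for what "computing similarities among signatures" involves, the following is a minimal sketch of plain cosine similarity between two signature vectors. It is not the rectified cosine described above, whose definition is not given in this section, and it does not use the package mentioned here. The function name `cosine_similarity` and the toy vectors are illustrative assumptions, not L1000 data.

```python
import numpy as np


def cosine_similarity(a, b):
    """Plain cosine similarity between two signature vectors.

    Hypothetical helper for illustration only; the learned
    rectified cosine is not reproduced here.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # An all-zero signature has no direction; treat it as dissimilar.
        return 0.0
    return float(np.dot(a, b) / denom)


# Toy vectors (assumed values, not real signatures).
signature = [1.0, -2.0, 0.5]
replicate = [2.0, -4.0, 1.0]    # same direction, different magnitude
opposite = [-1.0, 2.0, -0.5]    # reversed direction

sim_replicate = cosine_similarity(signature, replicate)
sim_opposite = cosine_similarity(signature, opposite)
```

Under this baseline, a signature and a scaled copy of it score close to 1, while a signature with reversed sign scores close to -1. A learned similarity function would be judged by how much better it separates true replicates from non-replicates than a baseline like this one.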