Machine learning-based functional characterization of heart enhancers uncovers novel cardiogenic roles for the transcription factors Myb and Su(H). Shaad M. Ahmad1,4, Brian W. Busser1,4, Di Huang2,4, Elizabeth J. Cozart1, Anton Aboukhalil3, Sebastien Michaud3, Neal Jeffries1, Martha L. Bulyk3, Ivan Ovcharenko2, Alan M. Michelson1. 1) NHLBI, NIH, Bethesda, MD; 2) NLM, NIH, Bethesda, MD; 3) Harvard Medical School, Boston, MA; 4) Equally contributing first authors.
The development of a complex organ such as the Drosophila heart requires a network of signaling molecules and transcription factors (TFs), whose combined activities are integrated by transcriptional enhancers. The Drosophila heart is composed of two distinct cell types: the contractile cardial cells (CCs) and the non-muscle pericardial cells (PCs). Here we combined machine learning of heart enhancer sequence features with chromatin immunoprecipitation sequencing (ChIP-seq) data for key cardiac regulators to computationally classify cell type-specific cardiac enhancers. This approach identified related enhancers, their shared and unique sequence motifs, and novel trans-acting factors that direct cell type-specific genetic programs. We first found that adding ChIP-seq data improved the performance of enhancer classification. In addition, predicted cell type-specific enhancers were over-represented near genes of the appropriate cell type-specific cardiac gene sets and were active in the heart when tested in transgenic reporter assays. Furthermore, many of the motifs learned by the classifier are recognized by TFs known to be involved in cardiogenesis, whereas some of the identified transcription factor binding sites (TFBSs) were novel. Among the latter is a TFBS recognized by Myb, which we show experimentally acts in concert with the forkhead domain TF Jumeau to control cardiac progenitor cell divisions. Interestingly, machine learning revealed Suppressor of Hairless (Su(H)) TFBSs as a sequence feature that may discriminate between PCs and CCs. Consistent with this hypothesis, we found that Su(H) represses a known PC gene in CCs. We thus show that machine learning can be used to recognize novel TFBSs and to facilitate identification of the cognate TFs and their functions during organogenesis.
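The general idea of combining enhancer sequence features with ChIP-seq data in a single classifier can be illustrated with a minimal sketch. It assumes, purely for illustration, that each enhancer is represented by normalized k-mer frequencies, that ChIP-seq data are encoded as hypothetical binary peak-overlap flags (one per TF), and that a plain logistic regression stands in for the classifier. None of these choices, nor the function names, toy sequences, or labels, are taken from the study described above.

```python
import math
from itertools import product


def kmer_vector(seq, k=3):
    """Return normalized counts of all 4**k DNA k-mers in seq.

    Windows containing characters other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts) or 1.0  # avoid division by zero for empty/ambiguous input
    return [c / total for c in counts]


def enhancer_features(seq, chip_flags, k=3):
    """Concatenate sequence (k-mer) features with ChIP-seq features.

    chip_flags is a hypothetical encoding: one 0/1 value per TF, marking
    whether a ChIP-seq peak for that TF overlaps the enhancer.
    """
    return kmer_vector(seq, k) + [float(f) for f in chip_flags]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights by per-example gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Probability that feature vector x belongs to class 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Invented toy data: labels 1 and 0 stand for two hypothetical enhancer
# classes (e.g. two cell types); each enhancer has one hypothetical ChIP-seq flag.
training = [
    ("ACGTAAAATTTT", [1], 1),
    ("TTAAAACGAT", [1], 1),
    ("GGGCCCGCGC", [0], 0),
    ("CCGGGCGCCA", [0], 0),
]
X = [enhancer_features(seq, flags) for seq, flags, _ in training]
y = [label for _, _, label in training]
w, b = train_logistic(X, y)

for seq, flags, label in training:
    print(seq, label, round(predict(w, b, enhancer_features(seq, flags)), 3))
```

Comparing held-out performance with and without the ChIP-seq columns would be the analogue of testing whether ChIP-seq data improve classification; the toy data here are far too small to say anything about that question.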