Mapping the cis-regulatory landscape of early embryonic development in Drosophila with hundreds of TFs. C Blatti1, M Kazemian1, S Celniker2, M Brodsky3, S Sinha1. 1) U of Illinois, Urbana, IL; 2) LBL, Berkeley, CA; 3) U Mass Med School, Worcester, MA.

   While modENCODE data enable genome-wide annotation of potential regulatory elements in Drosophila, they generally do not provide the specific spatiotemporal activity pattern of those elements, nor do they identify which transcription factors (TFs) and DNA binding sites drive those patterns. We developed a strategy to produce this kind of comprehensive description of the cis-regulatory landscape by modeling TF occupancy from the binding specificities (motifs) of >300 TFs and by examining sets of genes expressed in ~200 distinct early embryonic expression domains annotated in the BDGP in situ image database. First, we predicted each TF's genome-wide binding profile using an HMM-based motif-scanning method and stage-specific DNA accessibility data. Comparison of these profiles with data from 60 ChIP experiments revealed a high degree of agreement (average correlation coefficient >0.6). Next, for each gene set from the ~200 expression domains, we searched for enrichment of predicted TF binding within the regulatory regions. This procedure generated a compendium of >5000 significant associations between TFs and expression terms, 21% of which are supported by the TF itself having the associated or a related expression pattern. From this analysis, we identified TFs and expression terms with systematic biases toward gene-proximal or gene-distal regulatory regions. Finally, we annotated candidate enhancers, defined as stage-specific open chromatin regions, with the expression pattern they are likely to drive. To predict a specific pattern from regulatory sequence, we fit a regression model incorporating information from TF binding profiles, TF expression, and our functional associations. Our model accurately recovered REDfly enhancers for 18 separate expression domains. By leveraging available comprehensive sets of TF binding specificities and gene expression patterns, we are able to systematically describe embryonic development in terms of TFs and their target regulatory sequences.
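
   The enrichment step described above (testing whether genes in one expression domain carry more predicted binding of a given TF than expected by chance) can be illustrated with a small sketch. The abstract does not state which statistical test was used; the one-sided hypergeometric test below is a common choice for this kind of gene-set question and is an assumption here, as are the function name, the parameter names, and the toy counts.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_genes, n_in_set, n_bound, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    Hypothetical illustration, not the authors' method.
    n_genes   : total genes considered (background)
    n_in_set  : genes annotated with the expression term
    n_bound   : genes whose regulatory region has predicted TF binding
    n_overlap : genes that are both in the set and predicted bound
    """
    max_overlap = min(n_in_set, n_bound)
    total = comb(n_genes, n_bound)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_bound - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy counts, invented for illustration only.
p = hypergeom_enrichment_pvalue(n_genes=5000, n_in_set=120, n_bound=400, n_overlap=25)
print(f"enrichment p-value: {p:.3g}")
```

   In a screen over several hundred TFs and roughly 200 expression terms, p-values like this one would also need a multiple-testing correction before any association is called significant.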