Investigating context-dependent transcription factor binding in early Drosophila development. Jessica L. Stringham1, Adam S. Brown2, Robert A. Drewell2, Jacqueline M. Dresch3. 1) Computer Science Department, Harvey Mudd College, Claremont, CA; 2) Biology Department, Harvey Mudd College, Claremont, CA; 3) Mathematics Department, Harvey Mudd College, Claremont, CA.
Gene expression in the Drosophila embryo is controlled by functional interactions between protein transcription factors (TFs) and DNA cis-regulatory modules (CRMs). These interactions are mediated by the binding of TFs to specific sequences in CRMs. The binding site sequences for any TF can be determined experimentally and represented as a position weight matrix (PWM). PWMs can then be used to predict the location of TF binding sites in other regions of the genome. This approach has two serious limitations: often only a few confirmed binding sites are available, and the sites are frequently shorter than 10 bp. As a result, the information content of a PWM is often low, which limits the accuracy of its predictions.
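As an illustration of the PWM approach described above, the following minimal Python sketch builds a PWM from a handful of aligned sites, computes its information content, and scans a sequence for the best-scoring window. The sites, the scanned sequence, the pseudocount of 0.5, and the uniform background frequency of 0.25 are all hypothetical choices made for the example; they are not data or parameters from this study.

```python
import math

BASES = "ACGT"

def count_matrix(sites):
    """Count each base at each position of equal-length aligned sites."""
    counts = [{b: 0 for b in BASES} for _ in range(len(sites[0]))]
    for site in sites:
        for i, base in enumerate(site):
            counts[i][base] += 1
    return counts

def frequency_matrix(counts, pseudocount=0.5):
    """Convert counts to frequencies; the pseudocount avoids log(0)."""
    freqs = []
    for col in counts:
        total = sum(col.values()) + 4 * pseudocount
        freqs.append({b: (col[b] + pseudocount) / total for b in BASES})
    return freqs

def information_content(freqs, background=0.25):
    """Total information content in bits, relative to a uniform background."""
    return sum(p * math.log2(p / background)
               for col in freqs for p in col.values())

def log_odds_pwm(freqs, background=0.25):
    """Log-odds scoring matrix derived from the frequency matrix."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]

def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [(start, sum(pwm[i][sequence[start + i]] for i in range(width)))
            for start in range(len(sequence) - width + 1)]

# Hypothetical aligned binding sites (illustrative only, not real data).
toy_sites = ["TAATCC", "TAATCC", "TAAGCT", "TAATCG"]
freqs = frequency_matrix(count_matrix(toy_sites))
pwm = log_odds_pwm(freqs)
total_bits = information_content(freqs)

# Hypothetical sequence to scan.
hits = scan("GGCTAATCCGA", pwm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(f"information content: {total_bits:.2f} bits")
print(f"best window starts at {best_start} with score {best_score:.2f}")
```

With only four short sites, the frequency estimates depend heavily on the pseudocount, which is one way the limitation described above shows up in practice.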
Analysis of a large number of CRMs that control transcription of target genes along the anterior-posterior axis of the embryo reveals blocks of evolutionarily conserved sequence that extend beyond the predicted TF binding sites. In this study, we examine the function of these flanking sequences. In particular, we use computational approaches to determine whether the flanking sequences enhance the binding specificity of particular TFs. Expanding PWMs to capture context-dependent TF binding will allow us to functionally dissect CRMs and to significantly increase the information content of PWMs.
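The effect of adding flanking positions to a PWM can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the alignment, the flank width of 2 bp, the pseudocount, and the uniform background are hypothetical, and the calculation is not the method used in this study. It compares the information content of the core positions with that of the core plus flanks.

```python
import math

BASES = "ACGT"

def column_information(column, pseudocount=0.5, background=0.25):
    """Information content (bits) of one alignment column."""
    total = len(column) + 4 * pseudocount
    bits = 0.0
    for base in BASES:
        p = (column.count(base) + pseudocount) / total
        bits += p * math.log2(p / background)
    return bits

def per_position_information(aligned_sites):
    """Information content of every column of equal-length aligned sites."""
    width = len(aligned_sites[0])
    return [column_information([site[i] for site in aligned_sites])
            for i in range(width)]

# Hypothetical alignment: a 6 bp core with 2 bp of flanking sequence
# on each side (illustrative only, not real binding sites).
toy_alignment = [
    "GCTAATCCAT",
    "GCTAATCCAA",
    "ACTAAGCTAT",
    "GCTAATCGCT",
]
FLANK = 2  # assumed flank width on each side

bits = per_position_information(toy_alignment)
core_bits = sum(bits[FLANK:-FLANK])
extended_bits = sum(bits)

print(f"core PWM:     {core_bits:.2f} bits")
print(f"extended PWM: {extended_bits:.2f} bits")
```

Because each column contributes a non-negative amount, the extended total can never be lower than the core total. Whether the added bits reflect genuine binding specificity, rather than conservation for other reasons, is the question the flanking-sequence analysis has to address.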