Stark Group Research

Systems biology of regulatory motifs and networks – towards understanding gene expression from the DNA sequence

The regulation of gene expression in response to developmental or environmental stimuli is a crucial process in all organisms. Transcription is regulated by trans-acting transcription factors that recognize cis-regulatory DNA elements (CRMs or enhancers) and function in a combinatorial fashion. Enhancers retain their activity even when placed in artificial contexts (e.g. in reporter gene assays), but the exact requirements for enhancer function, i.e. a regulatory code, remain unknown, and enhancer activity cannot yet be predicted from the DNA sequence.

Employing an interdisciplinary approach, we use both bioinformatics- and molecular biology-based methods to achieve a systematic understanding of the structure and function of enhancers. Our goal is to “crack” the regulatory code, predict enhancer activity from the DNA sequence, and understand how transcriptional networks define cellular and developmental programs.

The regulatory code of context-specific transcription factor binding

Transcription factors are employed in different contexts, i.e. in various tissues or at different stages of development. Typically, they bind to and regulate context-specific targets that are determined by the respective enhancer sequences and transcription factor combinatorics. We use tissue-specific ChIP-Seq, bioinformatics, and machine learning to determine the sequence determinants of context-specific transcription factor binding in Drosophila, i.e. the combinations of partner motifs that determine binding in each context. We focus on transcription factor binding during embryonic mesoderm and muscle development (Zinzen et al., 2009). We find that many motifs are differentially distributed between binding sites at different stages (Figure 2) and that this differential distribution is predictive of stage-specific binding. Our results further suggest that the transcription factors vielfaltig/zelda and tramtrack are important determinants of transcription factor binding in the early embryo (Yáñez Cuna et al., submitted). We are also establishing tissue-specific ChIP-Seq in Drosophila to determine the tissue-specific targets of the circadian clock factors and homeobox (Hox) transcription factors.
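To make the idea of a differentially distributed motif concrete, the following is a minimal sketch under stated assumptions, not the analysis pipeline used in the work described above. It compares the fraction of sequences containing a motif in two sets of binding sites and reports a log2 ratio. The function names, the pseudocount, and the toy sequences are all hypothetical; the motif CAGGTAG is used only as an illustrative consensus string.

```python
import math

# Translation table for reverse-complementing DNA (illustrative helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def motif_fraction(sequences, motif):
    """Fraction of sequences with at least one exact motif match on either strand."""
    rc = reverse_complement(motif)
    hits = sum(1 for seq in sequences if motif in seq or rc in seq)
    return hits / len(sequences)

def differential_enrichment(set_a, set_b, motif, pseudocount=0.01):
    """Log2 ratio of motif-containing fractions (set_a over set_b).

    The pseudocount is an assumption that avoids division by zero
    when a motif is absent from one set.
    """
    f_a = motif_fraction(set_a, motif)
    f_b = motif_fraction(set_b, motif)
    return math.log2((f_a + pseudocount) / (f_b + pseudocount))

# Toy data (made up for illustration, not real binding sites).
early_sites = ["ACGTCAGGTAGTT", "TTCAGGTAGACG", "GGGCCCAAATTT", "CTACCTGAAAA"]
late_sites = ["ACGTACGTACGT", "TTTTCAGGTAGT", "GGGGAAAACCCC", "ATATATATATAT"]

score = differential_enrichment(early_sites, late_sites, "CAGGTAG")
print(f"log2 enrichment (early vs late): {score:.2f}")
```

A positive value indicates that the motif is more frequent in the first set. In practice, such per-motif scores would be computed for many motifs and used as features for a classifier; exact string matching stands in here for position weight matrix scanning.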

Figure 2

In vivo and in vitro enhancer screens

Figure 1

Collections of enhancers that function similarly across cell types would be an invaluable resource to study the sequence basis of enhancer activity. The Vienna-Tiles (VT) library, established in collaboration with the Dickson lab and the VDRC, currently consists of about 8000 transcriptional reporter constructs integrated at a single defined genomic position in transgenic Drosophila lines, each carrying a distinct candidate DNA fragment of ~2 kb. We are determining the temporal and spatial enhancer activity of these fragments in transgenic Drosophila embryos by in situ hybridization against the reporter transcript. To date, we have assayed more than 2000 enhancer candidates and observed an activity rate of ~40%, with diverse activity patterns throughout embryogenesis (Figure 1). Interestingly, the activity rate increases from about 10% in early embryos to about 35% in late embryos, reflecting the increasing complexity of the embryo as the number of distinct tissues and cell types grows. In a pilot study, we have found that enhancers contribute additively to the overall expression pattern of a gene. Groups of enhancers with similar activity can be predicted from transcription factor occupancy or from the enhancers' DNA sequences using machine learning approaches. We have also established a high-throughput screen based on next-generation sequencing (NGS) to measure enhancer activity in specific cell types, and will analyze the sequences using bioinformatics and machine learning tools.

Enhancer activity and gene expression analysis by automatic image processing

We are developing computational tools to automatically find and extract embryos from whole-mount in situ images (Figure 1) and to compare enhancer activity patterns with gene expression patterns obtained from BDGP (Tomancak et al., 2007). We have established a collaboration with the Christoph Lampert group (IST Austria) on image analysis. Clustering genes and enhancers by their spatio-temporal co-expression and intersecting transcription factor expression patterns will enable us to suggest regulatory interactions and integrate these data with sequence analyses.
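As a rough illustration of comparing an enhancer activity pattern with gene expression patterns, the sketch below assumes that each pattern has already been reduced to a vector of staining intensities over a fixed set of spatial bins along the embryo. That representation, the function names, and the toy profiles are assumptions made for this example and are not taken from the tools described above. Genes are ranked by Pearson correlation with the enhancer profile.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 if either profile is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def rank_genes(enhancer_profile, gene_profiles):
    """Sort genes by similarity to the enhancer profile, best match first."""
    scores = {gene: pearson(enhancer_profile, profile)
              for gene, profile in gene_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy profiles over six spatial bins (hypothetical values).
enhancer_profile = [0, 0, 1, 1, 0, 0]
gene_profiles = {
    "gene_a": [0, 0, 0.9, 0.8, 0.1, 0],  # overlaps the enhancer pattern
    "gene_b": [1, 1, 0, 0, 0, 0],        # expressed elsewhere
}

for gene, score in rank_genes(enhancer_profile, gene_profiles):
    print(f"{gene}: r = {score:.2f}")
```

The same similarity measure could serve as the distance for clustering genes and enhancers by co-expression; correlation is only one possible choice and ignores differences in overall intensity.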

Comparative genomics and the evolution of transcriptional regulation

Figure 3

Functional elements in a genome are typically under evolutionary selection to maintain their functions in related organisms. In collaboration with the Zeitlinger group (Stowers Institute), we study in vivo transcription factor binding sites in six Drosophila species at various evolutionary distances from Drosophila melanogaster (Figure 3; He & Bardet et al., 2011). We find that transcription factor binding is highly conserved, even in species as distant from D. melanogaster as platypus or chicken are from human. Conserved binding correlates with sequence motifs for Twist and its partners, permitting the de novo discovery of their combinatorial binding. It also includes more than 10,000 low-occupancy sites near the detection limit, which tend to mark enhancers of later developmental stages. We have developed computational methods to score motif conservation across different Drosophila genomes. These enabled us to discover novel motif types and to identify functional targets of many transcription factors and microRNAs with a high degree of certainty. Comparative genomics and related bioinformatics approaches will permit us to integrate our data and knowledge to predict developmental enhancers, regulatory targets for transcription factors, and the expression patterns of genes. They will also allow us to integrate microRNA-mediated regulation into regulatory networks and understand its role in tissue-specific expression programs.
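The notion of scoring motif conservation can be illustrated with a deliberately simple stand-in: for each motif instance in a reference sequence, count the fraction of other species that carry the identical motif at the same alignment columns. This is a sketch under stated assumptions and not the scoring method referred to above, which is not specified in this text. The species labels, the toy alignment, and the hexamer CACATG are hypothetical, and the alignment is assumed to be gap-free.

```python
def conservation_scores(alignment, reference, motif):
    """Score each motif instance in the reference by cross-species conservation.

    alignment: dict mapping species name to an aligned sequence (equal lengths).
    Returns a list of (start_position, fraction_of_other_species_conserved).
    """
    ref_seq = alignment[reference]
    others = [species for species in alignment if species != reference]
    scores = []
    start = ref_seq.find(motif)
    while start != -1:
        end = start + len(motif)
        # A species counts as conserved only if the aligned window matches exactly.
        conserved = sum(1 for species in others
                        if alignment[species][start:end] == motif)
        scores.append((start, conserved / len(others)))
        start = ref_seq.find(motif, start + 1)
    return scores

# Toy gap-free alignment (made-up sequences and species labels).
alignment = {
    "dmel": "AACACATGTTGGCACATGAA",
    "dsim": "AACACATGTTGGCACTTGAA",
    "dyak": "AACACATGTAGGCAGATGAA",
    "dpse": "ATCACATGTTCGTACATGAA",
}

for position, score in conservation_scores(alignment, "dmel", "CACATG"):
    print(f"motif at {position}: conserved in {score:.0%} of other species")
```

In this toy example the first instance is present in all other species while the second is found only in the reference. A more realistic score would weight species by evolutionary distance and tolerate small shifts in motif position.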

Novel methods based on next-generation sequencing (NGS)

High-throughput next-generation sequencing has become the basis of many novel methods. We are establishing computational tools to analyze NGS data from RNA sequencing (RNA-Seq), RNA cross-linking and immunoprecipitation (CLIP), haploid genetic screens, and chromatin immunoprecipitation coupled to NGS (ChIP-Seq), and are collaborating with many groups on campus and abroad.

Selected Publications

Fly comparative genomics

  • He, Q., Bardet, A.F., Patton, B., Purvis, J., Johnston, J., Paulson, A., Gogol, M., Stark, A., and Zeitlinger, J. (2011). High conservation of transcription factor binding and evidence for combinatorial regulation across six Drosophila species. Nat Genet 43, 414-420.
  • Stark, A., Lin, M.F., Kheradpour, P., Pedersen, J.S., Parts, L., Carlson, J.W., Crosby, M.A., Rasmussen, M.D., Roy, S., Deoras, A.N., et al. (2007). Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature 450, 219-232.

Regulation of transcription

  • Kheradpour, P., Stark, A., Roy, S., and Kellis, M. (2007). Reliable prediction of regulator targets using 12 Drosophila genomes. Genome Res 17, 1919-1931.
  • Zeitlinger, J., Zinzen, R.P., Stark, A., Kellis, M., Zhang, H., Young, R.A., and Levine, M. (2007). Whole-genome ChIP-chip analysis of Dorsal, Twist, and Snail suggests integration of diverse patterning processes in the Drosophila embryo. Genes Dev 21, 385-390.

microRNA gene finding and target prediction

  • Stark, A., Bushati, N., Jan, C.H., Kheradpour, P., Hodges, E., Brennecke, J., Bartel, D.P., Cohen, S.M., and Kellis, M. (2008). A single Hox locus in Drosophila produces functional microRNAs from opposite DNA strands. Genes Dev 22, 8-13.
  • Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. (2005). Principles of MicroRNA-Target Recognition. PLoS Biol 3, e85.
  • Stark, A., Brennecke, J., Bushati, N., Russell, R.B., and Cohen, S.M. (2005). Animal MicroRNAs Confer Robustness to Gene Expression and Have a Significant Impact on 3'UTR Evolution. Cell 123, 1133-1146.