Transcriptional Regulation of Gene Expression in C. elegans

Protein coding gene sequences are converted to mRNA by the highly regulated process of transcription. The precise temporal and spatial control of transcription for many genes is an essential part of development in metazoans. Thus, understanding the molecular mechanisms underlying transcriptional control is essential to understanding cell fate determination during embryogenesis, post-embryonic development, many environmental interactions, and disease-related processes. Studies of transcriptional regulation in C. elegans exploit its genomic simplicity and physical characteristics to define regulatory events with single-cell and minute-time-scale resolution. When combined with the genetics of the system, C. elegans offers a unique and powerful vantage point from which to study how chromatin-associated proteins and their modifications interact with transcription factors and their binding sites to yield precise control of gene expression through transcriptional regulation.


Overview
Every aspect of cellular function depends on the gene products expressed in that cell. The mechanisms regulating the expression of these gene products are diverse, and they can affect each of the steps necessary to make and maintain a steady-state level of functional RNA or protein. These mechanisms include those controlling RNA synthesis, processing, and stability, and, in the case of protein-coding genes, protein translation, modification, and degradation. Here we focus on the regulation of RNA transcription in C. elegans. Transcription is a necessary first step in gene expression, and transcriptional regulation plays a central role in organismal development and evolution (Levine and Tjian, 2003; Chen and Rajewsky, 2007). Indeed, the number of specific proteins involved in transcription and its regulation increases with increasing organismal complexity (Vogel and Chothia, 2006).
Many of the general characteristics that make C. elegans an excellent model system (invariant lineage, simple anatomy, effectiveness of RNAi, etc.) also make it an excellent system to study transcriptional regulation. However, two specific characteristics uniquely facilitate transcriptional regulation studies in the worm. First, C. elegans is transparent throughout its entire life cycle, making it an ideal system in which to use fluorescent protein reporter genes to monitor gene expression in live animals with single-cell resolution. Second, the relatively compact size of the C. elegans genome facilitates identification of cis-acting transcriptional regulatory elements (CREs) controlling gene expression. Examples of the success of studying transcriptional control in C. elegans are reviewed in WormBook and elsewhere, and they include the elucidation of the transcriptional cascade controlling specification and differentiation of the gut, nervous system, and pharynx (see The C. elegans intestine; Neurogenesis in the nematode Caenorhabditis elegans; and The C. elegans pharynx: a model for organogenesis). These two physical characteristics are augmented by facile forward and reverse genetics in the C. elegans system. Mutations affecting numerous transcription factors and even CREs have provided important insights into the transcriptional control of cell fate decisions during development, such as specifying which cells undergo apoptosis (Conradt and Xue, 2005), while characterization of dosage compensation (Meyer, 2010) and the synMuv mutants (Fay and Yochem, 2007) has broadened our general understanding of gene regulation at the chromatin level. These advantages have consistently provided novel mechanistic insights into the transcriptional regulation of gene expression.
This chapter provides an overview of RNA Pol II transcription in C. elegans, focusing on what we have learned to date about gene expression in the somatic cells of the animal. Although much of the content of this review is also applicable to germline transcription, readers interested in germline gene expression are encouraged to see WormBook chapters Germline genomics and Germline chromatin, as important differences exist in the mechanisms controlling expression in these two tissue types. This chapter takes a broad-strokes approach to somatic transcription while providing references to serve as entry points for those wanting to explore particular topics in more detail. A brief introduction to the basics of gene organization and regulation sets the stage for those unfamiliar with C. elegans gene expression. This is followed by a discussion of transcription factor function, and the chapter ends with the known roles for chromatin in C. elegans gene regulation. Although some basic information may duplicate that found in previous reviews of transcriptional regulation (Transcriptional regulation; Transcription mechanisms; Gaudet and McGhee, 2010; Krause, 1995; McGhee and Krause, 1997; Van Nostrand and Kim, 2011), this review aims to augment rather than supersede these alternate overviews, as each has valuable information and a unique perspective on transcriptional regulation.

RNA Polymerase II and associated factors
The regulation of RNA Polymerase II (Pol II)-mediated transcription in C. elegans can be described as typical for eucaryotes. Pol II appears to act in concert with TATA Binding Protein (TBP) and TBP-Associated Factors (TAFs) at the core promoter of protein-coding genes (Dantonel et al., 2000; Kaltenbach et al., 2000; Lichtsteiner and Tjian, 1993; Walker et al., 2004; see also Transcription mechanisms). As in other eucaryotes, the large subunit of Pol II of C. elegans, encoded by the ama-1 gene (Bird and Riddle, 1989), has an extended C-terminal domain (CTD) that likely serves as a binding site for protein complexes involved in co-transcriptional mRNA processing and histone modification (Figure 1; from Phatnani and Greenleaf, 2006). Active Pol II is phosphorylated on the CTD at serines 2 and 5 (Ser2P or Ser5P) of the conserved heptad repeat (YSPTSPS) and its variants, as it is in other eucaryotes (Seydoux and Dunn, 1997; Wallenfang and Seydoux, 2002; Zhang et al., 2003). The levels of Ser2P increase while the Ser5P levels decrease with transcriptional elongation in eucaryotes (Buratowski, 2009), and there is some evidence for the same in C. elegans (Garrido-Lecca and Blumenthal, 2010). Interestingly, many of the antibodies used to distinguish these Pol II isoforms based on heptad repeat phosphorylation epitopes yield very similar patterns by chromatin immunoprecipitation (ChIP) in C. elegans (Baugh et al., 2009; Pferdehirt et al., 2011). The functions of many of the other core transcription factors are similarly conserved, with one of the only major differences being the absence of the negative elongation factor NELF (reviewed in Transcription mechanisms).

C. elegans gene organization and regulation
Transcriptional regulation results from a complex organization of cis-acting sequences that serve as binding sites for a multitude of trans-acting factors that together determine if a gene will be active or silent. In higher eucaryotes, these cis-acting sequences are typically clustered into discrete functional modules, including the core promoter, extended proximal and downstream promoter regions, positive and negative enhancers, and insulators, as diagrammed in Figure 2 (Levine and Tjian, 2003).

Proximal control regions
The majority of protein-coding genes in C. elegans are within relatively gene-dense regions of the genome. Consequently, cis-acting regulatory regions are usually close to the coding region. In fact, a good rule of thumb for C. elegans is that the minimal set of cis-acting sequences sufficient to regulate proper gene expression is found within 2 kb upstream of the translational start codon. Often, another gene is present on the same or opposite strand and located less than 2 kb upstream of the gene of interest. Generally in these cases, one assumes the minimal promoter is restricted to the non-coding, intergenic region. There are notable exceptions to this compact view of cis-acting sequences. For example, egl-1 expression is controlled, in part, by elements located both upstream and more than 4 kb downstream of the coding region (Thellmann et al., 2003; Winn et al., 2011). For lin-39, proper reporter gene expression required inclusion of ∼30 kb of genomic DNA that extends long distances upstream and downstream of the protein-coding region (Wagmaister et al., 2006). Clearly C. elegans genes can have complex and distant control regions, so the 2 kb rule of thumb should not be mistaken for dogma.
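The 2 kb rule of thumb, together with the convention of stopping at the nearest upstream gene, can be written down as a small interval calculation. The following Python sketch is illustrative only: the `Gene` record, the function name `candidate_promoter`, and the 1-based inclusive coordinate convention are assumptions made for this example, and the result is a starting point for reporter construction rather than a prediction of the natural promoter.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; coordinates are 1-based and inclusive."""
    name: str
    start: int   # leftmost coordinate of the coding region
    end: int     # rightmost coordinate of the coding region
    strand: str  # "+" or "-"


def candidate_promoter(gene: Gene, others: Iterable[Gene],
                       max_len: int = 2000) -> Optional[Tuple[int, int]]:
    """Return (left, right) of the region upstream of the start codon.

    The window is at most max_len bp long and is clipped so that it does
    not extend into the nearest upstream gene on either strand.
    Returns None if no intergenic sequence remains.
    """
    others = list(others)
    if gene.strand == "+":
        right = gene.start - 1
        left = max(1, gene.start - max_len)
        upstream_ends = [g.end for g in others if g.end < gene.start]
        if upstream_ends:
            left = max(left, max(upstream_ends) + 1)
    else:
        # On the minus strand, "upstream" means higher coordinates.
        left = gene.end + 1
        right = gene.end + max_len
        upstream_starts = [g.start for g in others if g.start > gene.end]
        if upstream_starts:
            right = min(right, min(upstream_starts) - 1)
    return (left, right) if left <= right else None


# Toy example with made-up coordinates: the neighbor ends 800 bp upstream,
# so the candidate region is shorter than 2 kb.
target = Gene("gene-x", 10000, 12000, "+")
neighbor = Gene("gene-y", 8000, 9200, "-")
print(candidate_promoter(target, [neighbor]))
```

As the egl-1 and lin-39 examples show, a window chosen this way can miss downstream or distant elements, so it should be treated as a minimal first guess.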
It is important to remember that the minimal promoter region is not synonymous with the natural promoter. The natural promoter may span a much larger region due to redundancy in the function of regulatory elements that ensure proper and robust regulation of the endogenous gene. One common site of additional control elements is within the introns. Most C. elegans introns are small (e.g., <100 bp; see Overview of gene structure) and are thus unlikely to contain elements controlling expression. However, introns larger than several hundred base pairs do often have such elements (e.g., Nam et al., 2002; Okkema et al., 1993; Kostrouchova et al., 1998). Therefore, intron size can provide a clue in searching for transcriptional control sequences.

Distal control regions
The relatively compact C. elegans genome may also underlie the apparent absence of long-range control mechanisms for gene regulation that are common to other metazoa. For example, CTCF in vertebrates and flies plays a key role in long-range chromatin organization and can block enhancer-stimulated gene expression, thus functioning as an insulator as shown in Figure 2 (Wallace and Felsenfeld, 2007). To date, there is no evidence for an ortholog or functional equivalent of CTCF in C. elegans. Thus, both the local and global organization of genes in C. elegans appears to be relatively simple in comparison with other metazoans, presumably simplifying our understanding of transcriptional regulation.

Transcriptional initiation and trans-splicing
Another unusual aspect of the C. elegans system to note when considering transcriptional regulation is the occurrence of trans-splicing, a process that replaces the initial transcript 5' untranslated region (5' UTR) with a 22 nucleotide leader sequence (SL1) for the majority of Pol II messages (Allen et al., 2011; Krause and Hirsh, 1987; Trans-splicing and operons in C. elegans). Therefore, mapping the 5' start site of the mature mRNA often only reveals the site of trans-splicing, not transcriptional initiation, complicating analyses that are commonplace in other systems. The ability to trans-splice messages also provides a processing mechanism for polycistronic messages, or operons, in C. elegans (Spieth et al., 1993; reviewed in Trans-splicing and operons in C. elegans). The sequence of the spliced leader present on the mature message distinguishes the first gene of an operon (no SL or SL1 trans-spliced) from internal messages of the operon (SL2 trans-spliced). Typically, the tightly clustered genes of an operon are co-regulated due to their polycistronic nature, although the use of different promoters upstream and within operons can result in independent transcriptional regulation of one or more of the mRNAs (Allen et al., 2011; Huang et al., 2007; Yin et al., 2010; Morton and Blumenthal, 2011; Trans-splicing and operons in C. elegans).
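The spliced-leader rule described above amounts to a simple lookup. The Python sketch below assumes that the leader on each mature message has already been determined experimentally and is supplied as a label (`None`, `"SL1"`, or `"SL2"`); the function name and the gene names in the example are hypothetical.

```python
from collections import Counter
from typing import Optional


def operon_position(leader: Optional[str]) -> str:
    """Interpret the spliced leader found on a mature mRNA.

    None or "SL1": consistent with the first gene of an operon
                   (or a gene that is not part of an operon).
    "SL2":         consistent with an internal gene of an operon.
    """
    if leader is None or leader == "SL1":
        return "first-or-standalone"
    if leader == "SL2":
        return "internal"
    raise ValueError(f"unrecognized leader label: {leader!r}")


# Toy data: leader labels for four hypothetical messages.
observed = {"gene-a": "SL1", "gene-b": "SL2", "gene-c": "SL2", "gene-d": None}
calls = {name: operon_position(sl) for name, sl in observed.items()}
print(calls)
print(Counter(calls.values()))
```

Because genes within operons can also be driven by internal promoters, a label-based call of this kind describes how the message was processed, not necessarily how it was transcribed.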

Core promoter elements
Ironically, the study of common sequences among trans-spliced messages provided the first systematic information on common promoter sequences (Graber et al., 2007), revealing the presence of a consensus Kozak sequence regulating translational initiation (Kozak, 1981). A more recent study of promoter sequences has extended this analysis to define the core promoter sequence elements typically present in C. elegans (Grishkevich et al., 2011) (Figure 3). The sensitivity and depth of coverage of next-generation sequencing-based techniques will allow fine-scale mapping of primary transcript start sites of outron-containing genes in the near future, aiding in our understanding of transcriptional initiation and requisite regulatory sequences.

Promoter complexity
Two general types of promoter organization have been described in C. elegans, simple and complex. A simple promoter is defined here as one in which the cis-acting control elements necessary for proper expression are confined to a small region (a few hundred bp) of the genome. Housekeeping genes expressed in all tissues are good candidates for regulation by simple promoters, although few housekeeping genes in C. elegans have been well characterized. Among the best-characterized simple promoters are those of the hsp-16 family of genes. This family consists of pairs of divergently transcribed genes with promoter regions sufficient for heat-regulated expression contained within the short (∼350 bp) intergenic regions (Jones et al., 1986; Russnak and Candido, 1985; Stringham et al., 1992). Despite these compact promoters, distinct tissue expression patterns are induced from different hsp-16 promoters (Stringham et al., 1992), suggesting the presence of multiple regulatory sites within these simple promoters. Other excellent examples of simple promoters are those of the vitellogenin (vit) genes, which exhibit stage-, tissue-, and sex-specific expression controlled, in the case of vit-2, by a 247 bp promoter (MacMorris et al., 1992; MacMorris et al., 1994). vit-2 promoter activity depends on GATA-factor binding sites and a novel VPE2 site (TGTCAAT) conserved in vit gene promoters in C. elegans and C. briggsae (Spieth et al., 1985; Zucker-Aprison and Blumenthal, 1989). Certain cell cycle promoters have also been shown to be remarkably simple. Analysis of several genes expressed only in proliferative cells and encoding G1 phase regulators (e.g., cyclin D) revealed that proper regulation required as little as 67 base pairs from the promoter (Brodigan et al., 2003; Park and Krause, 1999).
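Short conserved elements such as the VPE2 site (TGTCAAT) can be located in a promoter fragment with an exact-match scan of both strands. The Python sketch below is a minimal illustration; the function names and the toy sequence are invented for this example, and the sequence is not the vit-2 promoter.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq: str, motif: str):
    """List exact matches to motif on either strand.

    Each hit is (start, strand), where start is the 0-based position of
    the leftmost base of the match on the forward strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:  # elif: a palindromic motif is reported once
            hits.append((i, "-"))
    return hits


VPE2 = "TGTCAAT"  # site sequence as given in the text
toy_promoter = "AAATGTCAATGGGATTGACACC"  # made-up sequence for illustration
print(find_motif(toy_promoter, VPE2))
```

An exact-match scan is the simplest possible approach and will miss degenerate variants of a site; it is used here only because the text gives a single fixed sequence.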
In contrast to the simple promoters, complex promoters contain dispersed regulatory elements, and the overall pattern of gene expression is the result of the composite action of these elements, each influencing or contributing to the overall expression pattern. Complex promoters are often associated with regulatory genes controlling a key developmental decision. For example, the piecemeal organization of the regulatory regions for hlh-1 and lin-26 reflects a need for these promoters to integrate a variety of different cell lineage inputs to control proper cell fate specification in the correct cells at the appropriate time during development (see previous version of this chapter, Transcriptional regulation).

Transcription factors
Proper spatial and temporal regulation of gene expression depends on the binding of transcription factors to specific gene cis-regulatory sequences (Levine and Tjian, 2003). A variety of C. elegans transcription factors are well characterized and have important developmental roles; however, it remains a challenge to accurately catalog all the transcription factors encoded in the C. elegans genome. Automated searches for gene ontology terms associated with transcriptional regulation can result in inclusion of false positive hits for transcription factors (Reece-Hoyes et al., 2005; Vaquerizas et al., 2009). In some cases there is ambiguity as to whether a particular domain defines the protein as a DNA-binding factor. For example, various types of zinc finger domains can bind DNA but can also serve other functions, including RNA binding and protein-protein interactions (Gamsjaeger et al., 2007; Matthews and Sunde, 2002). Likewise, for factors that modify chromatin or participate in a transcription complex, the definition of a transcription factor often lies in the eyes of the investigator. Finally, gene annotations change as new information regarding gene structure is obtained and new classes of DNA-binding transcription factors are discovered. Thus any list of transcription factors must be manually curated and periodically updated to include the latest gene annotations.

Transcription factor resources
Several groups have produced excellent catalogs of transcription factors in C. elegans using predictions based on gene ontology terms associated with transcription and DNA-binding domain assignments, followed by manual curation, to produce lists of transcription factor genes containing between 934 and 988 genes (Barrasa et al., 2007; Haerty et al., 2008; Reece-Hoyes et al., 2005; Wilson et al., 2008; Reece-Hoyes et al., 2011). While these lists are largely overlapping, they are not identical, so it may be useful to scan each of these lists for your favorite genes (http://edgedb.umassmed.edu/; http://www.macwormlab.net/ntfdb/index.php; http://www.transcriptionfactor.org/index.cgi?Home).
The most recent catalog is wTF2.2 (Reece-Hoyes et al., 2011), and we have updated and annotated this list with current gene names (link to Table 1. Catalog of transcription factors.). Fifty-three transcription factor genes were removed from wTF2.2 either due to reannotation of the genome, or because there was only weak evidence that motifs in these genes encoded DNA-binding domains [e.g., ZF-A20 (IPR002653), ZF-CCCH (IPR000571), ZF-DHHC (IPR001594), and ZF-MIZ (IPR004181)]. We have also added forty transcription factor genes that had been identified as DNA-binding proteins in large-scale yeast one-hybrid screens (Deplancke et al., 2006; Reece-Hoyes et al., 2011), or based on information from other transcription factor catalogs (Haerty et al., 2008; Wilson et al., 2008). Our catalog contains 924 transcription factor genes, which is ∼4.6% of all protein-coding genes (WS220). This fraction is slightly lower than that of transcription factor genes in the human genome (∼6% of the protein-coding genes) (Vaquerizas et al., 2009). Feeding RNAi clones targeting many of these genes are commercially available (Table 1), facilitating functional analysis of C. elegans transcription factors.

Transcription factor families
C. elegans contains representatives of most major transcription factor families found in other animals (Tables 1 and 2), and 344 (37%) of the C. elegans transcription factor genes have been matched with clear human orthologs by reciprocal BLAST analysis or using orthology prediction programs (Table 1) (Reece-Hoyes et al., 2005; Shaye and Greenwald, 2011). Interestingly, some families of DNA-binding domains seem more highly conserved during animal evolution than others (Figure 4). For example, 19 out of 19 (100%) MYB-like factors and 30 out of 41 bHLH factors (73%) have human orthologs. In comparison, only 6 out of 22 T-box genes (27%) and 0 out of 9 MADF-family genes (0%) have been conserved. As has been noted previously (Reece-Hoyes et al., 2005; Sluder et al., 1999), the number of likely nuclear hormone receptors (NHRs) has expanded greatly in C. elegans (272 members in Table 2) relative to humans (46 members; IPR001628; Vaquerizas et al., 2009).
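The conservation percentages quoted above follow directly from the ortholog counts. The short Python sketch below recomputes them; the counts are taken from the text (with the catalog total of 924 genes), while the dictionary layout and function name are simply a convenient choice for this example.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# (genes with a human ortholog, genes in the family), as stated in the text.
family_counts = {
    "MYB-like": (19, 19),
    "bHLH": (30, 41),
    "T-box": (6, 22),
    "MADF": (0, 9),
}

for family, (conserved, total) in family_counts.items():
    print(f"{family}: {conserved}/{total} = {percent(conserved, total):.0f}%")

# Overall fraction of catalogued transcription factor genes with orthologs.
print(f"all families: 344/924 = {percent(344, 924):.0f}%")
```

Percentages based on small families, such as the nine MADF genes, shift substantially with a single reannotation, so they are best read alongside the raw counts.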

Interspecies comparisons
Genome sequencing of additional Caenorhabditis species allows a comparison of the C. elegans transcription factor gene family to those in other species (Haerty et al., 2008). C. briggsae and C. remanei contain similar numbers of transcription factor genes to C. elegans, and approximately 72% of the C. elegans transcription factor genes have detectable orthologs in both C. briggsae and C. remanei. This proportion of orthology is higher than that found overall for protein-coding genes, suggesting transcription factor genes are under strong selective pressure. Transcription factor genes are not uniformly distributed on the chromosomes in C. elegans or C. briggsae, and many genes are located in clusters that are enriched for transcription factor genes compared to non-transcription factor genes. Furthermore, members of gene families such as NHR, HOX, and T-box are frequently clustered in tandem arrays (Haerty et al., 2008).

Transcription factor targets
A major goal in studying transcription is to make the link between transcription factors and their target genes. These links have traditionally been made by identifying binding sites in experimentally verified targets of transcription factors by detailed promoter analyses. While this approach is still valuable, more recent techniques such as PCR-based binding site selection, microarray analyses, yeast one-hybrid screens, and chromatin immunoprecipitation (ChIP) assays have expanded our ability to identify transcription factor binding sites and candidate target genes on a genome-wide scale (e.g., Deplancke et al., 2006; McElwee et al., 2003; Niu et al., 2011; Zhong et al., 2010). Our knowledge of transcription factor binding site specificity will continue to increase, but we provide references to find information about DNA binding site specificity and potential target genes for some C. elegans transcription factors (link to Table 3. Transcription factor targets.). As described more fully below (see Section 7), data from many of these ChIP analyses is available at modENCODE and through WormBase. Likewise, candidate transcription factor binding sites predicted from published data and user-generated position weight matrices can be visualized in the Genome Browser at WormBase and at modENCODE by accessing the Sequence Motif track.
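Position weight matrices of the kind mentioned above score each window of a sequence by summing per-position log-odds values. The Python sketch below shows one common way to do this, assuming a uniform background and a fixed pseudocount; the count matrix is a toy example constructed for illustration, not a measured binding profile, and the function names and threshold are arbitrary choices.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, window, score) for forward-strand windows that score
    at or above threshold; start is 0-based."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, score))
    return hits


# Toy count matrix (hypothetical): ten aligned sites, all reading GATA.
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = pwm_from_counts(toy_counts)
for start, window, score in scan("CCGATATT", pwm, threshold=6.0):
    print(start, window, round(score, 2))
```

In practice the threshold, background model, and treatment of the reverse strand all affect which candidate sites are reported, so predicted sites remain candidates until supported by binding or expression data.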

Transcription factor gene expression
Transcription factor gene expression is often highly regulated, and understanding the spatial and temporal expression patterns of transcription factors is a key to determining their function. While a variety of transcription factors have been characterized individually using reporter genes, antibodies, and in situ hybridizations (e.g., hlh-1, pha-4, ceh-22, tbx-37, unc-86, cnd-1, end-3) (Finney and Ruvkun, 1990; Good et al., 2004; Hallam et al., 2000; Horner et al., 1998; Kalb et al., 1998; Krause et al., 1990; Maduro et al., 2007; Okkema and Fire, 1994), high-throughput techniques such as cell type-specific microarrays and SAGE analyses are providing genome-wide gene expression data (Fox et al., 2005; McKay et al., 2003; Meissner et al., 2009; Roy et al., 2002; Spencer et al., 2011; Von Stetina et al., 2007; Zhang et al., 2002). This useful data is available for transcription factor genes (and all other genes) online through WormBase and modENCODE, while additional analyses regarding tissue specificity are available at http://www.vanderbilt.edu/wormdoc/wormmap/Welcome.html (Spencer et al., 2011). In addition, automated analyses of fluorescent protein reporter gene expression are accelerating our knowledge of transcription factor gene expression during embryogenesis and in L1 larvae (Liu et al., 2009; Murray et al., 2008) (Figure 5). Lineage-based gene expression data generated using these high-throughput approaches can be accessed online through EPIC and WormDB. This data is useful to investigators interested in specific transcription factors, but it also opens the exciting possibility of using computational approaches to overlay and identify correlations between gene expression patterns to understand how networks of transcription factors control development (Figure 5).
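One simple way to look for correlations between expression patterns is to compare reporter intensities cell by cell. The Python sketch below computes Pearson correlation coefficients between expression profiles measured over the same ordered set of cells; the profile names and values are invented for illustration, and Pearson correlation is only one of several measures that could be used for this purpose.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(profiles, min_r=0.9):
    """Return (name_a, name_b, r) for profile pairs with r >= min_r."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= min_r:
            pairs.append((a, b, r))
    return pairs


# Hypothetical reporter intensities in the same four cells, in the same order.
profiles = {
    "tf-a": [0.0, 1.0, 2.0, 3.0],
    "tf-b": [0.0, 2.0, 4.0, 6.0],
    "tf-c": [3.0, 2.0, 1.0, 0.0],
}
print(correlated_pairs(profiles))
```

Co-expression found this way suggests, but does not establish, a regulatory relationship; it is most useful for prioritizing factor pairs for genetic tests.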

Post-translational regulation of transcription factor function
Covalent post-translational modifications also play important roles in regulating transcription factor function (Tootle and Rebay, 2005), and a variety of C. elegans transcription factors are regulated by modifications including phosphorylation, proteolysis, ubiquitination, and SUMOylation. For example, nuclear localization and DNA-binding activity of the FoxO-family factor DAF-16 is negatively regulated by the DAF-2/insulin-like signaling pathway through phosphorylation at phylogenetically conserved sites (Cahill et al., 2001; Lin et al., 2001), whereas DAF-16 protein stability is regulated by ubiquitination (Li et al., 2007). Likewise, the activity of the Gli-family factor TRA-1 is regulated by specific proteolytic cleavage and by ubiquitin-mediated targeting to the proteasomal degradation pathway (Schvarzstein and Spence, 2006; Starostina et al., 2007). Both the ETS-family factor LIN-1 and the T-box factor TBX-2 require SUMOylation (Leight et al., 2005; Roy Chowdhuri et al., 2006), and at least 20 other transcription factors have been identified as possible SUMOylation targets (Kaminsky et al., 2009). SUMOylation of LIN-1 promotes interaction with MEP-1, a component of the NuRD (see Section 6.1) transcriptional repression complex (Leight et al., 2005). Transcription factor SUMOylation has also been shown to recruit Drosophila MEP-1 and the NuRD complex, suggesting a conserved mechanism for SUMOylation-dependent transcriptional repressors (Stielow et al., 2008).
Non-covalent interactions of transcriptional co-activators or co-repressors with DNA-bound transcription factors are also crucial for transcription factor function. The transcriptional co-activator p300/CBP is important for embryonic development (Shi and Mello, 1998), while the co-repressors CtBP and SIR-2.1 play important roles regulating life span (Chen et al., 2009; Tissenbaum and Guarente, 2001). The best-studied transcriptional co-repressor in C. elegans is the Groucho-family factor UNC-37, which was first characterized based on its interaction with the homeodomain factor UNC-4 (Winnier et al., 1999). Not surprisingly, UNC-37/Groucho has also been shown to function with a variety of transcription factors in C. elegans, including POP-1, COG-1, REF-1, RNT-1, UNC-30, and MLS-1 (Calvo et al., 2001; Chang et al., 2003; Miller and Okkema, 2011; Neves and Priess, 2005; Peden et al., 2007; Xia et al., 2007). The number of transcription factors that function as UNC-37-dependent repressors is likely to be larger, as potential Groucho-interaction motifs are found in many C. elegans transcription factors (Copley, 2005).

Chromatin status and transcription
The initiation, elongation, and termination of transcription are influenced by both local and chromosome-wide chromatin configuration, and vice versa. Many excellent reviews provide in-depth treatments of this topic (Berger, 2010; Yun et al., 2011; Bannister and Kouzarides, 2011). Here, basic information that is generally applicable across species is briefly summarized prior to a discussion of chromatin regulatory complexes in C. elegans.
Nucleosomes, around which DNA is wound, are composed of a histone protein octamer (two each of H2A, H2B, H3, and H4) that can be post-translationally modified in a variety of ways. Typically, residues on the amino terminal tails of individual histones are modified by phosphorylation, methylation, acetylation, and ubiquitination.
The specific amino acid residue of the histone protein that is targeted, the type of modification, and the location of the nucleosome relative to the gene body can all have effects on transcription, or be altered by transcriptional events. These effects include altering nucleosome density and changing the level of chromatin compaction to either relax or condense a region, which has been predicted to facilitate or prohibit association of different transcriptional regulatory complexes. Additionally, histone modifications can provide specific recruitment sites for different transcriptional regulatory complexes. Moreover, the number of methyl groups modifying a particular residue can have distinct effects on gene expression. As one example, monomethylation of lysine 20 of histone H4 affects transcription, while dimethylation is associated with DNA repair, and trimethylation facilitates heterochromatin formation (Balakrishnan and Milavetz, 2010). Finally, protein variants of histones, such as H2A.Z or CENP-A, can be incorporated into the core histone octamer, with differential effects on chromatin configuration and function, for instance through regulating noncoding RNA transcription in centromeric regions (reviewed in Stimpson and Sullivan, 2010).
Over the past several years, the role of histone modifications, histone variants, and histone modifying enzymes in regulating gene expression during C. elegans development has become clearer. Genome-wide studies of the distribution of individual chromatin marks provide a glimpse into the complex combinatorial "codes" that are possible and that are associated with gene expression (Figure 6). Core chromatin regulatory complexes that have been studied in other organisms, such as NuRD (Section 6.1), MLL/COMPASS (Section 6.2), and PcG (Section 6.2), are also present in recognizable form in C. elegans. An extensive review of germline chromatin is available as a separate chapter in WormBook (Germline chromatin), so here we focus on somatic functions of these chromatin regulatory complexes.

The Polycomb Group complex
The Polycomb Group (PcG) chromatin regulators were first uncovered in the classic studies of Edward B. Lewis in Drosophila because of their critical roles in maintaining the repressed state of homeotic (Hox) genes regulating segmentation (reviewed in Muller and Kassis, 2006). Subsequent work by numerous groups over the years revealed that the PcG includes two types of complexes, PRC1 and PRC2. A major role for mammalian PRC2 is repression of Hox gene expression during development by promoting histone H3 lysine 27 methylation, a mark that is then bound by the silencing complex PRC1, which ubiquitylates histone H2A. The mechanism by which this modification leads to Hox gene silencing is not clear. In C. elegans, the PcG-related components were first identified in maternal effect sterile (MES) screens due to their role in the germline (Capowski et al., 1991), a topic explored in detail in the WormBook chapter Specification of the germ line. In the C. elegans soma, at least some PcG-related proteins of the PRC2 complex (MES-2 and MES-6) appear to have a role in regulating Hox gene expression, as is observed in other animals (Deng et al., 2007; Ross and Zarkower, 2003). Moreover, MES-2 is required for restricting the developmental plasticity of embryos through global changes to the chromatin state (Yuzyuk et al., 2009). Until recently, it was less clear whether C. elegans contained a PRC1-like complex. However, certain PRC1-related components are present, including MIG-32/Bmi-1 and SPAT-3/Ring1B, which were shown to be required for H2A ubiquitylation in the soma (Karakuzu et al., 2009). Moreover, mig-32 and spat-3 mutants have somatic defects similar to those of mes-2, suggesting that they act to regulate the same genes. Intriguingly, however, mig-32 and spat-3 do not share the germline defects of mes-2, suggesting that the mechanism of PRC2 activity in the germline is distinct from that of the soma (Karakuzu et al., 2009).

The SynMuv pathway
Genetic studies in C. elegans have also uncovered a series of intertwined, genetically linked pathways, collectively called the SynMuv pathway, that involve chromatin modifications and affect development (reviewed in Fay and Yochem, 2007). Consisting of at least three genetically defined groups (A, B, and C), the SynMuv groups reveal the functional redundancy underlying gene regulation (Figure 7). Under standard conditions, genes from at least two of the SynMuv groups must be disrupted by mutation before significant effects on development occur, typically and historically read out as defects in vulva formation in the hermaphrodite. All three groups contain genes that encode gene regulatory proteins, although the SynMuv A group is unique to C. elegans (Davison et al., 2011). SynMuv C genes encode proteins of the Tip60/HAT regulatory complex that is associated with H3K4 acetylation and active gene expression (Ceol and Horvitz, 2004). The best-studied pathway, SynMuv B, includes members of the DRM complex (Harrison et al., 2006), which is related to a repressor complex called dREAM in Drosophila (Korenjak et al., 2004) and DREAM/Myb-MuvB/LINC in mammals (Knight et al., 2009). In C. elegans, DRM includes the sequence-specific E2F binding factor EFL-1, a pocket protein called LIN-35, and other factors with less well-understood functions, but it does not contain a Myb-like protein as in other organisms (Harrison et al., 2006). DRM potentially acts with the deacetylase NuRD complex to promote transcriptional repression, and indeed microarray studies of mutants in the SynMuv B group primarily show increased abundance of target transcripts (Kirienko and Fay, 2007). DRM components are widely expressed and appear to play diverse roles in many tissues, although in most cases the DRM complex alone is not essential for normal development. However, a second mutation that disrupts a tissue-specific regulatory protein, combined with a mutation in a DRM complex member, can cause a tissue-specific phenotype. For instance, mutation of the C2H2 zinc finger gene slr-2 in conjunction with lin-35 mutations disrupts intestinal function, while either mutation alone does not (Kirienko et al., 2008). Additionally, a recent report indicates that one SynMuv B component, LIN-61, which binds methylated lysine 9 of H3 (Koester-Eiserfunke and Fischle, 2011), can act as part of the DRM complex in vulval development but not in other processes, showing that this complex has tissue-specific properties (Harrison et al., 2007). Finally, environmental influences such as temperature can have an effect as well. lin-35 and certain other components of the DRM complex are required to suppress the germline fate in somatic tissues (Wang et al., 2005), but only at higher temperatures is the onset of the germline fate sufficiently severe to lead to a larval arrest (Petrella et al., 2011).
As in other systems, the landscape of chromatin modifications and the complexes carrying them out are generally conserved in C. elegans. Thus, it is likely that information from any one experimental system will inform our general understanding of chromatin influences in all systems. In the future, investigating how these different chromatin regulatory complexes interact at common target loci, and how they influence key sequence-specific transcription factors and the core transcriptional machinery, will be critical for understanding gene regulatory mechanisms. The molecular and genetic advantages of C. elegans, combined with a relatively simple and defined cell lineage, suggest that the worm will prove particularly important for understanding the role of chromatin in developmental processes.

Systematic genome-scale analysis of transcription regulation
Genome-wide analyses complement detailed single-gene studies by providing a global overview that can be used to determine how broadly observations of transcriptional regulatory mechanisms at individual genes are applicable. Over the last decade, microarray analysis of gene expression has been widely used to examine gene expression changes upon perturbation of various transcriptional components. However, one limitation to this type of analysis is the inability to distinguish direct from indirect effects. The more recent development of techniques such as chromatin immunoprecipitation (ChIP), which maps the binding events of a given factor throughout the genome, can overcome this limitation and provide important information about the direct activity of the factor. Chromatin fragments that immunoprecipitate with chromatin or transcriptional regulatory proteins are identified either by hybridization to a microarray (ChIP-chip) or by deep sequencing (ChIP-seq). This approach has proved to be a very powerful tool for investigating and discovering transcriptional mechanisms. In C. elegans, ChIP studies have been performed by individual labs focused on particular processes or factors, including dosage compensation components (Ercan et al., 2007; Jans et al., 2009), the DRM complex component LIN-54 (Tabuchi et al., 2011), the histone variant HTZ-1 (Whittle et al., 2008), and transcription factors such as HLH-1 and NFI-1 (Lei et al., 2010; Whittle et al., 2009).

The modENCODE project
In addition to individual efforts, a large-scale effort by a multi-lab consortium has systematically applied a genomics approach to exploring C. elegans gene expression. The modENCODE consortium (model organism Encyclopedia of DNA Elements), funded by the National Human Genome Research Institute (NHGRI), has in the last few years produced a wealth of genome-wide C. elegans datasets. These studies explore many aspects of transcriptional regulation such as transcription factor binding sites, chromatin modifications, and gene expression analysis of diverse RNAs, including small noncoding RNAs in addition to polyadenylated RNAs (Gerstein et al., 2010). A similar effort to analyze the Drosophila melanogaster genome is ongoing as well (Roy et al., 2010). The ultimate goal of these projects is to identify, as comprehensively as possible, all of the functional elements encoded in the DNA that are responsible for the regulation and formation of that organism.
A major effort of modENCODE is to determine the binding sites of sequence-specific transcription factors genome wide. These elements direct the temporal and spatial control of transcription, which in turn dictates an organism's development, physiology and response to the environment. Identifying these elements provides an important first step toward understanding how DNA sequence is interpreted to form a three-dimensional body plan. At the beginning of the modENCODE project, almost nothing was known about the direct targets of transcription factors in C. elegans. By the time the project completed its fifth year in March 2012, the genome-wide binding profiles of over 120 factors from diverse families of transcriptional regulators were collected and released to the C. elegans community.

modENCODE transcription factor ChIP studies
To systematically identify binding sites, transgenic lines expressing GFP-tagged transcription factors were subjected to ChIP-seq using an antibody to GFP (Zhong et al., 2010). The GFP expression patterns of these lines largely recapitulated known endogenous expression patterns, and all lines that were tested robustly rescued the mutant phenotype of that gene, indicating that the tagged transcription factors retain wild-type function. All of the detailed protocols and datasets are freely available at http://www.modencode.org/. Moreover, all the strains for which successful datasets have been produced are available from the Caenorhabditis Genetics Center (CGC). So far, 77 completed datasets are available representing 46 transcription factors, some of which have been analyzed at multiple developmental stages (Table 4). This pipeline was first applied to the FoxA transcription factor PHA-4, which has important roles in both organ development and environmental responses (Zhong et al., 2010). Subsequently, an analysis of the major characteristics of 22 transcription factors described binding site features and correlations for this larger set, including a preliminary regulatory network analysis (Niu et al., 2011). Most of these factors bind thousands of sites in the genome, and the majority of these binding sites near coding genes lie within 500 bp of the predicted transcript start site. A significant insight from the properties of these 22 transcription factors was the recognition that the genome contains hundreds of regions that are broadly permissive for non-specific binding by transcription factors (termed high-occupancy target, or "HOT", regions) (Gerstein et al., 2010). Recruitment to a HOT region does not require the sequence-specific binding property of the transcription factor, nor is binding correlated with the regulation of nearby genes. How transcription factors are recruited to HOT sites, and the possible role of chromatin or nuclear organization in this process, is unknown. For those interested in using these data to understand gene regulation by a specific transcription factor, HOT sites should be carefully distinguished from other binding sites that are either unique to or primarily bound by the factor of interest, which are more likely to result in direct regulatory events. Another limitation users should recognize when interpreting the genome-wide transcription factor binding profiles is that almost all the experiments were performed in whole animals. Any binding profile for a broadly expressed transcription factor will therefore be an amalgam of binding in multiple tissues. Currently, tissue-specific profiling techniques are being applied to circumvent this problem.
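As a rough illustration of how HOT sites might be separated from factor-specific sites, the following Python sketch counts the distinct factors with a peak summit in each fixed-width genomic bin and flags bins bound by many factors. The input format, the function names, the 500 bp bin size, and the three-factor threshold are all assumptions made for this example; they are not the criteria used by modENCODE, and the coordinates are invented toy values.

```python
from collections import defaultdict


def flag_hot_regions(peaks, bin_size=500, min_factors=3):
    """Return bins bound by at least `min_factors` distinct factors.

    peaks: iterable of (factor, chromosome, summit_position) tuples
           (a hypothetical input format chosen for this sketch).
    Returns a dict mapping (chromosome, bin_start) to a sorted list of factors.
    """
    factors_by_bin = defaultdict(set)
    for factor, chrom, summit in peaks:
        # Assign each summit to the fixed-width bin that contains it.
        bin_start = (summit // bin_size) * bin_size
        factors_by_bin[(chrom, bin_start)].add(factor)
    return {
        region: sorted(factors)
        for region, factors in factors_by_bin.items()
        if len(factors) >= min_factors
    }


def factor_specific_peaks(peaks, hot_regions, bin_size=500):
    """Return the peaks whose bin was not flagged as a candidate HOT region."""
    return [
        (factor, chrom, summit)
        for factor, chrom, summit in peaks
        if (chrom, (summit // bin_size) * bin_size) not in hot_regions
    ]


# Toy peaks with invented coordinates, for illustration only.
example_peaks = [
    ("PHA-4", "IV", 1_000_120),
    ("HLH-1", "IV", 1_000_300),
    ("NFI-1", "IV", 1_000_450),
    ("PHA-4", "X", 52_010),
]
hot = flag_hot_regions(example_peaks)
specific = factor_specific_peaks(example_peaks, hot)
```

In this toy input the three peaks on chromosome IV share one bin and are flagged, leaving only the PHA-4 peak on X as a candidate factor-specific site. Fixed bins are the simplest choice here; a real analysis would more likely merge overlapping peak intervals and scale the threshold to the number of factors profiled.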

modENCODE chromatin modification ChIP studies
In addition to sequence-specific transcription factors, a second modENCODE project has focused on collecting genome-wide ChIP-chip datasets for various chromatin-associated factors as well as for histone modifications. Over 20 histone modifications and 14 factors have been analyzed to date. Unlike the transcription factor studies that analyzed transgenic GFP-tagged proteins, these datasets were generated by using antibodies specific to each factor or modification to monitor their endogenous distribution; many of these antibodies are commercially available. This global analysis of chromatin states has yielded various insights, including the persistence of the germline-established chromatin state in the somatic tissues, highlighted by the existence of chromosomal domains enriched for repressive histone modifications that correlate with increased meiotic recombination rates (Gerstein et al., 2010). Additionally, the X chromosome exhibits several distinctive features relative to autosomes, including increased monomethylation of H4K20 and H3K27, and increased nucleosome density (Ercan et al., 2011; Liu et al., 2011). Importantly, when the chromatin modification status was combined with transcription factor data in predictive algorithms for gene regulatory events, the accuracy of the resulting predictions was greatly improved compared to either alone (Gerstein et al., 2010).

modENCODE transcription studies
All of these ChIP-based studies are complemented by modENCODE-based analysis of gene expression at many different developmental stages, tissues, and environmental conditions through the use of deep sequencing and tiling microarrays to monitor transcript identity and abundance. As of October 2011, more than 130 experiments have been analyzed and released to the community. These include analysis of both poly-A-selected RNAs and small RNAs. Most of these data have been analyzed with the immediate goal of improving gene annotation, and have led to the identification of thousands of novel exons and splicing events, new small RNAs, including many microRNAs, and improved 5' and 3' UTR mapping (Hillier et al., 2009; Kato et al., 2009; Mangone et al., 2010; see also Jan et al., 2011). Improved gene models lead to improved assignment of regulatory events to the correct target genes. Moreover, as additional gene expression data are collected and analyzed, more conclusions will be drawn regarding correlations between regulatory factors and changes in gene expression levels across stages, tissues, and conditions.
Countless biological discoveries are embedded in these deep and complex modENCODE datasets. The published global observations and analyses (Gerstein et al., 2010) are just the beginning. Investigation of the data by the larger research community, with their specialized expertise in so many aspects of C. elegans biology, is essential to plumb the full possibilities of the data. To facilitate such endeavors by the C. elegans community, all modENCODE data, along with detailed descriptions of growth and collection conditions and protocols, are available at http://www.modencode.org/. The data are available for bulk download for large-scale analyses and comparisons, but the data for individual experiments or individual genes can be examined as well, for those with a particular focus on a single pathway or process. Data of interest can also be selected as "tracks" for viewing on a genome browser. In the near future, additional changes will be made to the interface to improve data selection and analysis for all interested users. Ultimately, all the data on the modENCODE website will be incorporated into WormBase. Movement of these data to cloud-based storage to increase accessibility and facilitate downloads is also a likely possibility in the near future.

Future Prospects
There is little doubt that the field of transcriptional regulation in C. elegans is in the midst of an information explosion. We are rapidly acquiring information concerning temporal and spatial patterns of gene expression using genome-wide expression assays and automated analyses of fluorescent protein reporter expression. At the same time, we are identifying binding sites for transcription factors and chromatin regulatory factors throughout the genome, and recently developed techniques for isolating specific nuclei will enhance our ability to perform tissue-specific chromatin profiling (Deal and Henikoff, 2010). Still, we have only begun to explore other areas of transcriptional regulation. How does higher order organization within nuclei affect gene expression (Meister et al., 2010; Ikegami et al., 2010)? What impact do post-transcriptional and post-translational mechanisms have on transcription factor activity? Overall, it is an exciting time to study transcriptional regulation in C. elegans. Because of the relative simplicity of C. elegans gene promoters, we can reasonably make connections between transcription factors and their target genes. Understanding this information will help decipher how the information in the genome controls every aspect of C. elegans biology.

Figure 1: A hypothetical RNAPII elongation megacomplex. RNAPII (including the extended CTD with SerPO4 knobs) is purple; the globular and CTD portions are drawn approximately to scale for mammalian RNAPII. Orange DNA is wrapped around yellow nucleosomal histones; nucleosomes modified by Set2 are shaded darker. The nascent RNA transcript is green. Yeast names are used for PCAPs (e.g., Phatnani and Greenleaf, 2006), not all of which are shown. (CBC) cap-binding complex; (CRF) chromatin remodeling factor; (XF) processing/export factor. (Used with permission, Phatnani and Greenleaf, 2006)

Figure 2: Metazoan regulatory modules controlling transcription. Shown is a diagram of a typical metazoan gene illustrating the complex interactions among cis-acting modules and trans-acting factors regulating gene expression. Note that both positive and negative control regions are interspersed with promoter modules, all of which can be further influenced by distal regions regulating chromatin configuration, such as insulators. (Used with permission, Levine and Tjian, 2003)
The five elements commonly observed are an Sp1-like site (CNCCGCCC), T-blocks that correlate with nucleosome eviction and gene expression levels (TTTT[N/T]), TATA box (GTATA[TA][TA]AG), trans-splicing site (TTnCAG), and Kozak site that includes the translation initiation codon ([CA]AA[CA]ATG) (Grishkevich et al., 2011) (Figure

Figure 3: Core promoter motif composition among Caenorhabditis promoters. (A) Five conserved motifs in each of the five examined Caenorhabditis species are shown as sequence logos. (*) In contrast to all other motifs that were found in the initial search, the Caenorhabditis japonica TATA box motif was detected only in sequences whose orthologs contained the "TATA" motif. (B) Distribution of motifs relative to the translational start codon. The gray box in each plot corresponds to the core promoter. The area under the curve is the total frequency of occurrence within the core promoter region, with the line indicating the frequency at each position as indicated by the scale to the left. The C. japonica SL1 motif was normalized to the length of the other species. (C) Frequency table for each sequence motif. (Figure and data used with permission, Grishkevich et al., 2011)
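The consensus sequences reported for the core promoter elements (Grishkevich et al., 2011) can be read as simple patterns. The Python sketch below translates four of them into regular expressions and scans a sequence for matches. It assumes that "N" and "n" in the consensus mean any nucleotide; the T-block motif is left out because its bracket notation is ambiguous as printed, and the test sequence is an invented string, not a real C. elegans promoter. Function and variable names are hypothetical.

```python
import re

# Consensus motifs written as regular expressions.
# Assumption: "N"/"n" in the published consensus stands for any nucleotide.
MOTIFS = {
    "Sp1-like": "C[ACGT]CCGCCC",
    "TATA box": "GTATA[TA][TA]AG",
    "trans-splice site": "TT[ACGT]CAG",
    "Kozak": "[CA]AA[CA]ATG",
}


def find_motifs(sequence):
    """Return (motif_name, start_index, matched_text) tuples sorted by position."""
    sequence = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Invented sequence containing one copy of each motif, for illustration only.
toy_sequence = "CTCCGCCCAAGTATAAAAGTTTCAGAAAAATG"
toy_hits = find_motifs(toy_sequence)
```

Exact consensus matching of this kind is only a first pass; position weight matrices derived from the sequence logos would capture the degeneracy of real sites more faithfully.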

Figure 4: C. elegans transcription factor genes with clear orthologs in the human genome. The percentage of C. elegans genes from each DNA-binding domain family with clear orthologs in the human genome (Reece-Hoyes et al., 2005; Shaye and Greenwald, 2011). Only DNA-binding domain families with 3 or more C. elegans members are included.

Figure 5: Automated analyses of transcription factor gene expression. (A-D) Representative frames showing expression of mCherry reporters for the indicated transcription factors (red) overlaid on ubiquitous histone::gfp reporter expression marking all nuclei. Anterior is left. (E) Embryonic lineage trees showing expression of the indicated transcription factor::mCherry transgenes (colored), or a merged image showing expression of all four transgenes. These data were acquired and visualized using StarryNite and AceTree (Murray et al., 2008). Panels A-D were acquired from movies at http://epic.gs.washington.edu/. Panel E was adapted with permission from Murray et al., 2008.

Figure 6: Examples of ChIP-chip data for RNA polymerase II (Pol II) and five different histone modifications. The X axis represents a stretch of Chr IV from nt 12,341,610 to 12,472,625. Coding genes are shown below, with arrows marking the direction of transcription 5'→3'. The Y axis represents the z-score of the log2 ratios of IP/Input (mean centered and scaled to stdev=1). Note the opposing pattern of H3K27me, a mark associated with gene silencing, with that of activation marks such as H3K4me and H3K36me. Image courtesy of Susan Strome and Andreas Rechsteiner.
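The Y-axis transformation described for Figure 6 can be sketched in a few lines of Python: take the log2 ratio of IP to Input signal for each probe, then mean-center and scale so that the standard deviation is 1. The function name and input format are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption; the caption does not say which was used.

```python
import math


def log2_ratio_zscores(ip_signal, input_signal):
    """Z-scores of log2(IP/Input): mean-centered and scaled to stdev = 1.

    ip_signal, input_signal: equal-length sequences of positive intensities.
    Assumption: the population standard deviation is used for scaling.
    """
    log_ratios = [math.log2(ip / inp) for ip, inp in zip(ip_signal, input_signal)]
    mean = sum(log_ratios) / len(log_ratios)
    variance = sum((r - mean) ** 2 for r in log_ratios) / len(log_ratios)
    stdev = math.sqrt(variance)
    return [(r - mean) / stdev for r in log_ratios]


# Toy intensities for three probes (invented values, for illustration only).
z = log2_ratio_zscores([2.0, 4.0, 8.0], [1.0, 1.0, 1.0])
```

Because each track is standardized separately, tracks for different antibodies can be compared on a common scale even when their raw enrichment ranges differ.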

Figure 7: Outline of genes in each group of the SynMuv pathway. SynMuv A and SynMuv B are the major groups in this pathway that are redundantly required for vulval development. The relationship of these pathways with the SynMuv C pathway is less clear. Within the SynMuv B group, LIN-53, marked with an asterisk, is listed twice, as it is found in both NuRD and DRM complexes.