Transcriptional regulation

The regulation of transcription in C. elegans shares many similarities with transcription in other organisms. The details of how specific transcription factors bind to target promoters and act as either activators or repressors are still being examined in many cases, but an increasing number of factors and their binding sites are being characterized. This chapter reviews the general concepts that have emerged with regard to promoter function in C. elegans, including the methods that have been used successfully and the limitations encountered to date. Specific cis-acting promoter elements from myo-2, hlh-1 and lin-26 are discussed as examples of complex promoters regulated by multiple sequence elements. In addition, examples of organ-, tissue-, and cell type-specific mechanisms for generating spatial specificity in gene expression are discussed.


Introduction
Regulation of RNA Polymerase II (Pol II) transcription in C. elegans is typical of eukaryotes. Pol II appears to act in concert with TATA Binding Protein (TBP) and TBP-Associated Factors (TAFs) at the promoters of protein-coding genes (Dantonel et al., 2000; Kaltenbach et al., 2000; Lichtsteiner and Tjian, 1993; Walker et al., 2004). As in other eukaryotes, active Pol II is phosphorylated at serines 2 and 5 of its C-terminal domain (CTD) (Seydoux and Dunn, 1997; Wallenfang and Seydoux, 2002; Zhang et al., 2003). The functions of these core transcription proteins are beginning to be defined and are reviewed in Transcription mechanisms. However, much about transcription in C. elegans remains uncertain. For example, putative TATA and CAT boxes upstream of coding regions are often described, but these assignments are largely subjective and lack firm experimental evidence that the elements function. We also have not fully explored the role of histone modifications and chromatin organization in somatic cell transcription. Progress on these fronts has been made primarily in the areas of dosage compensation (see X-chromosome dosage compensation) and germline chromatin organization (see Germline chromatin). In these cases, evolutionary conservation suggests that somatic cell transcription will similarly be influenced by typical eukaryotic mechanisms of chromatin organization. In many ways, then, our understanding of transcription in C. elegans is still in its infancy, reflecting the fact that C. elegans, as a model biological system, is a growing field that has been exploited primarily for its genetics.
The purpose of this chapter is to provide an overview of transcriptional regulation in C. elegans. It is geared towards an audience that is naïve in the ways of C. elegans gene regulation, but it also includes information that should be helpful even to the seasoned veteran. The tools for studying transcription in C. elegans are described in an effort to illustrate successful approaches and to highlight techniques that, while useful in other systems, are challenging in the nematode. A review of general trends in regulatory elements is followed by specific examples of spatial and temporal regulatory strategies. We hope that this information will serve both as a useful review and as an entry point into the literature appropriate for specific applications.

Tools to study transcriptional regulation
Reporter genes are the most commonly used method to study transcriptional regulation in C. elegans. It is straightforward to generate transgenic lines (see Transformation and microinjection), and, as C. elegans is transparent throughout its life, it is easy to visualize reporter gene expression in all cells. Early studies of gene expression relied on lacZ reporter genes and were aided by the development of a set of vectors by the Fire lab (Fire et al., 1990). These lacZ reporters were very useful for determining cis-acting transcriptional control elements (Fire and Waterston, 1989; MacMorris et al., 1994; Okkema et al., 1993), and lacZ continues to be a robust marker that can be pushed to detect even low levels of expression (Wilkinson and Greenwald, 1995).
More recently, Green Fluorescent Protein (GFP), or one of its variants, has become the most common reporter. A GFP coding cassette can be inserted at different locations within a large genomic clone (tens of kilobases) to generate transcriptional and/or translational fusions. Such constructs provide the greatest chance of capturing the required cis-acting regulatory elements. More commonly, however, one assumes that the genomic sequences 5′ of the coding region represent the promoter. This region, usually a few kilobases long, can be PCR amplified and easily cloned into the reporter gene backbone of choice. Alternatively, the Promoterome Project can serve as a source for many promoter regions and is useful if the reporter gene cloning strategy is Gateway (Invitrogen) compatible (Dupuy et al., 2004). These constructs are commercially available through Open Biosystems.
There are several considerations to take into account when making reporter genes. One is the distinction between transcriptional and translational reporters; often one would like to have both. For transcriptional reporters, expression can be engineered to highlight the cytoplasm, nucleus or other compartments of the expressing cell. Nuclear-localized reporters are useful for identifying embryonic cells, whereas cytoplasmic reporters are often more useful for larval cells, particularly neurons, where they outline the axonal and dendritic processes. The Chalfie lab is developing a two-part fluorescent expression system that has the potential to simplify cell type identification (Zhang et al., 2004). Translational reporter genes can provide information on the subcellular localization of the endogenous gene product. In this case, it is advisable to test whether the fusion protein rescues a mutant phenotype, thus demonstrating that some or all aspects of the expression pattern and subcellular localization are biologically relevant.

Once a pattern of expression is determined, promoter analyses can be used to home in on important regulatory elements. Sequential deletions of putative promoter regions linked to a GFP reporter gene are easily made by traditional cloning or by Splicing by Overlap Extension (SOEing) Polymerase Chain Reaction (PCR) (Horton et al., 1990). The latter technique is convenient and amenable to high throughput, because the PCR products can be injected directly into animals without purification or cloning (Hobert, 2002). Once the control elements are localized to small genomic regions (several hundred base pairs or less), they can be placed upstream of "basal" promoters to assay for enhancer activity. These approaches are common in the C. elegans literature and can be very successful in defining important cis-acting regulatory sequences.
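The overlap step that makes SOEing PCR work can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical promoter fragment, an illustrative reporter start, and a 10-nt overlap on each side of the junction; real primer design must also consider melting temperature, primer length and secondary structure, which this sketch ignores. The helper names (`soe_junction_primers`, `first_round_products`, `soe_fuse`) are our own and do not come from any published tool.

```python
# Minimal sketch of SOEing (Splicing by Overlap Extension) PCR logic.
# Sequences, overlap length, and function names are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def soe_junction_primers(promoter: str, reporter: str, n: int = 10):
    """Return (promoter_reverse, reporter_forward) chimeric primers.

    Both primers span the promoter/reporter junction: the last n bases of
    the promoter joined to the first n bases of the reporter. The reverse
    primer is written 5'->3' on the bottom strand, hence the reverse complement.
    """
    junction = promoter[-n:].upper() + reporter[:n].upper()
    return revcomp(junction), junction


def first_round_products(promoter: str, reporter: str, n: int = 10):
    """Top-strand sequences of the two first-round PCR products."""
    product_a = promoter.upper() + reporter[:n].upper()   # promoter + reporter tail
    product_b = promoter[-n:].upper() + reporter.upper()  # promoter tail + reporter
    return product_a, product_b


def soe_fuse(product_a: str, product_b: str, overlap: str) -> str:
    """Model the second-round PCR: the shared overlap anneals and is extended."""
    if not (product_a.endswith(overlap) and product_b.startswith(overlap)):
        raise ValueError("products do not share the expected overlap")
    return product_a + product_b[len(overlap):]


if __name__ == "__main__":
    promoter = "TTGATAAGCCGTCAATCGAGATGCTTACGG"  # hypothetical promoter 3' end
    reporter = "ATGAGTAAAGGAGAAGAACTTTTCACTGG"   # illustrative reporter start
    rev_primer, fwd_primer = soe_junction_primers(promoter, reporter)
    a, b = first_round_products(promoter, reporter)
    fused = soe_fuse(a, b, fwd_primer)
    print(rev_primer, fwd_primer)
    print(fused == promoter + reporter)  # True
```

Modeling the second round as string splicing at the shared overlap makes the key property easy to check: the fused product is exactly the promoter followed by the reporter, with no extra or missing bases at the junction.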
The use of reporter genes has several important caveats. First and foremost, they are artificial and can easily misrepresent the pattern of gene expression. Important positive- and negative-acting control elements can be excluded when the extent of the promoter is assumed rather than determined, leading to mosaicism, loss of expression, or ectopic expression (Krause et al., 1994). For example, reporter genes that lack sufficient control elements for fidelity are often expressed in the anterior and posterior intestinal cells or in a small set of head neurons. Moreover, small changes in promoter regions can dramatically alter expression patterns, as illustrated by studies of the ges-1 gene (Egan et al., 1995). It is critical to confirm reporter gene expression patterns with an independent technique such as in situ hybridization, antibody staining, or mutant phenotype.
A second concern with reporter genes is that additional modules in the construct, or in a co-injected marker, may affect expression. For example, many of the standard constructs available from the Fire Lab carry a 3′ untranslated region (UTR) derived from the unc-54 gene, which encodes a muscle myosin heavy chain. These sequences may not be neutral when combined with promoters from other cell types. There are also reports of dramatic effects of co-transformation markers on expression levels, suggesting that you should not rely on a single co-transformation marker when exploring the expression of a novel gene (Fukushige and Siddiqui, 1995). Similarly, "basal" promoters (e.g., pes-10) may work preferentially with certain types of genomic elements and fail to respond in certain cell types, giving false information about a particular enhancer element (Natarajan et al., 2004). As long as you keep these limitations in mind, reporter genes can be very helpful in characterizing transcriptional control elements for the gene of interest.
Genome-wide approaches provide a more global assessment of transcriptional regulation and are becoming more common in C. elegans. The availability of both spotted cDNA (see Kim Lab) and oligonucleotide microarrays (e.g., Affymetrix) for C. elegans has generated a large amount of gene expression data in response to tissue type (e.g., germline-enriched; Reinke et al., 2004), growth conditions (e.g., dauer; Liu et al., 2004), or mutant background (e.g., DAF-16; McElwee et al., 2003). Many of these data sets are available on web sites (e.g., http://genome-www5.stanford.edu/cgi-bin/login.pl) or are linked in WormBase to individual genes. A second global approach to gene expression profiling is Serial Analysis of Gene Expression (SAGE), which has recently been combined with tissue isolation or cell type sorting (McKay et al., 2003; http://elegans.bcgsc.ca/home/sage.html). These approaches give an overview of expression; for specific genes, the data should be validated by an independent method, such as reverse transcriptase (RT)-PCR or reporter genes.
Bioinformatics provides another way to study gene regulation, either alone or in combination with other methods. Currently, the genome sequences of two Caenorhabditis species are finished (C. elegans and C. briggsae), one is in draft (C. remanei), and two are planned (C. japonica and CB5161). Interspecific comparisons of non-coding regions provide a powerful tool for identifying important cis-acting regulatory elements controlling gene expression, because functional elements tend to remain constrained during evolution while surrounding non-functional sequence diverges. Comparisons between C. elegans and C. briggsae revealed important cis-acting sequences controlling the vitellogenin genes and helped to identify GATA-type transcription factors as likely regulators (MacMorris et al., 1994; Spieth et al., 1991; Winter et al., 1996; Zucker-Aprison and Blumenthal, 1989). Such comparisons continue to provide valuable information about cis-acting sequences within gene promoter regions, with many examples in the literature (Culetto et al., 1999; Kirouac and Sternberg, 2003; Marshall and McGhee, 2001; Natarajan et al., 2004; Teng et al., 2004). The power of these comparisons increases with the number of species compared, so they will become more informative as the sequences of additional species are finished. Recently developed programs, such as FamilyJewels, provide methods for sophisticated multiple alignments (Brown et al., 2002). This approach is likely to be widely exploited in coming years to pinpoint regulatory promoter elements.
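The principle behind these interspecific comparisons, that functional elements stay conserved while surrounding non-coding sequence drifts, can be illustrated with a deliberately crude sketch: mark the stretches of one promoter that are covered by exact k-mers also present in the orthologous promoter. This is a simplification resting on several assumptions (same strand only, exact word matches, a hypothetical k of 8, made-up sequences). It is not how FamilyJewels or other alignment programs work; real analyses use proper alignments and account for chance matches.

```python
def conserved_blocks(seq_a: str, seq_b: str, k: int = 8):
    """Return merged (start, end) half-open intervals of seq_a covered by
    k-mers that also occur (same strand, exact match) somewhere in seq_b."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    blocks = []
    for i in range(len(seq_a) - k + 1):
        if seq_a[i:i + k] in kmers_b:
            if blocks and i <= blocks[-1][1]:
                blocks[-1][1] = i + k  # extend an overlapping block
            else:
                blocks.append([i, i + k])  # start a new block
    return [tuple(block) for block in blocks]


if __name__ == "__main__":
    # Hypothetical orthologous promoters sharing one 12-bp element.
    species_1 = "AAAAAAAAAA" + "TGATAAGGCTCA" + "CCCCCCCCCC"
    species_2 = "GGGGGGGG" + "TGATAAGGCTCA" + "TTTTTTTT"
    print(conserved_blocks(species_1, species_2))  # [(10, 22)]
```

With real promoters, short k values produce many chance matches, which is one reason alignment-based tools and multiple species are preferred.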
Bioinformatic analysis of known transcription factor binding sites upstream of coding regions has also been successful. Given a known binding consensus site of sufficient length, the CisOrtho program can be used to identify candidate target genes, which may share an expression pattern (Bigelow et al., 2004). Bioinformatic comparisons of promoters from genes with the same or overlapping expression patterns can also help to home in on potential regulatory elements (for example, see Chang et al., 2004; Guhathakurta et al., 2004). A nice combination of bioinformatics and in vitro studies used the DNA binding properties of DAF-12, a regulator of dauer development and lifespan, to define potential binding sites and gene targets (Shostak et al., 2004). Regardless of the method used, candidate elements and gene targets should be validated experimentally by an independent means.
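Scanning upstream regions for a known consensus site is the starting point for approaches like this. The sketch below is not the CisOrtho algorithm: as we understand it, CisOrtho scores sites with a position weight matrix and uses conservation in C. elegans and C. briggsae orthologs. Instead, this sketch simply searches both strands of one sequence for an IUPAC-coded consensus. The example uses WGATAR, a commonly cited GATA-factor consensus, on a made-up sequence.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R=A/G pairs with Y=C/T).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def consensus_regex(consensus: str) -> re.Pattern:
    # The lookahead lets overlapping matches be reported.
    body = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile("(?=(" + body + "))")


def scan_both_strands(seq: str, consensus: str):
    """Return sorted (position, strand, matched_top_strand_text) hits."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", consensus.upper()), ("-", revcomp(consensus))):
        for match in consensus_regex(motif).finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    upstream = "CCAGATAACCTTATCTCC"  # hypothetical upstream sequence
    for hit in scan_both_strands(upstream, "WGATAR"):
        print(hit)
```

Short, degenerate sites like this one occur by chance every few kilobases, which is why the text stresses a consensus "of sufficient length" and independent experimental validation.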
There are also several techniques for studying gene expression that, while commonplace in other organisms, are not routinely used in the worm. For example, one would ideally isolate pure populations of cells and tease apart transcriptional regulation at a biochemical level. Because an adult C. elegans is only about 1 mm long, dissecting enough homogeneous tissue for biochemical analysis is tedious or impossible. The recent development of cell culture techniques (see Methods in Cell biology), coupled with cell sorting, may make biochemical approaches more feasible in the future. However, the technique is still challenging enough that most researchers have opted for other methods to study transcription.
In situ hybridization is another common technique for cataloging transcriptional profiles in many organisms, but it is used less often in C. elegans studies. The impermeable eggshell of C. elegans embryos and the cuticle of larvae and adults often lead to background hybridization or partially permeabilized animals, making it difficult to obtain reproducible, trustworthy in situ hybridization signals. Despite these difficulties, the Kohara group is now undertaking a genome-scale effort to catalog gene transcription profiles using in situ hybridization. Their protocols and data are useful and can be accessed at http://nematode.lab.nig.ac.jp/db2/index.php.

Locating cis-acting regulatory elements
The majority of protein-coding genes in C. elegans are within gene-dense regions of the genome. Consequently, cis-acting regulatory regions are usually close to the coding region. The minimal promoter region required for proper expression of most Pol II transcripts lies within a couple of kilobases upstream of the start codon. There are notable exceptions to this compact view of cis-acting sequences. For example, egl-1 expression is controlled, in part, by an element located more than 2 kb downstream of the coding region, beyond an unrelated intervening gene (Thellmann et al., 2003). For lin-39, proper reporter gene expression required inclusion of ~30 kb of genomic DNA that extended upstream and downstream of the protein-coding region (Wigmaister and Eisenmann, personal communication). Clearly C. elegans genes can have complex and distant control regions. However, a rule of thumb of 2 kb upstream of the ATG works well as a starting point in the search for cis-acting control elements.
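The 2 kb rule of thumb translates directly into a coordinate calculation. The only subtle part is the strand: for a gene on the reverse strand, "upstream" lies at higher chromosome coordinates and the sequence must be reverse-complemented. This is a minimal sketch, assuming 0-based coordinates where `atg` is the chromosome position of the A of the start codon as read on the gene's own strand (WormBase annotations use 1-based coordinates, so these would need converting). The function name is hypothetical, and in this gene-dense genome the returned region may run into a neighboring gene.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def upstream_of_atg(chrom: str, atg: int, strand: str, length: int = 2000) -> str:
    """Return up to `length` bp upstream of a start codon, 5'->3' on the gene's strand.

    chrom  -- forward-strand chromosome sequence
    atg    -- 0-based position of the A of ATG; for '-' genes this is the
              forward-strand position of that base (it reads as T there)
    strand -- '+' or '-'
    """
    chrom = chrom.upper()
    if strand == "+":
        if chrom[atg:atg + 3] != "ATG":
            raise ValueError("no ATG at the given + strand position")
        return chrom[max(0, atg - length):atg]
    if strand == "-":
        # On the forward strand a reverse-strand ATG appears as CAT ending at atg.
        if atg < 2 or revcomp(chrom[atg - 2:atg + 1]) != "ATG":
            raise ValueError("no ATG at the given - strand position")
        return revcomp(chrom[atg + 1:atg + 1 + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    plus_gene = "TTTTT" + "ACGTAC" + "ATGAAA"   # toy sequence, gene on + strand
    minus_gene = revcomp("ACGTAC" + "ATGAAA")   # same gene placed on - strand
    print(upstream_of_atg(plus_gene, 11, "+", length=6))   # ACGTAC
    print(upstream_of_atg(minus_gene, 5, "-", length=6))   # ACGTAC
```

The ATG checks are there to catch off-by-one and strand mistakes, which are the most common errors when pulling promoter regions from annotation coordinates.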
It is important to remember that the minimal promoter region is not synonymous with the natural promoter. The natural promoter may span a much larger region due to redundancy in the function of regulatory elements that ensure proper and robust regulation of the endogenous gene. One common site of additional control elements is within introns. Most C. elegans introns are small (typically <100 bp; see Alternative splicing in C. elegans) and are thus unlikely to contain elements controlling expression. However, introns larger than several hundred base pairs often do contain such elements (e.g., Nam et al., 2002; Okkema et al., 1993). Therefore, intron size can provide a clue in searching for transcriptional control sequences. Large introns, particularly at the beginning of a coding region, may also provide a clue to promoter organization and the presence of multiple transcriptional initiation sites. For example, nhr-23 has a 1.8 kb intron at the start of the gene that is included in one transcript and absent in a second (Kostrouchova et al., 1998). In cases such as this, the presence of a trans-spliced leader (see Trans-splicing and operons) on two or more different transcripts from a single gene can be an indicator of multiple messages, possibly encoding different protein isoforms.
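Intron size is easy to compute from exon coordinates, so scanning a gene model for unusually large introns is a quick first step. This is a minimal sketch, assuming half-open forward-strand exon coordinates and an arbitrary 500 bp cutoff standing in for "several hundred base pairs". The gene model in the usage example is made up, although its 1.8 kb first intron echoes the nhr-23 case.

```python
def introns_from_exons(exons):
    """Introns between consecutive exons.

    exons -- (start, end) half-open intervals in forward-strand coordinates.
    Returns (start, end) half-open intervals sorted by position.
    """
    ordered = sorted(exons)
    return [(ordered[i][1], ordered[i + 1][0]) for i in range(len(ordered) - 1)]


def large_introns(exons, min_len=500, strand="+"):
    """Flag introns of at least `min_len` bp as candidate regulatory regions.

    Returns (intron_number, start, end, length), numbering introns in
    transcript order (5'->3'), so intron 1 is nearest the start of the gene.
    """
    introns = introns_from_exons(exons)
    if strand == "-":
        introns = introns[::-1]  # transcript order runs toward lower coordinates
    return [
        (number, start, end, end - start)
        for number, (start, end) in enumerate(introns, start=1)
        if end - start >= min_len
    ]


if __name__ == "__main__":
    # Hypothetical gene model: one 1.8 kb intron followed by two short ones.
    exons = [(0, 300), (2100, 2250), (2300, 2600), (2650, 2800)]
    print(large_introns(exons))               # [(1, 300, 2100, 1800)]
    print(large_introns(exons, strand="-"))   # [(3, 300, 2100, 1800)]
```

Reporting the intron number alongside the size makes it easy to spot the case the text highlights: a large intron near the 5′ end of the gene, which may also hint at an alternative transcription start.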

Simple promoters
A simple promoter is defined here as one in which the cis-acting control elements necessary for proper expression are confined to a small region (a few hundred bp) of the genome. Housekeeping genes expressed in all tissues might be good candidates for regulation by simple promoters. Unfortunately, few housekeeping genes in C. elegans have been characterized. Among the best-characterized simple promoters are those of the hsp-16 family of genes. This family consists of pairs of divergently transcribed genes whose promoter regions, sufficient for heat-regulated expression, are contained within the short (~350 bp) intergenic regions (Jones et al., 1986; Russnak and Candido, 1985; Stringham et al., 1992). Despite these compact promoters, different hsp-16 promoters induce distinct tissue expression patterns (Stringham et al., 1992), suggesting the presence of multiple regulatory sites within these simple promoters. The vitellogenin (vit) genes provide another excellent example of simple promoters: they exhibit stage-, tissue- and sex-specific expression controlled, in the case of vit-2, by a 247 bp promoter (MacMorris et al., 1992; MacMorris et al., 1994). vit-2 promoter activity depends on GATA-factor binding sites and a novel VPE2 site (TGTCAAT) conserved in vit gene promoters in C. elegans and C. briggsae (Spieth et al., 1985; Zucker-Aprison and Blumenthal, 1989). Certain cell cycle promoters have also been shown to be remarkably simple. Analysis of several genes expressed only in proliferative cells and encoding G1 phase regulators (e.g., cyclin D) revealed that proper regulation minimally required a 67 bp region of the promoter (Brodigan et al., 2003; Park and Krause, 1999). How could genes with such dynamic expression profiles throughout development be regulated in an apparently simple way? The answer is likely that they are end effectors of a cell's decision to divide, rather than integrators of the lineage or temporal information governing proliferation.

Complex promoters
The term complex is used here to describe a promoter in which the overall pattern of gene expression is the result of the composite action of several dispersed elements, each influencing or contributing to the overall expression pattern. This piecemeal organization has been described for the promoter regions of several genes, including myo-2, hlh-1 and lin-26. These studies reveal examples in which spatial control of transcription is regulated by elements active in groups of cells related by cell, tissue, and organ type and by lineage history.

myo-2: activation of a terminal differentiation gene by the combined activities of organ- and cell type-specific regulatory elements
myo-2 encodes a myosin heavy chain expressed exclusively in the pharyngeal muscles as these cells undergo terminal differentiation (Ardizzi and Epstein, 1987; Miller et al., 1983). Characterization of the myo-2 promoter region in transgenic C. elegans and identification of trans-acting regulators indicate that expression is regulated by a combination of organ- and cell type-specific signals targeting distinct regulatory sequences.
High level activity of the myo-2 promoter requires a transcriptional enhancer located approximately 300 bp upstream of the transcriptional start (Okkema et al., 1993). The intact myo-2 enhancer is active exclusively in the pharyngeal muscles, but, surprisingly, its activity depends on distinct cell-type-specific and organ-specific subelements, termed B and C, that can separately activate gene expression either specifically in the pharyngeal muscles, or more globally in all pharyngeal cell types (Okkema and Fire, 1994). In their endogenous context within the myo-2 gene, these subelements synergistically activate pharyngeal muscle gene expression.
Consistent with their distinct activities, the B and C subelements are targeted by transcription factors expressed in different spatial patterns in the pharynx (Figure 1). The cell-type-specific B subelement binds and is activated by the pharyngeal muscle-specific NK-2 family homeodomain factor CEH-22 (Okkema and Fire, 1994; Okkema et al., 1997), which is structurally and functionally related to factors controlling cardiac muscle development in other species (Haun et al., 1998). The organ-specific C subelement binds and is activated by the pan-pharyngeal FoxA family transcription factor PHA-4 (Kalb et al., 1998), which is required for formation of pharyngeal muscle and all other pharyngeal cell types during embryonic development (see below).
CEH-22 is not the only factor functioning with PHA-4 to activate myo-2 expression. CEH-22 is expressed in most, but not all, myo-2 expressing pharyngeal muscles (Okkema and Fire, 1994). Likewise a ceh-22 mutant expresses myo-2, although these animals exhibit defects in B subelement activity and pharyngeal muscle development and function (Okkema et al., 1997). Thus, other as yet unidentified factors must contribute to myo-2 expression, and the characterization of these factors will enhance our understanding of pharyngeal muscle development.
Figure 1. CEH-22 and PHA-4 function in combination to activate pharyngeal muscle expression of myo-2. myo-2 expression is activated by the pharyngeal muscle-specific CEH-22 and the pan-pharyngeal PHA-4, which bind the myo-2 enhancer B and C subelements, respectively. Micrographs show transgenic embryos expressing ceh-22::gfp in pharyngeal muscles and pha-4::gfp in all pharyngeal cells (top, delimited by arrowheads), and a transgenic adult expressing myo-2::lacZ in the pharyngeal muscles (bottom). Note that pha-4::gfp is also expressed in the gut. GFP and β-galactosidase are targeted to nuclei to facilitate cell identification.

hlh-1: activation of gene expression by lineage-preference regulatory elements.
hlh-1 encodes a basic helix-loop-helix transcription factor expressed in all body wall muscle cells and their precursors (Krause et al., 1990). The body wall muscle cells are derived from multiple cell lineages. Of the 81 body wall muscle cells born during embryogenesis, 1 is from the AB lineage, 28 are from the MS lineage, 32 are from the C lineage and 20 are from the D lineage (Sulston et al., 1983). An additional 14 body wall muscle cells (and other cell types) are born postembryonically from the M mesoblast (Sulston and Horvitz, 1977).
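The lineage counts above add up as follows, giving the embryonic total and, with the postembryonic M-derived cells, the total number of body wall muscle cells implied by these figures:

```latex
\underbrace{1}_{\mathrm{AB}} + \underbrace{28}_{\mathrm{MS}} + \underbrace{32}_{\mathrm{C}} + \underbrace{20}_{\mathrm{D}} = 81\ \text{(embryonic)},
\qquad
81 + \underbrace{14}_{\mathrm{M}} = 95\ \text{(total)}.
```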
Dissection of the hlh-1 promoter shows that gene expression is properly regulated by multiple elements spanning ~3 kb upstream of the ATG (Figure 2; Krause et al., 1994). A core element required for all expression resides just upstream of the ATG. Several additional elements drive expression preferentially in one or more lineages, although no single element is specific for expression in just one lineage. Embryonic expression is also controlled by a different region than postembryonic expression. The overall pattern of hlh-1 expression is thus a composite of the action of several lineage-preference elements with overlapping domains of action, working in concert with an essential core element. Superimposed on this spatial pattern of regulation are distinct temporal control elements that regulate the timing of expression during development. As yet, no trans-acting factors have been identified that bind to the defined cis-acting elements, illustrating the difficulty of using promoter analysis alone to identify trans-acting factors.
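The composite logic described above, an essential core element plus lineage-preference elements whose domains overlap, can be written as a small Boolean model. This is a hypothetical toy, not a model of the mapped hlh-1 elements: the element names and their lineage assignments are invented for illustration. Real elements contribute quantitatively and are subject to temporal restrictions that the model ignores.

```python
# Toy model of a "composite" promoter; element names and lineages are hypothetical.
EMBRYONIC_MUSCLE_LINEAGES = frozenset({"AB", "MS", "C", "D"})


def predicted_lineages(elements, core_present=True):
    """Lineages predicted to express a reporter.

    elements     -- dict: element name -> set of lineages it prefers
    core_present -- the core element is required for any expression
    The predicted pattern is the union of the retained elements' domains.
    """
    if not core_present:
        return set()
    pattern = set()
    for lineages in elements.values():
        pattern |= lineages
    return pattern & EMBRYONIC_MUSCLE_LINEAGES


if __name__ == "__main__":
    # Each hypothetical element prefers more than one lineage and none is
    # specific for a single lineage, as described for hlh-1.
    elements = {"e1": {"MS", "C"}, "e2": {"C", "D"}, "e3": {"AB", "MS"}}
    print(sorted(predicted_lineages(elements)))              # ['AB', 'C', 'D', 'MS']
    without_e3 = {k: v for k, v in elements.items() if k != "e3"}
    print(sorted(predicted_lineages(without_e3)))            # ['C', 'D', 'MS']
    print(predicted_lineages(elements, core_present=False))  # set()
```

Even in this toy, deleting one element removes only the lineages that no other element covers. This is why a deletion series reveals lineage preferences rather than clean lineage-specific switches.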
Figure 2. Regulation of hlh-1 expression by lineage-preference elements. A) A schematic of body wall muscle nuclei superimposed on an image of a comma-stage embryo. Each of the four lineages of origin is color-coded as shown in (B) (adapted from Sulston et al., 1983). C) The promoter and partial coding region (exons 1 and 2) of hlh-1 (adapted from Krause et al., 1990). All expression depends on a "core" element (star) located upstream of the ATG of exon 1. Below the gene structure diagram are color-coded elements that can direct lineage-preference expression of transgenes during embryogenesis; color coding as in (A) and (B). Expression in mature body wall muscle depends on distinct temporal elements (purple boxes) that do not have lineage preferences.

lin-26: activation of gene expression by tissue-specific regulatory elements.
lin-26 encodes a predicted zinc-finger transcription factor expressed in a broad range of ectodermally derived epithelial tissues, as well as in the somatic gonad and uterus (Labouesse et al., 1996; Labouesse et al., 1994). These epithelial tissues include the major hypodermis surrounding the body of the animal, specialized hypodermal cells located at the anterior and posterior ends of the body, and interfacial cells, such as rectal cells, that connect the external epithelium to the endoderm. A recent characterization of the lin-26 promoter region revealed that this gene is regulated by a core element required for all expression working in concert with tissue-specific elements, rather than the lineage-preference elements discussed above for hlh-1 (Landmann et al., 2004).
lin-26 is the downstream gene in an alternatively spliced operon that includes lir-1 (Dufourcq et al., 1999), and proper expression of lin-26 requires an 11 kb upstream region that includes most of the lir-1 gene itself (den Boer et al., 1998). Within this region are tissue-specific regulatory modules that activate gene expression in subsets of lin-26-expressing tissues (Figure 3; Landmann et al., 2004). For example, separable modules control expression in the major hypodermal cells; in the minor hypodermal cells and the sheath and socket support cells; in rectal cells; or in the somatic gonad. In some cases, redundant elements contribute to expression in particular tissues (e.g., the major hypodermal cells), and for the minor hypodermis and support cells located at the worm's anterior and posterior ends, separable elements active at either the anterior or the posterior end were identified. Thus, the lin-26 promoter region contains cis-regulatory elements active in cells that belong to the same organ, are functionally related, or have similar positions along the body (Landmann et al., 2004), and together these elements produce the full lin-26 expression pattern in a piecemeal fashion.
Figure 3. Regulation of lin-26 expression by tissue-specific elements. A, B) lin-26-expressing cells, color-coded as follows (adapted from Landmann et al., 2004): major hypodermal cells include hyp7 (green), seam cells (orange), and P cells (purple); support cells (red); somatic gonad precursors Z1 and Z4 (yellow). C) The promoter elements for lin-26. All expression depends on a "core" element (star) located in the intergenic region between lin-26 and its upstream neighbor lir-1. Tissue-specific control elements, located within a lir-1 intron, are shown below the gene structure diagram with color coding as in (A) and (B). Most control elements function in cells related by tissue type but not by lineage. Note also that temporal control is achieved by sequentially acting elements located progressively further upstream of the lin-26 ATG.

One common theme to emerge from these three examples is redundancy of regulatory elements. In most cases, even when subelements with specific tissue, lineage or organ activity are identified, their loss does not abolish all expression in that region. Endogenous gene regulation has evidently evolved to include multiple, overlapping regulatory regions that ensure proper expression during development. Deconstructing a promoter is therefore most useful for defining a minimal set of cis-acting control elements. As studies employ more sophisticated techniques and assays, we may learn how extensive this redundancy is.
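Redundancy of this kind can be made explicit with a simple in silico deletion series: delete each element in turn and ask which tissues lose all coverage. The sketch below is a hypothetical all-or-none toy. The module names and tissue assignments are invented, loosely patterned on the lin-26 description. It treats expression as present whenever any remaining element covers a tissue, which ignores the quantitative contributions real elements make.

```python
def single_deletion_losses(elements):
    """For each element, tissues left with no covering element if it alone is deleted.

    elements -- dict: element name -> set of tissues in which it is active
    """
    losses = {}
    for name, tissues in elements.items():
        remaining = set()
        for other, other_tissues in elements.items():
            if other != name:
                remaining |= other_tissues
        losses[name] = tissues - remaining
    return losses


if __name__ == "__main__":
    modules = {  # hypothetical modules, not mapped lin-26 elements
        "hyp_a": {"major hypodermis"},
        "hyp_b": {"major hypodermis"},
        "rect": {"rectal cells"},
        "gon": {"somatic gonad"},
    }
    for name, lost in single_deletion_losses(modules).items():
        print(name, sorted(lost))
```

In this toy, deleting either hypodermal module alone costs nothing, so a single-deletion series would miss both. Only a double deletion would reveal them, which mirrors why minimal promoters can understate the regulatory content of the natural promoter.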

Trans-acting factors
The completion of the C. elegans genome sequence makes it possible, in theory, to define all transcription factors in the worm. In practice, this effort is complicated by several uncertainties in assigning function to a given gene. For example, zinc finger motifs can bind DNA but can also mediate other functions, including RNA binding and protein-protein interactions. It is therefore difficult to conclude that a given gene product is a transcription factor based solely on the presence of signature motifs. For factors that modify chromatin or participate in a transcription complex, whether they count as transcription factors often lies in the eye of the investigator. A first-pass list of C. elegans transcription factors is presented in Table 1. This list was originally compiled in the Sternberg Lab (courtesy of T. Ririe and J. Fernandes); we present a modified version with the understanding that it will need refinement over time to correct inaccuracies and omissions. The current list includes 664 genes, representing only about 3.5% of the predicted genes in C. elegans. This number is surprisingly low, about one half the number of transcription factors estimated previously (McGhee and Krause, 1997).
The goal in studying transcription is to link transcription factors to their target genes. For a small number of genes in C. elegans this connection has been made, and some of these are listed in Table 2. Note that most transcription factors have been defined as either activators or repressors. For some, however, both modes of action have been described, highlighting the importance of co-factors and of promoter context within chromatin in determining the transcriptional outcome of DNA binding by these proteins. The list of potential target genes for several transcription factors will grow rapidly over the coming years with the application of microarray methods. However, most of these targets will not be specifically tested to determine whether the regulation is direct and which cis-acting elements mediate the effect.
Table 2 (excerpt). UNC-30: targets unc-25 and unc-47, binding site TAATCC (Eastman et al., 1999; Jin et al., 1994). PHA-4: target myo-2 (C element), binding site TGTTTGC (Gaudet and Mango, 2002; Kalb et al., 1998; Okkema and Fire, 1994; Vilimas et al., 2004).
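The figures given for Table 1 (664 genes, about 3.5% of predicted genes, about one half of the earlier estimate) can be turned around as a back-of-the-envelope check. These values are derived only from the numbers in the text, not from independent counts:

```latex
N_{\text{genes}} \approx \frac{664}{0.035} \approx 1.9 \times 10^{4},
\qquad
N_{\text{TF, earlier estimate}} \approx 2 \times 664 \approx 1.3 \times 10^{3}.
```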

Spatial specificity
Spatial specificity refers to a pattern of gene expression that is limited to one or a few organs, tissues, or cell types. Examples of control mechanisms governing these types of spatial restriction are presented to show the logic underlying these patterns. Our current understanding shows that spatial specificity can be achieved by multiple mechanisms, ranging from the combinatorial action of overlapping transcription factors to transcriptional cascades.

Organ specificity: control of pharyngeal gene expression by a master regulator
The C. elegans pharynx is a complex organ consisting of five very different cell types, including muscles, neurons, epithelia, glands and marginal cells (Albertson and Thomson, 1976). The pharynx initially forms as a primordium of undifferentiated cells around mid-embryogenesis, and these cells subsequently differentiate and express cell type-specific genes (Sulston et al., 1983).
Formation of the pharynx and differentiation of all pharyngeal cell types depend on a single FoxA family transcription factor, PHA-4 (Horner et al., 1998; Kalb et al., 1998; Mango et al., 1994). PHA-4 is expressed in all pharyngeal cells beginning at the time these cells become committed to a pharyngeal cell fate, as well as in the hindgut and intestine (Horner et al., 1998; Kalb et al., 1998). PHA-4 is believed to directly regulate most or all genes specifically expressed in the pharynx, including both early genes specifying the fate of different pharyngeal cell types and late genes expressed during terminal differentiation (Gaudet and Mango, 2002). A major question in understanding pharyngeal development is how PHA-4 regulates genes expressed in different pharyngeal cell types and at different times in pharyngeal development.

The function of PHA-4 in cell type-specific differentiation is best understood in the pharyngeal muscles. As discussed above, PHA-4 functions with the pharyngeal muscle-specific homeodomain factor CEH-22 to activate myo-2 expression during muscle cell differentiation (Kalb et al., 1998;Okkema and Fire, 1994). PHA-4 and CEH-22 similarly target a late-functioning autoregulatory enhancer of ceh-22 itself (Gaudet and Mango, 2002;Kuchenthal et al., 2001), suggesting that these factors act together on many regulatory sequences used during terminal differentiation of the pharyngeal muscles. Earlier in pharyngeal muscle development, PHA-4 is also required for the initiation of ceh-22 expression (Mango et al., 1994;Vilimas et al., 2004), but the mechanism by which PHA-4 initially activates ceh-22 in the pharyngeal muscles remains unknown.
Less is known about how PHA-4 regulates gene expression in other pharyngeal cell types, largely because other pharynx-specific promoters have not been extensively characterized. This situation is likely to change soon, however, because pioneering microarray studies have identified >300 genes preferentially expressed in the pharynx (Ao et al., 2004;Gaudet and Mango, 2002). Analyses of these genes' known expression patterns, in situ hybridization patterns, and placement on the Gene Expression Topo Map have identified clusters of genes expressed preferentially in subsets of pharyngeal cells, and comparisons of the promoters of genes within these clusters have identified conserved regulatory elements that likely impart positional or cell type specificity to PHA-4 target genes (Ao et al., 2004). PHA-4 also regulates genes in temporally distinct patterns in the pharynx. This temporal regulation may involve both the affinity of PHA-4 for its binding sites in various gene promoters (Gaudet and Mango, 2002) and the presence of binding sites for additional factors functioning with PHA-4 (Gaudet et al., 2004). These mechanisms are probably not mutually exclusive and may be interdependent, as additional factors could alter PHA-4 binding affinity through cooperative binding.
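The affinity idea can be made concrete with a simple equilibrium-binding calculation. The sketch below is a toy model, not taken from the cited studies: it assumes that PHA-4 levels rise linearly with time, that each promoter carries a single site with a hypothetical dissociation constant `kd`, and that a gene turns on once fractional occupancy of its site reaches a fixed `threshold`. Under these assumptions a higher-affinity (lower-`kd`) site is occupied, and its gene activated, earlier.

```python
# Toy equilibrium-binding sketch with hypothetical parameters: how site
# affinity could set the order in which targets respond to a rising activator.


def occupancy(tf_conc, kd):
    """Fractional occupancy of a single site: [TF] / ([TF] + Kd)."""
    return tf_conc / (tf_conc + kd)


def activation_conc(kd, threshold=0.5):
    """TF concentration at which occupancy first reaches `threshold`.

    Solving [TF] / ([TF] + Kd) = threshold gives [TF] = Kd * threshold / (1 - threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return kd * threshold / (1.0 - threshold)


def onset_time(kd, rate=1.0, threshold=0.5):
    """Onset time, assuming the TF level rises linearly as rate * t."""
    return activation_conc(kd, threshold) / rate


# Hypothetical Kd values in arbitrary units; lower Kd means higher affinity.
sites = {"early_gene_site": 5.0, "late_gene_site": 40.0}
for name, kd in sorted(sites.items(), key=lambda item: onset_time(item[1])):
    print(f"{name}: Kd={kd}, onset at t={onset_time(kd):.1f}")
```

Cooperative binding with a partner factor could be added by replacing the occupancy function with a Hill-type expression, which would effectively lower the apparent `kd` of sites that have a neighboring partner site.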

Tissue specificity: regulation of gut gene expression by a cascade of redundant GATA factors
The E blastomere is the clonal precursor of the gut, and the maternal factors specifying E blastomere identity are well understood. One effect of these maternal factors is to initiate zygotic gene expression, including expression of a series of sequentially functioning GATA family transcription factors expressed exclusively in the gut lineage (reviewed in Maduro and Rothman, 2002; Figure 4). These GATA factors bind WGATAR motifs required for expression of many gut-specific genes and directly activate gut gene expression (e.g., Britton et al., 1998;Egan et al., 1995;MacMorris et al., 1992;Nam et al., 2002).
Figure 4. Sequentially functioning GATA factors regulate gut gene expression. Genetic pathway indicating genes encoding gut-specific GATA factors and the terminal differentiation gene ges-1, the stage at which their expression begins, and the proposed function of these genes in promoting gut differentiation. An arrow indicates an autoregulatory mechanism that maintains elt-2 expression.
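Because the WGATAR consensus is written in IUPAC degenerate notation (W = A or T; R = A or G), locating candidate GATA sites in a promoter is a pattern search on both DNA strands. The following is a minimal sketch using made-up test sequences; a motif match marks only a candidate site, and whether it is functional has to be tested experimentally.

```python
import re

# WGATAR expanded from IUPAC codes: W = [AT], R = [AG].
# The lookahead lets overlapping matches be reported.
WGATAR = re.compile(r"(?=([AT]GATA[AG]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITE_LEN = 6


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_wgatar(seq):
    """Return sorted (start, strand, site) tuples for WGATAR matches on both strands.

    Positions are 0-based on the given (+) strand; a minus-strand hit is
    reported at the + strand coordinate of its leftmost base.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in WGATAR.finditer(seq)]
    rc = reverse_complement(seq)
    n = len(seq)
    for m in WGATAR.finditer(rc):
        # A site at rc[i:i+6] corresponds to seq[n-i-6:n-i].
        hits.append((n - m.start() - SITE_LEN, "-", m.group(1)))
    return sorted(hits)


# Made-up sequence with one site on each strand.
print(find_wgatar("CCTGATAAGGTTATCTCC"))
```

A site on the minus strand appears on the given strand as its reverse complement (YTATCW), which is why the search is repeated on the reverse-complemented sequence rather than with a second pattern.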
The first of these gut-specific GATA factors is END-1, which is expressed transiently in the E lineage, beginning in the E cell itself and continuing until approximately the 8E stage (Figure 4; Zhu et al., 1997). ELT-2 is then expressed one cell division later, beginning at the 2E stage. elt-2 expression is activated by END-1 (Zhu et al., 1998), but, unlike end-1, elt-2 remains expressed in the gut throughout the life of the worm through an autoregulatory mechanism (Fukushige et al., 1999). Interestingly, both end-1 and elt-2 appear to be members of redundant gene families. While ectopic expression of either gene activates widespread gut differentiation, loss-of-function studies reveal surprisingly mild defects in gut gene expression (Zhu et al., 1998;Zhu et al., 1997). Indeed, end-1 loss of function produces no phenotype. In comparison, loss of elt-2 produces gut defects and lethality, while its effect on gene expression varies from promoter to promoter (Fukushige et al., 2005;Fukushige et al., 1998;Oskouian et al., 2005). In both cases, the likely redundant partners are additional GATA factors: end-1 may be redundant with the linked gene end-3, while elt-2 may be partially redundant with elt-7 (Maduro and Rothman, 2002). Thus, end-1 and end-3 are believed to establish endoderm fate in the E lineage, while elt-2 and elt-7 are likely the direct regulators of most genes expressed in the gut (Figure 4).
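The regulatory logic of this cascade can be sketched as a toy synchronous Boolean network. This is an illustrative abstraction, not a published model: each gene is simply ON or OFF, the maternal input is assumed to last two time steps, END-1 and END-3 are treated as fully redundant, ELT-7 is assumed to behave like ELT-2 (including self-maintenance, which the text does not establish), and ges-1 is assumed to need either ELT-2 or ELT-7.

```python
GENES = ("end-1", "end-3", "elt-2", "elt-7", "ges-1")


def step(state, maternal_input, knockouts=frozenset()):
    """One synchronous update of a toy Boolean model of the E-lineage GATA cascade."""

    def on(gene, value):
        # A knocked-out gene can never be ON.
        return bool(value) and gene not in knockouts

    endoderm = state["end-1"] or state["end-3"]  # END-1/END-3 treated as redundant
    return {
        # END factors follow the transient maternal input.
        "end-1": on("end-1", maternal_input),
        "end-3": on("end-3", maternal_input),
        # ELT-2 is switched on by END activity, then maintains itself.
        "elt-2": on("elt-2", endoderm or state["elt-2"]),
        # Assumption: ELT-7 behaves like ELT-2 in this toy model.
        "elt-7": on("elt-7", endoderm or state["elt-7"]),
        # Assumption: the terminal gene ges-1 needs ELT-2 or ELT-7.
        "ges-1": on("ges-1", state["elt-2"] or state["elt-7"]),
    }


def simulate(n_steps=6, maternal_steps=2, knockouts=frozenset()):
    """Run the model from an all-OFF state; return the state after each step."""
    state = {gene: False for gene in GENES}
    history = []
    for t in range(n_steps):
        state = step(state, maternal_input=t < maternal_steps, knockouts=knockouts)
        history.append(state)
    return history


for label, ko in [("wild type", frozenset()),
                  ("end-1", frozenset({"end-1"})),
                  ("end-1; end-3", frozenset({"end-1", "end-3"}))]:
    final = simulate(knockouts=ko)[-1]
    print(f"{label}: elt-2={final['elt-2']}, ges-1={final['ges-1']}")
```

In the wild-type run, END-1 and END-3 switch off once the maternal input ends, but ELT-2 stays ON through its self-maintenance term. Removing end-1 alone leaves the final state unchanged, whereas removing both end-1 and end-3 prevents elt-2 activation in this toy model.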
While most gut-specific promoters contain WGATAR motifs, their accurate regulation depends on more than simply turning on ELT-2. Gut genes are expressed under distinct temporal, sex-specific, and environmental controls, indicating that other factors must contribute to gut gene regulation. In the case of the vit-2 gene, which encodes a yolk protein expressed only in the gut of adult hermaphrodites, repression in males requires the MAB-3 DM-domain protein (Yi and Zarkower, 1999). Likewise, ges-1, which encodes a gut-specific esterase, is activated in the gut by ELT-2 while being repressed in other regions of the digestive system by an unknown factor binding near the WGATAR motifs (Fukushige et al., 1996;Marshall and McGhee, 2001). In most cases, the identity of the factors functioning with ELT-2 remains unknown, and much remains to be learned about gut transcription.
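The vit-2 example combines several inputs: gut GATA activity, an adult-stage input, and repression by MAB-3 in males. A minimal sketch of that combinatorial logic follows, assuming a simple rule in which every listed activator is required and any listed repressor vetoes expression. `PromoterRule` and `ADULT_SIGNAL` are hypothetical names introduced here; the real temporal input has not been identified, and the gut GATA input is represented by ELT-2 for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterRule:
    """Toy cis-regulatory logic: every activator is required, any repressor vetoes."""
    activators: frozenset
    repressors: frozenset = frozenset()

    def is_on(self, factors_present):
        present = frozenset(factors_present)
        return self.activators <= present and not (self.repressors & present)


# Hypothetical encoding of vit-2: gut GATA input plus an unidentified
# adult-stage input ("ADULT_SIGNAL"), repressed by MAB-3 in males.
VIT2 = PromoterRule(
    activators=frozenset({"ELT-2", "ADULT_SIGNAL"}),
    repressors=frozenset({"MAB-3"}),
)

contexts = {
    "adult hermaphrodite gut": {"ELT-2", "ADULT_SIGNAL"},
    "adult male gut": {"ELT-2", "ADULT_SIGNAL", "MAB-3"},
    "larval hermaphrodite gut": {"ELT-2"},
    "adult hermaphrodite muscle": {"ADULT_SIGNAL"},
}
for name, factors in contexts.items():
    print(f"{name}: vit-2 {'ON' if VIT2.is_on(factors) else 'OFF'}")
```

Only the adult hermaphrodite gut satisfies all inputs; each of the other contexts lacks an activator or contains the repressor.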

Cell type specificity: regulation of AIY neuronal expression by a single core motif
Cell type specificity of gene expression is best exemplified by studies of neuronal gene expression (for example Chang et al., 2004;Zhang et al., 2004). Hobert and colleagues have studied the mechanism that regulates gene expression in a single bilateral pair of head interneurons, AIY left and right (AIYL and AIYR), that function in sensory input processing, learning, and memory (Ishihara et al., 2002;Mori and Ohshima, 1995;Tsalik et al., 2003). Differentiation of these interneurons depends on the transcription factors ceh-10 (Paired homeobox) and ttx-3 (LIM homeobox; Altun-Gultekin et al., 2001). Analysis of several promoters of genes expressed in AIY, including ceh-10 and ttx-3, revealed a consensus 16 bp AIY motif that is responsible for proper expression and forms the core of an AIY-specific enhancer (Wenick and Hobert, 2004). This enhancer is active in combination with some promoters normally active in non-neuronal cell types but not with others, demonstrating that promoter context is an important aspect of transcriptional regulation. Both ceh-10 and ttx-3 are part of an autoregulatory loop that activates their own expression, which explains the presence of an AIY motif in each of their promoters.
Control of AIY gene expression by ceh-10 and ttx-3 provides some insight into the logic of cell type-specific gene regulation (Wenick and Hobert, 2004). Cell type specificity is generated by a combination of transcription factors unique to AIY interneurons acting in concert with a modular AIY response element upstream of target genes. Although some of the identified ceh-10/ttx-3 target genes were AIY-specific, others were expressed more broadly, in many neurons or in non-neuronal tissues. However, in most cases, expression in AIY depended on the AIY motif, indicating that widespread expression may often be the composite action of several cell type-specific cis-acting elements. Finally, the cis-acting control regions of target genes encoding terminal differentiation products in AIY appeared to lack repressive elements. This suggests that the integration of positive and negative signals influencing cell type specificity is carried out at the level of the upstream transcription factors, ceh-10 and ttx-3 in this case. Once these upstream factors are activated, the downstream battery of target genes is expressed largely independently of other influences.
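The idea that broad expression can be the sum of cell type-specific modules can be sketched with set operations. In this illustrative toy model, each cis-regulatory module is represented by the set of factors it requires, and a gene is expressed in any cell where at least one of its modules has all required factors. `PAN_NEURONAL_TF`, `MUSCLE_TF` and the three cell categories are hypothetical placeholders, and the AIY motif is reduced to a requirement for both CEH-10 and TTX-3.

```python
# Toy model with hypothetical factors and cell categories: a promoter is a
# set of independent modules; its expression pattern is the union of the
# cells in which each module is satisfied.
CELLS = {
    "AIY": {"CEH-10", "TTX-3"},
    "other_neuron": {"PAN_NEURONAL_TF"},
    "muscle": {"MUSCLE_TF"},
}

# Each module lists the trans-acting factors it requires (all must be present).
AIY_MOTIF = frozenset({"CEH-10", "TTX-3"})
PAN_NEURONAL_MODULE = frozenset({"PAN_NEURONAL_TF"})


def expression_pattern(promoter_modules, cells=CELLS):
    """Cells in which at least one module has all of its required factors."""
    return {
        cell for cell, factors in cells.items()
        if any(module <= factors for module in promoter_modules)
    }


targets = {
    "AIY-specific target": [AIY_MOTIF],
    "broad target": [AIY_MOTIF, PAN_NEURONAL_MODULE],
    "broad target, AIY motif deleted": [PAN_NEURONAL_MODULE],
}
for name, modules in targets.items():
    print(name, "->", sorted(expression_pattern(modules)))
```

Deleting the AIY motif from the broadly expressed promoter removes only the AIY component of its pattern, which is the kind of motif dependence described for AIY-expressed targets. Because no module carries a repressor, the pattern is set entirely by which trans-acting factors each cell contains.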

Future
There is little doubt that the field of transcriptional regulation is on the verge of an information explosion. The combination of genome sequences from multiple Caenorhabditis species, microarray transcriptional profiling, and improved methodology will soon lead to a wealth of information on transcriptional activators and downstream target genes. One challenge will be the experimental verification of the mountains of data that will become available about upstream activators and downstream targets. Can these relationships be confirmed by independent approaches, and are the interactions direct or indirect? We are entering an age in which the connections between most trans-acting factors and cis-acting regulatory target elements will be defined. Understanding how these connections regulate development will add an exciting chapter to the study of the worm and of transcriptional regulation in general.

Acknowledgements
This research was supported in part by the Intramural Research Program of the NIH, National Institute of Diabetes, Digestive and Kidney Diseases, and by grants from the NIH (GM053996) and the American Heart Association (03505487).