Pre-mRNA splicing and its regulation

Alternative splicing is a common mechanism for the generation of multiple isoforms of proteins. It can function to expand the proteome of an organism and can serve as a way to turn off gene expression after transcription. This review focuses on splicing, its regulation and the progress in this field achieved through studies in C. elegans. Recent experiments, including RNA-Seq to uncover and measure the extent of alternative splicing, comparative genomics to identify splicing regulatory elements, and the development of elegant genetic screens using fluorescent reporter constructs, have increased our understanding of the cis-acting sequences that regulate alternative splicing and the trans-acting protein factors that bind to these sequences. The topics covered in this review include constitutive splicing factors, identification of alternatively spliced genes, alternative splicing regulation and the coupling of alternative splicing to nonsense-mediated decay. The significant progress towards uncovering the alternative splicing code in this organism is discussed.


The importance of alternative splicing
In multicellular eukaryotes, many genes show evidence of alternative splicing: introns are removed and exons are spliced together in different combinations to yield multiple distinct messenger RNAs from a single gene. In humans, recent studies suggest that ~95% of genes are capable of undergoing alternative splicing (Pan et al., 2008; Wang et al., 2008). Alternative splicing can be highly regulated under developmental-, tissue-, and signal-specific controls, leading to the regulated generation of multiple protein-coding mRNAs from a single gene (Calarco et al., 2011; Hartmann and Valcarcel, 2009; Licatalosi and Darnell, 2010). This extensive use of alternative splicing can lead to a more diverse proteome (Nilsen and Graveley, 2010). In addition, alternative splicing can change the downstream open reading frame of the mRNA and thus introduce a premature termination codon (PTC) into one of the isoforms. mRNAs containing PTCs are degraded by the nonsense-mediated decay (NMD) machinery. This coupling of alternative splicing and nonsense-mediated decay (AS-NMD) is a post-transcriptional mechanism for shutting off gene expression that occurs for roughly one third of human alternative splicing events (Lewis et al., 2003). Alternative splicing occurs for ~25% of genes in C. elegans (Ramani et al., 2011). C. elegans has proven to be a useful model system for identifying targets of alternative splicing and the factors that regulate the process, and for improving our understanding of the mechanisms of this regulation. Like vertebrates, C. elegans is an intron-rich organism. Using the Table Browser feature of the UCSC Genome Browser (Karolchik et al., 2011) to examine RefSeq genes (Pruitt et al., 2007), one finds that C. elegans transcripts are assembled from an average of 6.4 exons while human transcripts are assembled from an average of 9.7 exons. Average exon sizes are broadly similar in worms and humans, although the mean exon is larger in worms (218 nt) than in humans (145 nt) (Lander et al., 2001). However, introns in C. elegans are much smaller than in humans: intron size peaks near the minimal intron length of 47 nt (56% of C. elegans introns are under 100 nt in length), while human introns have a median length of 1,023 nt and an average length of 3,365 nt (Lander et al., 2001). This review will examine the work that has been done in C. elegans to identify and understand the components of the splicing machinery, the extent of alternative splicing in this organism, the sequences in the genome that regulate this splicing, and the factors that regulate specific alternative splicing events.

Splicing of messenger RNA precursors
The machinery that performs pre-mRNA splicing, the removal of introns and the ligation of exons, is called the spliceosome. The spliceosome assembles onto the nascent pre-mRNA as it is being transcribed. The two trans-esterification reaction steps of pre-mRNA splicing occur in this large multi-component ribonucleoprotein complex (Moore and Proudfoot, 2009; Wahl et al., 2009). The spliceosome assembles onto the pre-mRNA via an ordered binding of its subunits, the uridine-rich small nuclear ribonucleoprotein complexes (U snRNPs). Over 100 different proteins are found in the spliceosome, as part of the snRNP complexes or as associated factors that bind to the pre-mRNA (Jurica and Moore, 2003; Valadkhan and Jaladat, 2010). snRNP complex assembly is guided by RNA-RNA interactions between the U snRNAs and the pre-mRNA and between the snRNAs themselves. First, U1 binds to the pre-mRNA, guided by base-pairing interactions between its 5′ end and the 5′ splice site. U2 binding is guided by base-pairing interactions with the branchpoint sequence in the intron. Then the U4/U6•U5 tri-snRNP is recruited to the forming spliceosome. Extensive rearrangements of RNA-RNA and RNA-protein interactions then occur in the spliceosome to generate a catalytically active form containing the U2, U5 and U6 snRNPs (Ares and Weiser, 1994; Madhani and Guthrie, 1994). Through genetic experiments in yeast, eight ATP-dependent DExD/H box helicases have been identified that are required for the assembly of the spliceosome, for the complex RNA-RNA and RNA-protein interaction rearrangements before, between and after the first and second trans-esterification reactions, and for the disassembly of the spliceosome (Rocak and Linder, 2004).
Several classes of splicing factor proteins have been identified as being essential for metazoan spliceosome assembly (Blencowe, 2000; Kramer, 1996). The SR protein family of splicing factors is required for the earliest interactions of U1 snRNP with the pre-mRNA and for subsequent spliceosome assembly steps (Sanford et al., 2005). SR proteins have redundant functions in promoting constitutive splicing, but certain family members have distinct roles in promoting specific alternative splice site usage. SR protein family members have distinct binding specificities for sequences within exons that serve as splicing enhancer elements (Cartegni and Krainer, 2002). Analysis of the RNA sequences that bind directly to the human SR protein SF2/ASF (SRSF1) using the technique of cross-linking immunoprecipitation (CLIP) has identified preferred exonic sequences for binding to pre-mRNA and a role in shuttling the mRNA out to the cytoplasm for translation (Sanford et al., 2008; Sanford et al., 2009). U2 auxiliary factor (U2AF) is a heterodimeric splicing factor composed of 35 and 65 kD subunits in humans (Kielkopf et al., 2004; Zamore et al., 1992). It binds to the polypyrimidine tract and recognizes the AG dinucleotide at the 3′ end of the intron to promote U2 snRNA interactions with the intron branchpoint sequence (Merendino et al., 1999; Wu et al., 1999; Zorio and Blumenthal, 1999). U2AF, SR proteins and U1 snRNP assemble onto the pre-mRNA in an ATP-independent manner to form the early (E) complex (Michaud and Reed, 1991; Staknis and Reed, 1994). It is important to note that assembly of the E complex essentially identifies both the 5′ and 3′ splice sites for further spliceosome assembly, and thus control of this assembly can be a way of regulating alternative splice site choice (Michaud and Reed, 1993). Splicing factors interact with the pre-mRNA, with other splicing factors and with protein components of the snRNPs (Burge et al., 1999; Caceres and Krainer, 1997; Kramer, 1996). This cooperative assembly increases the local concentration of the snRNAs in the vicinity of splice sites and therefore is likely to promote the RNA-RNA interactions required for spliceosome assembly.

Constitutive splicing factors in C. elegans
The spliceosomal snRNAs of C. elegans are very similar to their yeast and mammalian counterparts (Thomas et al., 1990). In C. elegans, the 5′ end of U1 snRNA is fully complementary to a 5′ splice site sequence of AG|guaaguu (where upper case letters indicate the 3′ end of the exon and lower case letters the 5′ end of the intron). While only a very small fraction of 5′ splice sites are completely complementary to the 5′ end of U1 snRNA, this sequence represents a consensus 5′ splice site. In an alignment of over 28,000 C. elegans introns that begin with GU, it was found that for each position in the 5′ splice site region, at least 51% of the introns showed a match to this consensus (Kent and Zahler, 2000). The importance of U1 snRNA base-pairing with the 5′ splice site is demonstrated by two extragenic allele-specific suppressors of mutations of the canonical G that begins C. elegans introns. sup-6 and sup-39 encode two of the 12 U1 snRNA genes in C. elegans and were identified in screens as suppressors that can allow mutated splice donors to be recognized by the splicing machinery (Roller et al., 2000; Run et al., 1996). These suppressor alleles are compensatory mutations in the 5′ end of U1 snRNA genes that allow them to base pair with the mutated 5′ splice sites (Zahler et al., 2004). These U1 snRNA suppressors are informational suppressors (allele-specific, gene-nonspecific suppressors) and join the amber suppressors and the smg genes as members of this group (Mount and Anderson, 2000). A forward genetic screen to identify additional allele-specific suppressors of a 5′ splice site mutation identified three non-snRNA suppressors, smu-1, smu-2 and snrp-27, all protein-coding genes (Dassah et al., 2009). SMU-1 is the nematode homolog of mammalian protein fSAP57, and SMU-2 is the nematode homolog of mammalian protein RED. The mammalian proteins have been found to be associated with the mammalian spliceosomal C complex (the catalytically active form) (Jurica and Moore, 2003). SMU-1 and SMU-2 were previously identified as playing a role in alternative splicing of unc-52 (Spartz et al., 2004). SNRP-27 is the nematode homolog of the mammalian tri-snRNP 27K protein that is associated with the U4/U6•U5 tri-snRNP (Fetzer et al., 1997), and the allele identified in this genetic screen is dominant (Dassah et al., 2009). Given that SMU-2, SMU-1 and SNRP-27 are homologs of proteins that other studies have placed in later-assembling stages of the spliceosome, their role as suppressors of cryptic splice site usage suggests that these factors contribute to the fidelity of splice site choice at later assembly stages, after the initial identification of 5′ splice sites by U1 snRNP.
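To make the comparison between an individual 5′ splice site and the AG|guaaguu consensus concrete, the following minimal Python sketch counts matching positions. It is an illustration only: the function name and the example sites are hypothetical, and a simple position count is assumed here rather than the alignment method of Kent and Zahler (2000).

```python
# Consensus 5' splice site from the text: AG|guaaguu
# (last 2 nt of the exon followed by the first 7 nt of the intron).
CONSENSUS = "AGGUAAGUU"

def consensus_matches(site: str) -> int:
    """Count positions of a 9-nt 5' splice site that match the consensus.

    DNA input (T) is accepted and treated as RNA (U); case is ignored.
    """
    site = site.upper().replace("T", "U")
    if len(site) != len(CONSENSUS):
        raise ValueError("expected a 9-nt sequence")
    return sum(a == b for a, b in zip(site, CONSENSUS))

# Hypothetical example sites, for illustration only.
print(consensus_matches("AGguaaguu"))  # 9 (perfect match)
print(consensus_matches("UGguuaguu"))  # 7 (two mismatches)
```

A site with fewer matches would be expected to base pair less extensively with the 5′ end of U1 snRNA, although real splice site strength depends on more than a raw match count.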
In C. elegans, the snRNP-associated proteins U1A and U2B″ are approximately 50% identical to each other, but are somewhat different from their mammalian counterparts or the single Drosophila homolog, SNF (Saldi et al., 2007). Saldi et al. showed that the genes that encode these proteins (rnp-2 and rnp-3) are co-transcribed in an operon, and that RNP-2 is U1 snRNP-associated (U1A) whereas RNP-3 is U2 snRNP-associated (U2B″). In addition to their co-regulation, they also showed that RNP-2 and RNP-3 are functionally redundant. Both genes must be knocked out before a phenotype is observed, indicating that either can interact with both snRNPs when the other is absent.
C. elegans contains eight different members of the SR protein family, encoded by the rsp genes (Kawano et al., 2000; Longman et al., 2000; Longman et al., 2001; Morrison et al., 1997). Family members are expressed in all cells at all stages of development (Kawano et al., 2000). There is functional overlap among the family members, as studies suggest that, with the exception of rsp-3, multiple SR proteins need to be targeted by RNA interference in order for a phenotype to become evident (Kawano et al., 2000; Longman et al., 2000; Longman et al., 2001). Several viable mutant alleles of SR proteins have been generated by the C. elegans Gene Knockout Consortium. These include rsp-2(ok639) (human homolog is SRSF5/SRp40), rsp-5(ok324) (human homolog is SRSF2/SC35) and rsp-6(tm367) (human homolog is SRSF7/9G8). All three deletion alleles are viable with no obvious deleterious phenotypes. Using alternative splicing-sensitive DNA microarrays, Barberan-Soler et al. were able to detect >2-fold changes in alternative splicing isoform ratios for at least 10 target genes in each mutant, suggesting that SR proteins in C. elegans do play a role in alternative splice site determination (Barberan-Soler et al., 2011). The phenomenon of trans splicing is common in nematode species (Allen et al., 2011; Blumenthal, 2005; Bruzik, 1996; Lasda and Blumenthal, 2011). Using splicing-competent embryonic extracts from the parasitic nematode Ascaris lumbricoides, Sanford and Bruzik showed that SR proteins are essential for the trans splicing reaction and that the phosphorylation state and activity level of these proteins change during early development (Sanford and Bruzik, 1999; Sanford and Bruzik, 1999).
C. elegans has a homolog of each U2AF subunit, uaf-1 and uaf-2 (Zorio and Blumenthal, 1999; Zorio et al., 1997). C. elegans has a short, defined polypyrimidine tract at the 3′ end of its introns (Blumenthal and Steward, 1997). Studies in this organism demonstrated the specificity with which the U2AF heterodimer subunits recognize the polypyrimidine tract and 3′ splice site at the end of the intron. The U2AF65 homolog, UAF-1, interacts with the short polypyrimidine tract while the U2AF35 homolog, UAF-2, recognizes the AG dinucleotide at the 3′ end of the intron (Hollins et al., 2005; Zorio and Blumenthal, 1999). These studies clearly demonstrated that, even though the AG at the 3′ end of the intron is involved in the second trans-esterification reaction of splicing, it also has an important role in early spliceosome assembly. UAF-1 regulates its own expression. The uaf-1 message is alternatively spliced to contain an unusual exon with an in-frame stop codon. This exon is unusual because it contains 10 matches to the C. elegans 3′ splice acceptor sequence UUUUCAG/R, the sequence recognized directly by C. elegans U2AF (Zorio and Blumenthal, 1999). MacMorris et al. showed that these splice acceptor repeats cause the pre-mRNA to be retained in the nucleus and thus not become a substrate for translation or nonsense-mediated decay. Experiments in which the UUUUCAG/R repeats were placed in a GFP transgene indicate that UAF-1 is responsible for the nuclear retention of its message, suggesting a model in which an autoregulatory feedback loop controls U2AF homeostasis (MacMorris et al., 1999). Genetic suppressor screens have identified viable mutations in uaf-1 that can alter the usage of cryptic 3′ splice sites in a target gene (Ma and Horvitz, 2009). In C. elegans, there is little available information about the identity of the branchpoint sequence at which lariat formation occurs during the first catalytic step of splicing, and no strong branchpoint consensus sequence is found in C. elegans introns. Branchpoint formation in other systems has been shown to require splicing factor 1/branchpoint binding protein (SF1/BBP), which binds to both U2AF and the branchpoint sequence (Liu et al., 2001; Selenko et al., 2003). The C. elegans homolog of SF1/BBP, SFA-1, is essential for development and binds strongly to human U2AF65 protein in vitro (Mazroui et al., 1999). SFA-1 binding to UAF-1, which in turn binds strongly to the 3′ splice acceptor sequence, has been proposed as a key step in identifying the weak branchpoint for splicing (Hollins et al., 2005). Viable mutations in sfa-1 were identified in the same screen for suppressors of the unc-93(e1500) missense mutation in which uaf-1 suppressors were isolated; this is consistent with uaf-1 and sfa-1 working together in 3′ splice site recognition, although a role for the sfa-1 suppressors in altering splice site usage has yet to be demonstrated (Ma and Horvitz, 2009).
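The description of the uaf-1 exon as containing 10 matches to UUUUCAG/R suggests a simple motif scan. The sketch below is a hedged illustration of how such matches could be counted; the function name and the test sequence are hypothetical (the sequence is not the real uaf-1 exon), and R is taken to mean A or G.

```python
import re

# 3' splice acceptor consensus from the text: UUUUCAG/R, where R is A or G.
# A lookahead is used so that overlapping matches would also be reported.
ACCEPTOR = re.compile(r"(?=UUUUCAG[AG])")

def acceptor_positions(rna: str) -> list[int]:
    """Return the 0-based start position of every UUUUCAG/R match."""
    rna = rna.upper().replace("T", "U")
    return [m.start() for m in ACCEPTOR.finditer(rna)]

# Hypothetical sequence, for illustration only.
seq = "GAUUUUCAGAUCCUUUUCAGGUAUUUUCAGCA"
print(acceptor_positions(seq))  # [2, 13]
```

The third UUUUCAG in the example is followed by C rather than a purine, so it is not reported as a match.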
C. elegans has been used as an important model system to study the transition from mitotic proliferation to entry into meiosis in the germ line (Ellis and Schedl, 2007; Kimble and Crittenden, 2005). Screens for genes that masculinize the germline of hermaphrodites (loss of the ability to stop the limited sperm meiosis) led to identification of the mog (masculinization of germline) genes (Graham and Kimble, 1993; Graham et al., 1993). Four of the five mog genes that have been cloned so far are homologs of constitutive splicing factors. MOG-1, MOG-4 and MOG-5 are homologs of the DExD/H helicases PRP16, PRP2 and PRP22, respectively (Puoti and Kimble, 1999; Puoti and Kimble, 2000). MOG-2 is a homolog of the U2 snRNP protein U2A′ (Zanetti et al., 2011). In oocyte production, GLP-1/Notch activation in pre-meiotic germ cells, in response to signaling from the distal tip cell of the gonad arm, controls the mitotic/meiotic proliferation/differentiation of the germ cells through repression of the redundant GLD-1 and GLD-2 pathways (Kimble and Crittenden, 2005). A screen for enhancers of a weak gain-of-function allele of glp-1 identified the worm homolog of the yeast PRP17 splicing factor, prp-17 (Kerins et al., 2010). Given that both prp-17 and the mog genes appear to be constitutive splicing factors, Kerins and colleagues performed an RNAi screen against known splicing factors and looked for phenotypes of extensive proliferation in the germline. Of the 114 factors tested by RNAi, 47 showed over-proliferation and/or Mog sex determination phenotypes (Kerins et al., 2010). Table 2 and supplemental table S1 of Kerins et al. (2010) are an excellent resource as catalogs of the C. elegans homologs of known yeast and mammalian spliceosome components. So why are mutations in the splicing machinery specifically associated with the proliferation/meiosis switch and with sex determination? Kerins et al. propose three models to explain these observations (Kerins et al., 2010): 1) that one or more genes required for controlling meiosis are spliced inefficiently, and so mutations that make splicing less efficient are more likely to show these phenotypes; 2) that one or more genes controlling meiosis are regulated through alternative splicing, and that defects that make splicing less efficient change alternative splicing; 3) that pre-mRNA splicing may be necessary for assembly of ribonucleoprotein complexes on mRNAs in the nucleus, such as the exon-junction complex, that may be important for RNA stability and translational regulation of target genes in the cytoplasm. The finding that key developmental processes require efficient splicing opens up important opportunities for further research in both developmental biology and splicing regulation.

Alternative pre-mRNA splicing
Alternative splicing is a mechanism for generating multiple mRNA transcripts from a single gene (Kalsotra and Cooper, 2011). Types of alternative splicing include the use of alternative 5′ splice sites, alternative 3′ splice sites, cassette exons, retained introns and mutually exclusive exons. These are represented graphically in Figure 1. Alternative splicing often leads to changes in the primary amino acid sequence of the protein, sometimes subtle and sometimes quite dramatic. Alternative splicing can control where 3′ end cleavage and polyadenylation occur, indicating a strong interaction between the splicing and 3′ end formation machineries (Proudfoot, 2011). Many alternative isoforms are generated through the use of alternative promoters. Even though these alternative promoters will lead to alternative 5′ exons spliced to common downstream exons, the generation of the unique first exon is not regulated at the level of alternative splicing but instead by the decisions that govern transcriptional initiation. Alternative splicing may also produce isoforms that use different stop codons. An isoform containing a premature termination codon (PTC) is a candidate for degradation by the nonsense-mediated decay (NMD) pathway. The coupling of alternative splicing and NMD was first demonstrated in C. elegans in a forward genetic screen for regulators of alternative splicing. Morrison et al. showed that two of the genes encoding SR protein splicing factors have alternative isoforms with premature termination codons. These alternatively spliced forms were preferentially stabilized in suppressor with morphogenetic effects on genitalia (smg) mutants, leading to a change in the steady-state ratio of the alternative isoforms (Morrison et al., 1997). This was the first demonstration that natural substrates for the NMD pathway are alternative isoforms of transcripts. Previously, mutant alleles of genes that introduce a PTC were used to identify mutations in smg genes as suppressors (Hodgkin et al., 1989), and a role in suppression of aberrant transcripts caused by mutation had been proposed as the function of the NMD pathway (Pulak and Anderson, 1993). Extensive analysis of vertebrate alternative splicing indicates that one third of alternative splicing events lead to production of an isoform that is a substrate for the NMD pathway (Lewis et al., 2003). To measure the ratio of alternative mRNA isoforms for a particular gene, scientists determine the relative steady-state levels of the isoforms in RNA extracted from cells, tissues or animals by reverse transcription-PCR, quantitative PCR, splicing-sensitive DNA microarrays or RNA-seq. When measuring alternatively spliced isoforms it is important to remember that the steady-state ratio of the isoforms represents a combination of two variables: the relative amounts of the isoforms generated co-transcriptionally by the splicing machinery in the nucleus and the relative stability of those mRNA isoforms in the nucleus and cytoplasm.
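The point that a steady-state isoform ratio reflects both splicing and stability can be illustrated with a small calculation. The sketch below is a toy example under stated assumptions: the function name and the counts are hypothetical, and the counts stand in for any of the steady-state measurements mentioned above (RT-PCR band intensities, microarray signals or RNA-seq reads).

```python
def isoform_fractions(counts: dict[str, float]) -> dict[str, float]:
    """Convert per-isoform steady-state measurements into fractions of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no signal for this gene")
    return {name: value / total for name, value in counts.items()}

# Hypothetical measurements: the PTC-containing isoform is stabilized in an
# NMD-defective (smg) background, while the productive isoform is unchanged.
wild_type = isoform_fractions({"productive": 90, "PTC": 10})
smg_mutant = isoform_fractions({"productive": 90, "PTC": 60})
print(wild_type["PTC"], smg_mutant["PTC"])  # 0.1 0.4
```

In this toy case the observed isoform ratio changes fourfold even though nothing about splice site choice is assumed to differ, which is why a change in steady-state ratio alone cannot distinguish altered splicing from altered mRNA stability.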

Identification of alternatively spliced genes in C. elegans
Initial identification of alternatively spliced genes in C. elegans was dependent on comparison of Expressed Sequence Tags (ESTs; libraries of randomly sequenced cDNAs) to the genome sequence (Kent and Zahler, 2000). From this initial identification of 844 alternatively spliced genes there has been steady growth in this number through constant annotation of genes based on experimental results, additional EST evidence and SAGE data analysis (Ruzanov et al., 2007). New data for identifying alternatively spliced genes come from the new high-throughput RNA sequencing technologies. Ramani et al. have reported using RNA-seq to identify alternatively spliced genes in C. elegans at various developmental time points (Ramani et al., 2011). In addition to using this approach to confirm the existence of 71% of the previously annotated alternative splicing events in C. elegans, they report sequence evidence for 2183 new alternative splicing events, with 759 of these new splice junctions detected in 5 or more independent sequencing reads. Their alternative splicing predictions and evidence for developmental regulation are accessible in a genome browser format at http://splicebrowse.ccbr.utoronto.ca/. As of WormBase release WS226, there are 4732 alternative mRNA isoforms annotated for 20,439 protein-coding genes. Ramani and colleagues performed an extrapolation from their data based on 200,000,000 sequence reads to suggest that there are ~5000 genes that undergo alternative splicing in C. elegans, representing ~25% of protein-coding genes (Ramani et al., 2011). While this number is less than the percentage of genes predicted to undergo alternative splicing in mammals, it is a strong indication that alternative splicing is a major gene regulatory process in nematodes. Consistent with the model that alternative splicing is a critical process in nematodes, several studies have shown that worm alternative splicing and its regulation are under strong evolutionary selective pressure (Barberan-Soler and Zahler, 2008; Irimia et al., 2008; Li et al., 2010; Rukov et al., 2007).
C. elegans alternatively spliced genes are capable of undergoing developmental regulation. let-2, an alpha(2) type IV collagen gene, has two mutually exclusive exons; this was the first gene to be shown to undergo a dramatic switch in alternative isoform usage to generate embryo- and adult-specific versions of the protein (Sibley et al., 1993). Using custom DNA microarrays designed to detect alternative splicing, Barberan-Soler and Zahler uncovered 62 genes whose alternative splicing showed >4-fold changes in isoform ratios during development (Barberan-Soler and Zahler, 2008). Recently, RNA-Seq combined with microarray approaches uncovered 574 temporally regulated alternative splicing events (Ramani et al., 2011). Examples of sex-specific alternative splicing in nematodes have been uncovered. One isoform of the nuclear hormone receptor unc-55 is only produced in males (Shan and Walthall, 2008). In hermaphrodites, the feminizing locus on X (fox-1) gene encodes an RNA binding protein that is a regulator of sex determination and acts as a numerator for counting X chromosomes (Hodgkin et al., 1994). fox-1 acts post-transcriptionally to inhibit the expression of the xol-1 gene, the major specifier of male fate (Nicoll et al., 1997). The mammalian homologs of FOX-1 were subsequently shown to be key regulators of tissue-specific alternative splicing (Jin et al., 2003; Underwood et al., 2005).

Regulation of alternative splicing
The spliceosome assembles through an ordered binding of its snRNP subunits onto the pre-mRNA (Burge et al., 1999). Alternative splicing regulation essentially controls where the assembly of a functional spliceosome occurs. Alternatively spliced exons often have weak consensus sequences at the 5′ and 3′ ends of the introns, suggesting that additional signals are required for recognition of the exon by the splicing machinery (Lopez, 1998). Twenty-five years of genomic, genetic, and biochemical experiments have led to the understanding that cis splicing regulatory elements on the pre-mRNA are responsible for controlling alternative splicing (Hallegger et al., 2010). These regulatory regions are found in alternative exons or in flanking introns and can be enhancers or silencers of splice site usage (Figure 2). These sequence motifs serve as binding sites for protein factors that can enhance or inhibit the ability of the spliceosome to recognize the exons. The exonic elements not only encode amino acids but also regulate their own ability to be spliced into the mature message. In the "splicing code" hypothesis, the combinatorial assembly of different splicing factors onto multiple cis elements controls where a functional spliceosome assembles (Wang and Burge, 2008). Since the same pre-mRNA sequence is transcribed in different cell types and the constitutive splicing machinery is present in all cells, the regulation of cell type-specific splicing is dependent on the presence and the activity of trans-acting protein factors that bind the nascent transcript in a given cell. Catalogs of cis splicing elements, as well as models for their potential role in affecting alternative splicing in specific cell types based on their position relative to the alternative splicing event, have grown in recent years to the point where most major players have likely been identified (Calarco et al., 2011). A recent major advance in computational prediction of mouse tissue-specific alternative splicing has dramatically improved our understanding of the role of splicing elements (Barash et al., 2010). Despite improvements in probabilistic models for decoding alternative splicing, mechanistic studies are still required to understand how the trans-acting factors binding to these cis elements act in a combinatorial manner to regulate spliceosome assembly.
Figure 2. Elements of the pre-mRNA through which alternative splicing is regulated. Some combination of these regulatory regions can usually be found. Weaker consensus splice sites surrounding the alternative exon, exonic regulatory regions and intronic regulatory regions are indicated.
Although the catalogs of cis elements and trans-acting factors are increasing, the mechanisms by which these control spliceosome assembly have been difficult to resolve. For example, splicing factors Fox1, hnRNP L and PTB act through dramatically different mechanisms to silence splicing of target exons. Fox1 binding to UGCAUG sequences blocks formation of the spliceosomal E complex at two distinct steps (Zhou and Lou, 2008). hnRNP L blocks tri-snRNP recruitment through interactions with adjacent assembled U1 and U2 snRNPs (House and Lynch, 2006). Interactions between PTB proteins bound to both introns flanking an alternative cassette exon are proposed to loop out the exon and prevent its inclusion in the mature message (Li et al., 2007; Wagner and Garcia-Blanco, 2001). Mechanistic studies must take into account the fact that multiple regulatory proteins assemble onto splicing control sequences. One example of the challenges in deciphering alternative splicing regulation is the regulation of the neural-specific N1 exon of human c-src. The downstream control sequence (DCS), found in the intron downstream of the N1 exon, binds multiple factors to regulate N1 splicing; these include hnRNP H, hnRNP F, KH-type splicing regulatory protein (KSRP), Fox1 and a neural-specific homolog of PTB (nPTB) (Chou et al., 1999; Chou et al., 2000; Markovtsov et al., 2000; Min et al., 1995; Min et al., 1997; Underwood et al., 2005). Comparisons of the factors that assemble onto this element from neuronal cell nuclear extract vs. epithelial cell nuclear extract indicate that a subset of these proteins binds from each extract. Factors in the neuronal extract promote assembly of a different complex that is required to activate splicing of the neural-specific exon. Given that only a small number of alternative splicing substrates have been analyzed in detail, much more work is required to understand how binding of splicing factors to an intron or exon regulates splicing.

Identifying the elements of the C. elegans splicing code
With C. elegans as the first completely sequenced animal genome (C. elegans Sequencing Consortium, 1998), and with comparative genomic information available, initially from C. briggsae (Stein et al., 2003) but now from many related nematodes, this model system offers advantages for the identification of alternative splicing regulatory elements. After 100 million years of evolutionary divergence, C. elegans and C. briggsae exons have maintained 80% sequence identity, while the introns flanking these exons have diverged rapidly such that it is rare to find evolutionarily conserved sequences within them (Kent and Zahler, 2000). However, in many introns flanking alternatively spliced exons there are defined patches of evolutionary conservation. Figure 3 shows five examples of alternative splicing with regions of evolutionary conservation in the introns flanking alternative exons. It was hypothesized that these distinct evolutionarily conserved intronic sequences had the potential to define key splicing regulatory elements (Kent and Zahler, 2000). An analysis of the frequency of pentamer and hexamer sequence occurrence in these highly conserved intronic elements compared to their frequency in a total intron database was used to identify hexamer and pentamer words associated with alternative splicing in C. elegans (Kabat et al., 2006). Pentamers and hexamers were chosen for these studies because these represent the known length of preferred RNA binding sequences for many RNA binding proteins involved in splicing regulation. Many of these conserved intronic pentamers/hexamers are a direct match to human alternative splicing regulatory elements, such as GCAUG for Fox1 (Jin et al., 2003), CUCUCU for PTB (Oberstrass et al., 2005) and GUGUGU for ETR-3 (Faustino and Cooper, 2005), indicating the conserved nature of alternative splicing regulation in animals. In addition to identifying lists of regulatory elements, distinct biases were observed as to whether these elements were likely to appear in the intron upstream or the intron downstream of the alternative cassette exon. One of the high-scoring words in C. elegans that has not been identified as a splicing regulatory element in mammals is UCUAUC. This element was demonstrated to regulate alternative splicing in combination with a flanking GCAUG FOX-1 binding site (Kabat et al., 2006). RNA affinity chromatography techniques were employed to identify proteins in C. elegans that bind to this new splicing regulatory element. C. elegans cellular extracts were passed over columns containing immobilized UCUAUC RNA sequences. The C. elegans homolog of the human hnRNP Q protein, HRP-2, was identified as a high-affinity binder for this sequence and the protein was shown to regulate alternative splicing through binding to UCUAUC in introns (Kabat et al., 2009).
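The word-frequency comparison described above (pentamer and hexamer occurrence in conserved intronic elements versus a total intron database) can be sketched in a few lines of Python. This is a simplified illustration, not the scoring scheme of Kabat et al. (2006): the function names, the pseudocount and the tiny example sequences are all assumptions made for the sketch.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper().replace("T", "U")
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(conserved, background, k, pseudocount=1.0):
    """Ratio of each k-mer's frequency in conserved elements to its
    frequency in background introns (the pseudocount is an assumption
    that avoids division by zero for unseen k-mers)."""
    fg, bg = kmer_counts(conserved, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / (fg_total + pseudocount)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount)
        scores[kmer] = fg_freq / bg_freq
    return scores

# Hypothetical toy sequences, for illustration only.
conserved = ["UGCAUGUU", "AAGCAUGC"]
background = ["AUAUAUAUAUAUAU", "UUUUAAAAUUUUGCAUG"]
scores = enrichment(conserved, background, k=5)
print(round(scores["GCAUG"], 2))
```

With real data, k-mers that are absent from the background would need more careful statistical treatment than a fixed pseudocount, and a significance test would be required before calling any word enriched.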
An alternative, computational approach to discovering new alternative splicing events and their cis regulatory elements used a support vector machine trained to identify predictive features of C. elegans alternative cassette exons (Ratsch et al., 2005). One of the features that allowed the support vector machine to predict alternative splicing was the presence of several distinct hexamers in the introns surrounding alternative exons. In a more recent approach, RNA-Seq and microarrays were used to identify developmentally regulated alternative splicing events (Ramani et al., 2011). Analysis of the 50 intronic nucleotides on either side of the alternative cassette exons revealed enriched pentamer motifs. There is a strong degree of overlap among the high scoring pentamers and hexamers identified in all three studies (Kabat et al., 2006; Ramani et al., 2011; Ratsch et al., 2005). It is clear that progress has been made in the past several years in identifying the most common intronic cis regulators of alternative splicing in C. elegans. Table 1 shows several of the top scoring splicing regulatory hexamers and pentamers from these studies and the trans-acting proteins predicted to bind to them. The next challenge in understanding the splicing code is deciphering how the binding of proteins to these elements regulates alternative splicing.
Table 1. Characterized intronic components of the C. elegans alternative splicing code. The table shows high scoring pentamer and hexamer sequences that were identified as intronic alternative splicing regulators in three different studies (Kabat et al., 2006; Ramani et al., 2011; Ratsch et al., 2005). Biases in the occurrence of the cis element in the intron upstream or downstream of an alternative cassette exon are indicated in the second column. Alternative splicing factors predicted to bind to the cis regulatory elements are indicated in the third column. These high-affinity binding predictions are based on the following C. elegans reports: HRP-2 (Kabat et al., 2009), ASD-2 (Ohno et al., 2008), FOX-1 and ASD-1 (Kuroyanagi et al., 2006), and SUP-12 (Anyanful et al., 2004; Kuroyanagi et al., 2007). The prediction for PTB-1 is based on the binding of its human homolog, PTB, directly to CUCUCU repeats (Oberstrass et al., 2005).

Bias of element occurrence in the intron relative to an alternative cassette exon
Predicted trans splicing factor

Genetic identification of alternative splicing factors in C. elegans
Genetic analysis in C. elegans has led to the identification of splicing regulatory pathways. MEC-8 is a nuclear protein that contains two RNA recognition motifs (RRMs). Mutations in mec-8 lead to mechanosensory and other defects. Mutations in mec-8 are synthetically lethal with viable mutations in the unc-52 gene (Lundquist et al., 1996). These viable unc-52 mutations occur only in exons 17 and 18, which are alternatively spliced (Rogalski et al., 1995). Skipping of exons 17 and 18 is dependent on the function of mec-8 (Lundquist et al., 1996). MEC-8 protein is found in all nuclei in embryos, but expression becomes more limited to specific cell types in larvae (Spike et al., 2002). Animals carrying unc-52 mutant alleles with stop codons in exons 17 and 18 develop a progressive paralysis at later developmental stages; in contrast, mutations in other parts of the gene are lethal. Animals containing both viable mutations in unc-52 and viable mutations in mec-8 have a synthetic lethal phenotype that mimics the embryonic lethality of the lethal unc-52 alleles (Lundquist and Herman, 1994). In mec-8; unc-52 double mutant animals, exons 17 and 18 (containing stop codons) cannot be skipped, and this leads to insufficient production of full-length UNC-52 in embryos and thus to the more severe embryonic lethality. Figure 4 summarizes unc-52 alternative splicing and the specific alternative splicing events promoted by MEC-8. In a genetic screen based on this synthetic lethality, two suppressors of the mec-8; unc-52 synthetic lethality (smu) were identified (Lundquist and Herman, 1994). smu-1 encodes a nuclear protein that contains five WD motifs, is ubiquitously expressed and is 62% identical to a human spliceosome-associated protein, fSAP57 (Spike et al., 2001). smu-2 encodes a C. elegans homolog of the human RED protein, which has also been shown to be associated with the human spliceosome. SMU-1 and SMU-2 bind to each other; SMU-2 stabilizes SMU-1 but SMU-1 does not stabilize SMU-2. Based on their homology with components of the mammalian spliceosome, it is hypothesized that SMU-1 and SMU-2 might interact with the spliceosome and modulate splice site choice (Spartz et al., 2004). A screen for mutations that are synthetic-lethal with mec-8 mutations (sym) uncovered another splicing factor, sym-2, which is a homolog of the human alternative splicing factors hnRNP H and F (Davies et al., 1999; Yochem et al., 2004). In total, analysis of suppressors and enhancers of the viable unc-52 alleles has led to the identification of four splicing regulatory factors.
In addition to its role in unc-52 splicing regulation, MEC-8 has also been identified as a regulator of alternative splicing of the mec-2 transcript. MEC-8 is required for removal of mec-2 intron 9, and the sequences required for MEC-8 regulation are contained within the intron 9 sequence (Calixto et al., 2010). Taking advantage of the MEC-8-dependent removal of mec-2 intron 9, and the fact that there is a temperature-sensitive mec-8 allele, mec-8(u218ts), Calixto and colleagues have engineered a conditional gene expression system. mec-2 intron 9 is inserted into a gene of interest and this transgene is then transformed into a u218ts background. The intron is removed efficiently at the permissive temperature, but when the worms are shifted to the restrictive temperature, intron removal ceases and the gene of interest is turned off (Calixto et al., 2010).
Tissue-specific expression of alternative splicing factors has been demonstrated in C. elegans. SUP-12 is an RRM-containing protein that was identified in a genetic screen for suppressors of unc-60 mutations. unc-60 is alternatively spliced to yield two isoforms, UNC-60A and UNC-60B. UNC-60 is a homolog of the actin depolymerizing factor cofilin; UNC-60A and UNC-60B are expressed in non-muscle cells and muscle cells, respectively. sup-12 is expressed in muscle cells and is required for the generation of the UNC-60B isoform in those cells by alternative splicing (Anyanful et al., 2004). Loss of sup-12 in unc-60B mutants allows for splicing to the UNC-60A isoform in muscle cells, and the presence of this cofilin in muscle cells can rescue the uncoordinated phenotype. SUP-12 was shown to function through interactions with GU-rich elements in introns (Anyanful et al., 2004). A fluorescent reporter-based genetic screen for factors that alter egl-15 alternative splicing also identified sup-12 as a key splicing regulator, again through interactions with GU-rich intronic elements (Kuroyanagi et al., 2007).
Two neuronal, nuclear RNA binding proteins with a role in synaptic transmission have been identified in C. elegans. These are unc-75, which is an ortholog of the human CELF family of alternative splicing factors, and exc-7, which is a homolog of the Drosophila neuronal splicing factor ELAV (Loria et al., 2003). lec-3 has been identified as a gene whose alternative splicing changes in both exc-7 and unc-75 mutant strains (Barberan-Soler et al., 2011). A C. elegans homolog of the human CUGBP1 splicing factor, etr-1, is expressed only in muscle cells and is essential for development (Milne and Hodgkin, 1999). These studies are strongly suggestive of an important role for alternative RNA processing in tissue specification.
Kuroyanagi and colleagues have engineered a creative and powerful genetic screening system with which to analyze alternative splicing. They create dual alternative splicing reporter constructs in which each alternative isoform is fused in frame to either green or red fluorescent protein (GFP or RFP). Two transgenes containing the alternatively spliced region of interest are transformed simultaneously into worms; one version of the transgene has one outcome of the alternative splicing event in frame with GFP, while the other version has the other alternative isoform in frame with RFP (Kuroyanagi et al., 2010). They have used this system in forward genetic screens, screening for changes in the GFP:RFP ratio in the worms that are indicative of changes in alternative splicing. They have searched for regulators of the alternative splicing of egl-15 and of let-2 (Kuroyanagi et al., 2006; Kuroyanagi et al., 2007; Ohno et al., 2008). They discovered asd-1 (alternative splicing defective), a homolog of the fox-1 alternative splicing factor, as a key alternative splicing regulator for egl-15 that works through UGCAUG intronic elements (Kuroyanagi et al., 2006). They demonstrated that ASD-1 and FOX-1 both affect the splicing of this reporter, and that they must have overlapping function since mutations in both genes are required to see a visible phenotype in the absence of the sensitive fluorescent splicing reporters. Kuroyanagi et al. also identified SUP-12 as a regulator of egl-15 splicing using this screen and showed its affinity for GU-rich elements (Kuroyanagi et al., 2007). A screen for regulators of the developmentally regulated alternative splicing of let-2 uncovered asd-2, which encodes a STAR family RNA binding protein (Ohno et al., 2008). They demonstrated that ASD-2 interacts with a CUAAC element in intron 10. This bi-color fluorescent screening system has proven powerful because it is designed to uncover genes directly responsible for splicing regulation of a specific target, and is sensitive enough to detect changes in reporter gene alternative splicing that do not otherwise cause detectable phenotypic changes in the worms. Table 2 summarizes the splicing factors discussed in this chapter and indicates their closest human homologs.
The combinatorial interactions of multiple protein factors with the cis-elements surrounding a given alternative splicing event lead to an integrated splicing decision. Yet the mechanism of multifactorial splicing regulation is poorly understood. Using a splicing-sensitive DNA microarray, Barberan-Soler et al. assayed 352 C. elegans alternative cassette exons for changes in embryonic splicing patterns between wild-type and 12 different strains carrying viable mutations in splicing factors. In addition to identifying additional substrates for regulation by splicing factors such as sup-12, mec-8, asd-1, and fox-1, many alternative splicing events were uncovered that are regulated by multiple splicing factors (Barberan-Soler et al., 2011). Many splicing factors were shown to have the ability to behave as splicing repressors for some alternative cassette exons and as splicing activators for others. In addition, an example was uncovered in which the ability of a given alternative splicing factor to behave as an enhancer or repressor of a specific splicing event switches during development. The observation that splicing factors can change their effects on a substrate during development supports a model in which combinatorial effects of multiple factors, both constitutive and developmentally regulated ones, contribute to the overall splicing decision.
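The logic of calling a factor an activator or a repressor of a given cassette exon can be sketched as follows. This is an illustrative assumption, not the analysis used by Barberan-Soler et al. (2011): the percent-inclusion measure, the 10-point threshold, the function names and the example signal values are all hypothetical. The idea is simply that if exon inclusion falls in a loss-of-function mutant, the wild-type factor promotes inclusion, and if inclusion rises, the factor represses it.

```python
def percent_inclusion(included, skipped):
    """Percent of isoform signal that comes from the exon-included isoform."""
    total = included + skipped
    if total <= 0:
        raise ValueError("total isoform signal must be positive")
    return 100.0 * included / total


def classify_factor(wild_type, mutant, min_delta=10.0):
    """Classify a factor's effect on one cassette exon.

    wild_type and mutant are (included, skipped) signal pairs measured in
    wild-type animals and in a loss-of-function mutant of the factor.
    min_delta is an arbitrary threshold chosen for this sketch.
    """
    delta = percent_inclusion(*mutant) - percent_inclusion(*wild_type)
    if delta <= -min_delta:
        return "activator"   # inclusion drops without the factor
    if delta >= min_delta:
        return "repressor"   # inclusion rises without the factor
    return "no clear effect"


# Hypothetical measurements for three exons in the same mutant strain.
exon_a = classify_factor(wild_type=(80, 20), mutant=(30, 70))
exon_b = classify_factor(wild_type=(20, 80), mutant=(60, 40))
exon_c = classify_factor(wild_type=(50, 50), mutant=(52, 48))
```

In this toy case the same factor would be scored as an activator for one exon and a repressor for another, which is the kind of context-dependent behavior described above.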

Coupling of alternative splicing to nonsense-mediated decay (AS-NMD)
Alternative splicing can generate isoforms that differ in their ability to be substrates for the NMD pathway. An alternative splicing-sensitive DNA microarray assay was used to look for changes in the steady state levels of alternative isoforms in different NMD mutant backgrounds (Barberan-Soler et al., 2009). In embryos, 73 different alternative splicing events whose isoform ratios changed in an NMD mutant background were identified. Strikingly, 59% of these NMD-sensitive alternative splicing events did not introduce a premature termination codon, and thus should not have activated the NMD pathway. These were referred to as secondary targets of the NMD pathway. In animals, splicing factors are overrepresented in the genes that are regulated by AS-NMD (Lareau et al., 2007; Morrison et al., 1997; Ni et al., 2007; Wollerton et al., 2004), and 12 examples of C. elegans alternative splicing factors that are themselves alternatively spliced and regulated by AS-NMD were identified in the microarray experiments (Barberan-Soler et al., 2009). Reports in different systems have shown that several splicing factors have an autoregulatory feedback loop that links AS to NMD (Sureau et al., 2001; Wollerton et al., 2004; Zachar et al., 1994). This autoregulation has been linked to regions of high sequence conservation in the pre-mRNA, stressing the evolutionary importance of this process (Lareau et al., 2007; Ni et al., 2007). The loss of NMD in C. elegans smg mutant strains led to stabilization, and potentially the translation, of PTC-containing isoforms of splicing factor genes. Several of these PTC-containing mRNA isoforms, stabilized in the NMD mutants, were shown to enter the polysome pool (Barberan-Soler et al., 2009). Translation of the PTC-containing isoforms of splicing factors could produce dominant-negative proteins. For example, translation of the PTC isoform of the rsp-6 SR protein would generate a protein with the N-terminal RRM but lacking the C-terminal SR domain; such a protein could compete with the full-length SR protein for pre-mRNA binding but not activate splicing once bound. The translation of truncated splicing factor proteins in an NMD mutant background could explain the observed changes in alternative splicing of transcripts that are not predicted NMD substrates (Barberan-Soler et al., 2009).
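How inclusion of a cassette exon can introduce a premature termination codon is easy to show with a small sketch. The sequences, function names and the simple rule used here (any in-frame stop codon upstream of the annotated final codon counts as a PTC) are assumptions for illustration; they are not taken from the studies cited above, and real NMD prediction depends on additional transcript features.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna):
    """Return the 0-based position of the first in-frame stop codon, or None.

    The sequence is assumed to begin at the start codon.
    """
    mrna = mrna.upper().replace("T", "U")
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def has_ptc(exons):
    """True if the spliced isoform terminates before its annotated final codon.

    exons is the ordered list of coding exon sequences for one isoform; the
    last exon is assumed to end with the normal termination codon.
    """
    mrna = "".join(exons)
    stop = first_in_frame_stop(mrna)
    return stop is not None and stop < len(mrna) - 3


# Invented toy exons: a 4-nt cassette exon shifts the reading frame.
exon_1 = "AUGGCUGCU"
cassette = "GCUG"
exon_3 = "GCUGACGCUUAA"

skipped_isoform_ptc = has_ptc([exon_1, exon_3])
included_isoform_ptc = has_ptc([exon_1, cassette, exon_3])
```

Here the skipped isoform reads through to its normal UAA, whereas including the 4-nucleotide cassette shifts the frame so that an out-of-frame UGA in the downstream exon becomes an in-frame stop.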
Analysis of alternative splicing coupled to NMD also uncovered PTC-containing transcripts that were inefficiently degraded by NMD, as well as PTC-containing mRNAs whose NMD efficiency changed during development. Using sucrose gradient polysome analysis, it was observed that PTC-containing mRNA isoforms that are inefficiently degraded are enriched in monosomes (Barberan-Soler et al., 2009). Three different pathways that PTC-containing transcripts could follow were proposed: strong NMD, inefficient NMD and developmentally regulated NMD. Individual representatives of these three classes are shown in Figure 5. While this figure shows examples of genes that undergo AS-NMD, the features that drive the transcripts to the different NMD pathways are unknown. A role for a trans-acting factor, the RNA binding protein Pub1p, in protecting specific upstream open reading frame-containing mRNAs from the NMD pathway in yeast has been proposed (Ruiz-Echevarria and Peltz, 2000). Recent discoveries that NMD is turned off in cancer cells in their hypoxic microenvironment (Wang et al., 2011), downregulated by myc oncogene overexpression (Wang et al., 2011), and downregulated by expression of microRNA miR-128 in brain (Bruno et al., 2011; Huang et al., 2011) suggest that NMD regulation is likely an important mechanism for control of differentiation. Because NMD mutants are viable in C. elegans but not in mice or flies (Avery et al., 2011; Medghalchi et al., 2001), C. elegans is a valuable model system in which to study the interconnection of alternative splicing and the regulation of NMD in animals.

Conclusions
With extensive alternative splicing and a strong degree of regulation, C. elegans offers a tractable model system for the study of pre-mRNA splicing regulation. The available bioinformatic, genetic and biochemical approaches have led to a coordinated system with the promise to unlock the splicing code for an organism. The availability of viable mutants in splicing factors along with viable mutants in components of the NMD pathways will allow for important contributions to our understanding of pre-mRNA splicing regulation. With many of the components of the splicing code in place for this organism, the next challenges lie in understanding the mechanisms by which the interaction of these factors with the pre-mRNA regulates the assembly of the spliceosome and ultimately the choice of splice sites.

Figure 1. Types of alternative splicing. In these graphics, exons are represented by boxes and introns by lines. Exon regions included in the messages by alternative splicing are colored while constitutive exons are shown in gray. Promoters are indicated with arrows and polyadenylation sites with AAAA…

Figure 3. Images from the Intronerator genome browser showing alternatively spliced genes (Kent and Zahler, 2000). Gene isoforms predicted by the Wormbase Consortium are shown in blue and WABA homology alignments for C. briggsae to this region of the C. elegans genome are shown in purple. Dark purple indicates a region of WABA high homology, light purple corresponds to low homology, and white indicates no homology between species (Kent and Zahler, 2000). Regions of alternatively spliced genes: A) W01F3.1, B) ZC477.9, C) ZK637.8, D) H24G06.1 and E) C11D2.6 are shown. Figure adapted from Kabat et al., 2006.

Figure 4. Alternative splicing of unc-52. Diagram of alternative splicing of the unc-52 gene. Every combination of exon inclusion and skipping of exons 16, 17 and 18 has been detected. MEC-8 enhances the skipping of exons 17 and 18; isoforms promoted by MEC-8 are indicated in red (Lundquist et al., 1996).

Figure 5. Schematic examples of substrates for AS coupled to NMD. Only the alternatively spliced regions to the end of the transcribed region are shown. The splicing of the PTC-containing isoform is shown on top and the full-length isoform splicing is shown at the bottom. Positions of the PTC and the full-length termination codon (TC) are indicated. The length of the genomic region, from the beginning of the constitutive exon upstream of the alternative cassette exon to the end of the 3′ UTR, is indicated along with the class of NMD involved (strong, inefficient or regulated) (Barberan-Soler et al., 2009). Note that the features of these transcripts that determine which of the NMD classes the PTC-containing substrates will follow are currently unknown.

Table 2. C. elegans splicing factors described in this chapter. The closest human homolog of each is indicated in the second column.