
About 70% of C. elegans mRNAs are trans-spliced to one of two 22 nucleotide spliced leaders. SL1 is used to trim off the 5' ends of pre-mRNAs and replace them with the SL1 sequence. This processing event is very closely related to cis-splicing, or intron removal. The SL1 sequence is donated by a 100 nt small nuclear ribonucleoprotein particle (snRNP). This snRNP is structurally and functionally related to the U snRNAs (U1, U2, U4, U5 and U6) that play key roles in intron removal and trans-splicing, except that it is consumed in the process of splicing. More than half of C. elegans pre-mRNAs are subject to SL1 trans-splicing. About 30% are not trans-spliced at all. The remaining genes are trans-spliced by SL2. These genes are all downstream genes in closely spaced gene clusters similar to bacterial operons. They are transcribed from a promoter at the 5' end of the cluster of between 2 and 8 genes. This transcription makes a polycistronic pre-mRNA that is co-transcriptionally processed by cleavage and polyadenylation at the 3' end of each gene, and this event is closely coupled to the SL2 trans-splicing event that occurs only approximately 100 nt further downstream. Recent studies on the mechanism of SL2 trans-splicing have revealed that one of the 3' end formation proteins, CstF, interacts with the only protein known to be specific to the SL2 snRNP. The operons contain primarily genes whose products are needed for mitochondrial function and the basic machinery of gene expression: transcription, splicing and translation. Many operons contain genes whose products are known to function together. This presumably provides co-regulation of these proteins by producing a single RNA that encodes both.


Trans-splicing
mRNAs of ~70% of C. elegans genes begin with a 22 nucleotide sequence, the spliced leader or SL, which is not associated with the gene (recently reviewed in Blumenthal and Steward, 1997 and Hastings, 2005). The SL is donated by a 100 nucleotide RNA, SL RNA, by trans-splicing. This process is closely related to cis-splicing (intron removal): the 5' splice site is on the SL RNA, and the site of SL addition, the trans-splice site, is the 3' splice site on the pre-mRNA (Figure 1). The reaction proceeds by way of a branched intermediate similar to the lariat of cis-splicing. Trans-splicing is catalyzed by spliceosomes, including U2, U4, U5, and U6 snRNPs but not U1 (Hannon et al., 1991). A similar reaction occurs throughout the nematode phylum as well as in some protists (e.g., trypanosomes) and many other animals including flatworms, hydra and primitive chordates (Agabian, 1990; Davis, 1997; Ganot et al., 2004; Stover and Steele, 2001; Vandenberghe et al., 2001).
SL1 is the major spliced leader in nematodes, and it is used primarily for trans-splicing at the 3' splice sites following outrons, sequences resembling introns at the very 5' ends of C. elegans pre-mRNAs (Conrad et al., 1991). About half the genes are estimated to have outrons and are consequently trans-spliced to SL1 (Zorio et al., 1994). Furthermore, a second SL, SL2, is trans-spliced to some C. elegans genes (Huang and Hirsh, 1989). SL2 trans-splices exclusively at trans-splice sites between genes in polycistronic pre-mRNAs from operons (Blumenthal et al., 2002; Spieth et al., 1993). About 15% of all C. elegans genes are found in operons. Finally, about 30% of genes specify mRNAs that are not subject to trans-splicing. In these cases, the promoter is at the 5' end of the first exon, as in genes of organisms that do not trans-splice.

Trans-splicing precursors
The SL RNAs exist as snRNPs (Blumenthal and Steward, 1997; Hastings, 2005). They have a discrete secondary structure as do other snRNAs, they are bound to the Sm proteins, and they have a trimethylguanosine (TMG) cap like the U snRNAs. In the SL snRNPs the 5' splice sites are base paired to the upstream part of the SL sequence, resembling the U1-5' splice site base pairing. The trans-splice site consensus on the pre-mRNAs is the same as the intron 3' splice site consensus. The signal for trans-splicing is simply the presence of intron-like sequence, the outron, at the 5' end of the mRNA, with no functional 5' splice site upstream (Conrad et al., 1995; Conrad et al., 1993; Conrad et al., 1991). Genes whose pre-mRNAs are subject to trans-splicing are distinguished from those that are not only by the presence of an outron. Thus, the choice between cis- and trans-splicing at any 3' splice site is based solely on the presence or absence of an upstream 5' splice site. Because trans-splicing is relatively efficient (like cis-splicing), it is difficult to isolate outron-containing pre-mRNAs, so very few natural outrons have been defined. Nevertheless, in a few cases the promoters of trans-spliced genes or start sites of outrons have been identified [e.g., col-13 has a 64 bp outron (Park and Kramer, 1990) and rol-6 has a 172 bp outron (Conrad et al., 1993)].
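The decision rule stated above can be written down compactly. The following is only an illustrative sketch in Python: the class and function names are hypothetical, and the single boolean stands in for what is, in the cell, a matter of splice site recognition rather than a stored flag.

```python
from dataclasses import dataclass


@dataclass
class ThreePrimeSpliceSite:
    """Toy description of one 3' splice site on a pre-mRNA (hypothetical model)."""
    label: str
    # True if a functional 5' splice site lies upstream on the same pre-mRNA.
    has_upstream_5prime_splice_site: bool


def splicing_mode(site: ThreePrimeSpliceSite) -> str:
    """Apply the rule from the text: an upstream 5' splice site means cis-splicing
    (intron removal); its absence means the upstream sequence is an outron and
    the site is trans-spliced."""
    return "cis" if site.has_upstream_5prime_splice_site else "trans"


# Example: the first 3' splice site of an outron-containing gene versus
# a 3' splice site at the end of an ordinary intron.
outron_site = ThreePrimeSpliceSite("first exon acceptor", False)
intron_site = ThreePrimeSpliceSite("intron 1 acceptor", True)
print(splicing_mode(outron_site), splicing_mode(intron_site))
```

The point of the sketch is that no other input is needed: under the model described in the text, the 3' splice site consensus itself is the same in both cases.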

Mechanism of trans-splicing
Trans-splicing follows the same course as cis-splicing: cleavage of the 5' exon, the SL, and formation of an intermediate with the 5' splice site of the SL RNA branched to an A on the outron in the first step, and splicing of the SL to the first exon of the pre-mRNA in the second step (Figure 1). However, we still do not know how the SL snRNP is attracted to a 3' splice site on a pre-mRNA target only when there is no 5' splice site upstream. One can imagine U2 snRNP interacting with the 3' splice site and attracting the SL1 snRNP only when there is no upstream U1 snRNP with which to interact, but this is conjecture. A second possibility is that the SL snRNP is attracted to the trans-spliceosome by an affinity for the nuclear cap binding complex present at the 5' end of the pre-mRNA, and therefore just upstream of the trans-splice site (Figure 1), or by an affinity for the RNA polymerase II C-terminal domain.

The role of snRNP proteins
The SL snRNP was purified from Ascaris, a parasitic nematode, and found to contain a heterodimeric protein, neither subunit of which has known homologs outside of the nematodes (Denker et al., 2002). The larger subunit, SL-175p, is actually 95 kD, but runs anomalously large on SDS gels. In C. elegans, this polypeptide has a single homolog of 75 kD that is required for viability and is associated with both SL1 and SL2 snRNPs (M. Kumar, M. MacMorris, A. Larsen, and T. Blumenthal, unpublished). The smaller Ascaris polypeptide, SL-30p, interacts with the branchpoint binding protein, an interaction that was shown to be important for trans-splicing (Denker et al., 2002). SL-30p has two orthologs in C. elegans: SL-21p is associated with the SL1 snRNP and is more closely related to the Ascaris protein than is the other ortholog, SL-26p. The latter is associated primarily with the SL2 snRNP (M. Kumar, M. MacMorris, A. Larsen and T. Blumenthal, unpublished). These proteins may be the key to both SL1 versus SL2 specificity and mechanistic aspects of the reaction.

Evolution and role of trans-splicing
Trans-splicing occurs throughout the nematodes, and there is striking conservation of the SL sequence, whereas the portions of the SL RNAs downstream of the splice site have diverged (Blumenthal and Steward, 1997). The role that SL plays in the cell, however, is not known. In C. elegans the SL tends to be spliced very close to the initiating methionine codon (often immediately adjacent), so it seems likely to play a role in translation initiation (Blumenthal and Steward, 1997; Lall et al., 2004). The TMG cap present at the 5' end of the SL becomes the 5' end of trans-spliced mRNAs. A TMG cap stimulates translation in nematodes, at least when it is present at the 5' end of the SL sequence (Lall et al., 2004; Maroney et al., 1995). In C. elegans, variants of the cap binding translation initiation factor, eIF4E, can recognize the TMG cap (Keiper et al., 2000).
In Ascaris, the SL sequence in the DNA is needed for transcription of the SL RNA gene, which may be one reason why it has been so highly conserved (Hannon et al., 1990). Although the roles the SL sequence itself may perform are unknown, trans-splicing is in fact required for viability (Ferguson et al., 1996). Its required role could be a positive effect such as providing a sequence that can facilitate translation initiation, mRNA stability or localization, or it could be required for suppression of a negative effect such as inhibition of translation initiation by AUG codons in the outron.
The C. elegans genome contains 110 SL1 RNA genes on the 1 kb tandem repeat that also contains the genes for 5S rRNA (Krause and Hirsh, 1987). In contrast, the genome contains only 18 dispersed SL2 RNA genes, which specify a variety of variant SL2 RNAs (Stein et al., 2003). Some of these have different SL2 sequences, and these have been given different names, such as SL3, SL4 etc. (Ross et al., 1995). Nonetheless, they are all variants of SL2 and they are used randomly at SL2-accepting trans-splice sites (T. Blumenthal, unpublished). The C. briggsae genome also contains 18 SL2 RNA genes, and all 36 genes from the two species descended from four primordial SL2 RNA genes present in their last common ancestor (Stein et al., 2003).

Discovery of operons
The gpd-3 gene and several other genes whose mRNAs receive SL2 were found to occur at downstream positions in closely-spaced clusters of identically oriented genes (Blumenthal, 2004; Spieth et al., 1993; Zorio et al., 1994). Although first genes in some clusters are trans-spliced to SL1, others are not trans-spliced. The presence of a gene in a downstream location in a closely-spaced cluster signals that its product should be SL2 trans-spliced (Figure 2). A microarray analysis of the entire genome demonstrated how truly robust this correlation is (Blumenthal et al., 2002). This analysis identified more than 1000 such clusters in which the downstream mRNAs are trans-spliced to SL2; these clusters contain more than 2600 genes. These operons range from two to eight genes in length and can cover more than 50 kb of the genome. They are found on all chromosomes, but they are rare on the X chromosome.
How does chromosomal position translate into SL2 specificity? The clusters are operons in the sense that the gene cluster is transcribed from a single promoter and regulatory region. The resulting polycistronic pre-mRNA is converted into monocistronic mRNAs by cleavage and polyadenylation at the 3' ends of the upstream genes, accompanied by SL2-specific trans-splicing at the 5' ends of the downstream genes.
A look at the list of 1000 operons reveals some general properties (WormBase operon list). The genes are very close together; typically only 100 bp separates the polyadenylation site of the upstream gene from the trans-splice site of the downstream gene, although the spacing can range up to a kb or two in extreme cases. Both SL2 and SL1 can be used to process downstream genes, with a greater fraction of SL2 when the genes are closer together (T. Blumenthal, unpublished observations).
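The structural criteria described above (identical orientation and close spacing) suggest how candidate operons might be flagged from gene coordinates alone. The Python sketch below is a hypothetical illustration, not the method used in the cited microarray analysis: the `Gene` record, the toy gene names and coordinates, and the `max_gap` threshold of 2000 bp (chosen to reflect the "kb or two" upper range mentioned in the text) are all assumptions. Annotated gene ends are used as a rough proxy for the polyadenylation site and trans-splice site, and real operon calls also rely on evidence of SL2 trans-splicing.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    """Toy gene record; coordinates are chromosome positions with start < end."""
    name: str
    strand: str  # "+" or "-"
    start: int
    end: int


def candidate_operons(genes: List[Gene], max_gap: int = 2000) -> List[List[Gene]]:
    """Group neighboring genes on one chromosome into candidate operons.

    Consecutive genes join the same cluster when they share a strand and the
    gap between them is at most max_gap (an assumed threshold). Only clusters
    of two or more genes are returned.
    """
    clusters: List[List[Gene]] = []
    current: List[Gene] = []
    for gene in sorted(genes, key=lambda g: g.start):
        same_cluster = (
            bool(current)
            and gene.strand == current[-1].strand
            and gene.start - current[-1].end <= max_gap
        )
        if same_cluster:
            current.append(gene)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [gene]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical toy chromosome: two tight same-strand clusters and one
# opposite-strand gene that interrupts them.
toy_genes = [
    Gene("gene_a", "+", 1000, 2000),
    Gene("gene_b", "+", 2100, 3000),
    Gene("gene_c", "-", 3100, 4000),
    Gene("gene_d", "+", 9000, 10000),
    Gene("gene_e", "+", 10100, 11000),
    Gene("gene_f", "+", 11120, 12000),
]
operon_names = [[g.name for g in cluster] for cluster in candidate_operons(toy_genes)]
print(operon_names)
```

A coordinate-only screen like this would overcall operons, since closely spaced same-strand genes can still have their own promoters; it is meant only to make the spacing and orientation criteria concrete.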

Signals on the polycistronic pre-mRNA for SL2 trans-splicing
Besides the splice site itself, only two sequences on the pre-mRNA play an important role in SL2-specific trans-splicing: the two presumptive signals for 3' end formation of the gene just upstream (Figure 2; Huang et al., 2001; Kuersten et al., 1997; Liu et al., 2001; Liu et al., 2003). The AAUAAA just 5' of the cleavage site, which is absolutely required for 3' end formation, binds the cleavage and polyadenylation specificity factor (CPSF), and a U-rich sequence just 3' of the cleavage site binds the cleavage stimulatory factor (CstF). CPSF and CstF bind cooperatively to these two sites and together position the site of cleavage. When the AAUAAA is mutated, 3' end cleavage fails to occur, and trans-splicing just downstream becomes less efficient and less specific for SL2. Nevertheless, AAUAAA is not required for SL2 trans-splicing since SL2 trans-splicing downstream still occurs in its absence. CPSF bound to AAUAAA may act by facilitating binding of CstF to the U-rich sequence or by catalyzing 3' end formation itself, which in turn may play a role in SL2 trans-splicing. The protein that binds to the U-rich sequence appears to play the major role in SL2 trans-splicing. When this sequence is mutated, 3' end formation can still occur, albeit somewhat less efficiently, but downstream trans-spliced product fails to accumulate. Thus, either transcription terminates or the downstream RNA is degraded from the site of 3' end formation. When 3' end formation is prevented by mutating both the AAUAAA and the U-rich sequence, downstream product is restored, but all of it is trans-spliced to SL1. The protein that binds to the U-rich element, presumably CstF, performs two functions: it blocks exonucleolytic degradation of the downstream RNA beginning at the site of 3' end cleavage and it recruits the SL2 snRNP. The MS2 phage coat protein tethered to this region in place of the U-rich sequence can substitute for the first function, but not the second (Liu et al., 2003).
How does the protein bound to the U-rich element attract the SL2 snRNP? Presuming the protein is CstF, it appears to do so by direct interaction. Antibodies to CstF have been found to immunoprecipitate the SL2 snRNP from C. elegans embryo extracts, and the region of SL2 RNA required for SL2 identity is also required for the CstF interaction, although it is not required for snRNP function (Evans and Blumenthal, 2000; Evans et al., 2001). The current model for polycistronic pre-mRNA processing involves first 3' end cleavage at the upstream gene by conventional mechanisms. This sets in motion the chain of events that leads to SL2 trans-splicing: the free 5' phosphate at the cleavage site is attacked by an exonuclease that is then stopped at the U-rich element by the CstF that had bound there for 3' end formation. CstF attracts, or is already bound to, the SL2 snRNP, which splices to the trans-splice site just downstream.
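The mutational results described in this section amount to a small truth table over the two 3' end formation signals. The Python sketch below simply restates those four outcomes as a lookup; the function name, dictionary keys and outcome wording are hypothetical conveniences, and the entries are qualitative summaries of the text rather than measured values.

```python
from typing import Dict, Tuple

# Keys are (AAUAAA intact, U-rich element intact); values summarize the
# qualitative outcomes reported in the text for the downstream gene.
_OUTCOMES: Dict[Tuple[bool, bool], Dict[str, str]] = {
    (True, True): {
        "three_prime_end_formation": "occurs",
        "downstream_product": "accumulates",
        "spliced_leader": "SL2",
    },
    (False, True): {
        "three_prime_end_formation": "cleavage fails",
        "downstream_product": "reduced (less efficient trans-splicing)",
        "spliced_leader": "SL2, with reduced specificity",
    },
    (True, False): {
        "three_prime_end_formation": "occurs, somewhat less efficiently",
        "downstream_product": "fails to accumulate",
        "spliced_leader": "none detected",
    },
    (False, False): {
        "three_prime_end_formation": "prevented",
        "downstream_product": "restored",
        "spliced_leader": "SL1",
    },
}


def downstream_outcome(aauaaa_intact: bool, u_rich_intact: bool) -> Dict[str, str]:
    """Return the qualitative outcome for the downstream gene of an SL2-type
    operon, given which upstream 3' end formation signals are intact."""
    return _OUTCOMES[(aauaaa_intact, u_rich_intact)]


# Example: the double mutant restores downstream product, but with SL1.
double_mutant = downstream_outcome(aauaaa_intact=False, u_rich_intact=False)
print(double_mutant["downstream_product"], double_mutant["spliced_leader"])
```

Laid out this way, the key inference in the text is easy to see: loss of the U-rich element alone eliminates downstream product only when 3' end cleavage still occurs, which is what points to a protective role against degradation from the cleavage site in addition to SL2 snRNP recruitment.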

SL1-type operon
The C. elegans genome also contains a second type of operon different from those described above in two significant ways (Figure 2; Williams et al., 1999). The mRNA of the downstream gene is trans-spliced to SL1, rather than SL2, and there is no intercistronic sequence. The site of polyadenylation of the upstream gene and the trans-splice site are at adjacent nucleotides. Only twenty operons of this type have been identified, so it is quite uncommon (T. Blumenthal, unpublished). This kind of operon is distinguished from the major type by the fact that 3' end formation of the upstream gene eliminates the trans-splice site of the downstream gene, so that any given pre-mRNA gives rise to either the upstream or the downstream mRNA, but not both.

Function of operons
Do the C. elegans operons exist to assure coordinate regulation of genes whose products function together? There is no question that the genes in C. elegans operons are often co-expressed (Land et al., 1994) but in many cases they appear not to be (K. Seggerson-Gleason and T. Blumenthal, unpublished observations; Baugh et al., 2003). This could be because the mRNAs have very different stabilities or because processing is modulated. One reasonable idea is that the operons have accumulated genes that are regulated at some level after transcription initiation. In this model the operons are transcribed from promiscuously expressed promoters because they are regulated at the level of RNA stability or translation. Consistent with this idea, genes that encode proteins that act in mRNA degradation are the most frequent class of genes to be contained in operons (80% are in operons; Blumenthal and Gleason, 2003). These may be autogenously regulated at the level at which they act. This idea is quite similar to current thinking on gene regulation in trypanosomes, where all transcription is polycistronic and genes are thought to be regulated primarily or entirely post-transcriptionally (Clayton, 2002).
However, there are unquestionably examples among the operons where genes of related function are co-expressed due to their presence in an operon under the control of a single promoter. For instance, the two lin-15 genes are contained in an operon; these two unrelated proteins collaborate in an aspect of signal transduction required for formation of the vulva (Clark et al., 1994; Huang et al., 1994). A second example is the des-2/deg-3 operon, which encodes both subunits of the acetylcholine receptor channel (Treinin et al., 1998). Clearly, these two genes are co-regulated in the same transcription unit because their products function together. In another instance a protein function was hypothesized based on the two genes being present in an operon together: one gene was found to encode a modifier of the gating of an ion channel encoded by another gene in the same operon (Furst et al., 2002). There are also several examples of operons that encode both a basic transcription factor for RNA polymerase I, II or III, along with a subunit of that polymerase. So co-expression of genes of related function is clearly a feature of at least some of the operons (Blumenthal and Gleason, 2003).
Certain classes of genes are dramatically overrepresented in operons while other classes are missing or nearly so from the operon list (Blumenthal and Gleason, 2003). In general, tissue-specific genes are not transcribed in operons. The most frequently operon-included genes are those that encode mitochondrial proteins and the basic machinery of gene expression: transcription, splicing and translation. It appears that the driving force for operon inclusion is high expression in the female germ line (Reinke, 2004).

Evolution of operons in the nematodes
The C. elegans operons are probably not ancient but are instead an innovation, perhaps having evolved as a response to selection for a small genome due to very rapid cell divisions during embryogenesis. The two molecules known to be specific for operon pre-mRNA processing, SL2 RNA and SL-26p, an SL2 snRNP protein, are both evolving very rapidly, consistent with the idea that they are relatively recent innovations (T. Blumenthal, unpublished). Once formed, the operons appear to be very stable: 97% of C. elegans operons are present in C. briggsae, a nematode that is ~100 million years diverged from C. elegans (Stein et al., 2003).
The SL2 snRNP has been found in several members of the rhabditid nematode group including Caenorhabditis, Oscheius, Haemonchus and Pristionchus, many of which have been shown to use SL2 to trans-splice downstream genes in operons (Evans et al., 1997; Lee and Sommer, 2003; Redmond and Knox, 2001). Distantly related nematodes may not contain a specialized snRNP for processing operons, but they appear to contain operons nonetheless (Whitton et al., 2004). It seems likely that operons developed early in nematode evolution because they had trans-splicing to process downstream mRNAs. The same snRNP could serve for either outron or operon processing. However, during nematode evolution, a new snRNP, SL2, evolved from SL1, specialized for processing operon pre-mRNAs more efficiently. It appears this was accomplished by evolving a site for CstF interaction (Evans et al., 2001). Once this snRNP evolved, it was able to process operons efficiently, so it provided the opportunity for further evolution of operons. These additional operons then provided the selective pressure to improve SL2 function in operon processing.
Evolution away from dependence on trans-splicing and operons must be very difficult. Trans-splicing removes selective constraints on regions upstream from the splice acceptor site, presumably allowing upstream AUG codons to accumulate in the region that is removed by trans-splicing in most mRNAs. If trans-splicing did not occur, translation of these mRNAs would be abrogated by the out-of-frame AUG codons. Similarly, operons may be stable because downstream genes become dependent on the upstream gene's promoter. Flatworms and primitive chordates, representing two very distantly related animal phyla, also have both trans-splicing and operons, so the evolutionary scenario hypothesized above may have occurred several times in different animal lineages (Davis and Hodgson, 1997; Ganot et al., 2004).

Acknowledgements
I am grateful to Dick Davis for helpful comments on the manuscript. This work was supported by grants from the National Institute of General Medical Sciences.

Figure 1. Comparison of cis- and trans-splicing. In cis-splicing, the U1 snRNP base pairs with the 5' splice site, and U2 snRNP base pairs with the branchpoint near the 3' splice site. The intron is excised and the two exons are spliced together. In trans-splicing there is no 5' splice site on the pre-mRNA for U1 snRNP binding. Instead, the 5' splice site is provided by the donor SL snRNP, which interacts with the U2 snRNP at the 3' splice site, and the SL exon is spliced to the exon on the pre-mRNA. The region between the 5' cap and the trans-splice site is called the outron. CBC: nuclear cap binding complex.

Figure 2. Two types of operons in C. elegans. Almost all C. elegans operons are of the SL2-type, shown above. The site of 3' end cleavage and polyadenylation of the upstream gene is determined by the AAUAAA signal, indicated by the black bar. The Ur element serves to protect the downstream RNA from degradation following 3' end cleavage, as well as to attract the SL2 snRNP. Most operons of this type have about 100 bp between the cleavage and polyadenylation site of the upstream gene and the trans-splice site to which SL2 is spliced. A few C. elegans operons are of the SL1-type, where the site at which 3' cleavage and polyadenylation occurs is also the trans-splice site for the downstream gene, so the intercistronic distance is effectively zero. These operons are always trans-spliced by the SL1 snRNP.