C. elegans network biology: a beginning.

Abstract
The architecture and dynamics of molecular networks can provide an understanding of complex biological processes complementary to that obtained from the in-depth study of single genes and proteins. With a completely sequenced and well-annotated genome, a fully characterized cell lineage, and powerful tools available to dissect development, Caenorhabditis elegans, among metazoans, provides an optimal system to bridge cellular and organismal biology with the global properties of macromolecular networks. This chapter considers omic technologies available for C. elegans to describe molecular networks, encompassing transcriptional and phenotypic profiling as well as physical interaction mapping, and discusses how their individual and integrated applications are paving the way for a network-level understanding of C. elegans biology.

Introduction
The combined power of global approaches to systematically analyze interactions among genes and their products and the phenotypic consequences of in vivo perturbations promises to drive the development of increasingly sophisticated models of the topology, function, and dynamics of C. elegans molecular networks (Figure 1). The C. elegans genome sequence and its annotation laid the foundation for the development of a variety of functional genomic or "omic" approaches to interrogate molecular networks. This chapter reviews the developments underpinning these technologies, giving particular attention to: i) identifying network components, i.e., the complete lists of non-coding and coding transcripts and open reading frames (ORFs); ii) mapping interconnections between these components; iii) systematic perturbation of networks; iv) spatiotemporal dynamics of network components; and v) global network modeling. We summarize the resources emerging from these efforts, current experimental and computational approaches including the integration of different omic datasets, and developments on the horizon.

Network components: transcriptome and ORFeome
The initial release of the assembled, annotated C. elegans genome sequence (The C. elegans Sequencing Consortium, 1998) was a milestone in genomics, marking the first sequenced genome from a multicellular organism. Importantly for comprehensive analyses, C. elegans remains the only multicellular organism for which a complete genome sequence is available with no gaps in the assembly (Hillier et al., 2005).

Genomic databases
An innovative object-oriented database system, ACeDB (Durbin and Thierry-Mieg, 1994; Stein et al., 2001; see also http://www.acedb.org/Cornell/dkfz.html), was created to facilitate the C. elegans genome project. ACeDB was the first widely used database system designed specifically for capturing and archiving complex biological data. Coupling the database to a visual output anchored on the genomic sequence gave ACeDB a graphical user interface, allowing the C. elegans community to extract far more information than was previously possible. From ACeDB evolved WormBase, a collaborative effort to capture, curate, and distribute biological information on C. elegans and related nematodes (Stein et al., 2001; Chen et al., 2005), and the WormGenes database at NCBI, which also provides access to carefully curated C. elegans genome annotations and functional data (Thierry-Mieg et al., 2005). These databases are invaluable sources of information for the C. elegans research community (see Table 1 for a short guide to these and other useful online resources for C. elegans).

Computational predictions
At the time of the initial genome sequence release, GeneFinder (Green and Hillier, unpublished software) was used to predict approximately 20,000 protein coding sequences (CDSs) or ORFs. Alternative ab initio transcript or ORF prediction algorithms produce largely consistent results that differ primarily in predicted exon/intron junctions (Stein et al., 2003; Wei et al., 2005). Experimental evidence (see below) indicates that most of the ~20,000 predicted ORFs are at least transcribed (Reboul et al., 2001; Reboul et al., 2003). Approximately 1,300 noncoding RNA (ncRNA) transcripts have also been predicted, including at least 120 microRNAs (miRNAs) (see C. elegans noncoding RNA genes).

Transcriptome
Accurate genome annotation requires both a high quality genome sequence and robust computational prediction tools. However, experimental validation of transcribed mRNA sequences and their protein products is indispensable to demonstrate transcription and splicing and to ensure that exon/intron boundaries and reading frames are correctly assigned. These companion activities together lead to the refinement of gene models through an iterative process. After an initial effort by the C. elegans Sequencing Consortium to sequence random expressed sequence tags (ESTs) (Waterston et al., 1992), a long-term effort was dedicated to isolating, sequencing and distributing as many complementary DNA (cDNA) clones as possible, with the ultimate goal of obtaining full-length cDNAs for all protein-coding transcripts (see Table 1). This benefited the worm community by providing cDNAs that helped refine transcript models and enabled cloning experiments performed in many laboratories over the years. Data from this cDNA collection are available from the National Institute of Genetics in Japan (Table 1) and through WormGenes and WormBase, including details on sequences and corresponding gene products. The latest WormBase frozen release (WS150) contains over 250,000 EST reads from this library, matching ~10,500 distinct genes. Around 2,000 of these cDNAs are likely full-length clones. WormBase, WormGenes, and Intronerator (Kent and Zahler, 2000) all offer graphical representations of the collection that make it very convenient to inspect the precise structure of transcript models along with all supporting evidence (Table 1).

ORFeome
Because most transcripts are expressed at relatively low levels, random sequencing of clones from cDNA libraries makes it extremely hard to capture an entire set of full-length cDNAs (Reboul et al., 2003), and the rate of novel sequences identified has reached diminishing returns due to the relative abundance of encoded transcripts in the library (Das et al., 2001). This is true not only for C. elegans but also for other organisms (Carninci et al., 2003; Castelli et al., 2004). In addition, although full-length cDNAs provide crucial information to annotate transcripts, their 5′ and 3′ UTR sequences usually prevent their use for large-scale protein expression studies. Finally, the vectors used in cDNA cloning projects are usually not designed for automated cloning and convenient proteome-wide protein expression (Brasch et al., 2004). For these reasons, a C. elegans "ORFeome project" was initiated with the goal of generating Gateway recombinational clones for nearly all ORFs (i.e., precisely from the predicted initiation to the predicted termination codons) (Walhout et al., 2000; Reboul et al., 2001; Reboul et al., 2003; Lamesch et al., 2004).
Rather than picking and sequencing individual clones from cDNA libraries, PCR amplification carried out with primer pairs specific to ORFs and with a cDNA library or reverse-transcribed polyA+ mRNA as template can achieve nearly complete coverage (Reboul et al., 2001). Using the ~20,000 ORF predictions from WS9 (released in 1999), nearly 11,000 ORF constructs suitable for protein expression were initially obtained (Reboul et al., 2003) (Figure 2). Over 4,000 of these had no prior experimental support, demonstrating that ab initio ORF predictions could indeed be verified experimentally. The nearly 8,000 unsuccessful attempts were due to imprecise exon/intron predictions, usually surrounding the putative translation start site (Reboul et al., 2001). From 1999 to 2003, updates to WormBase led to over 4,200 new or revised ORF predictions, based on a combination of improved algorithms, comparative genomics, and experimental evidence from cDNA and ORF sequences. Of these repredicted ORFs, ~2,500 were cloned in Version 3.1 of the C. elegans ORFeome project (Lamesch et al., 2004). In total, there is currently experimental evidence for ~18,000 protein-coding ORFs. WormBase release WS150 contains over 22,000 ORF entries, of which ~2,800 are confirmed isoforms from alternative splicing, and ~4,800 are predicted ORFs with no experimental validation. The majority of clones from the ORFeome and cDNA collections have been end-sequenced (generating ESTs and OSTs) but not yet fully characterized. Sequencing of several hundred ORF clones suggests that more isoforms exist than are currently annotated in WormBase, and that the proportion of genes with alternative splicing is probably in the range of 15% (Reboul et al., 2003). A complete and accurate inventory of all isoforms represented by fully sequenced, wild-type clones awaits further experimental validation guided by gene prediction algorithms, such as Twinscan (Korf et al., 2001; Wei et al., 2005) and others, that can take advantage of sequence comparisons with other Caenorhabditis species (Stein et al., 2003; see The phylogenetic relationships of Caenorhabditis and other rhabditids; NHGRI, 2005).
Access to precise gene structural information has enabled the development of multiple genome-scale resources that can be used for a variety of applications (Table 2), transforming the way experimental science is conducted. Overall, one important lesson learned from the C. elegans cDNA and ORFeome projects is that genome annotation and experimental verification are complementary activities that advance understanding through an iterative process.

Network connections: interactome
Proteins do not function in isolation, but rather as interacting partners in complexes (Alberts, 1998), modules and networks (Hartwell et al., 1999). Thus a major challenge is to establish the full set of all protein-protein, DNA-protein and RNA-protein interactions, or to map the "interactome" network. Binary protein-protein interactions can be readily identified using the yeast two-hybrid (Y2H) system (Fields and Song, 1989; Vidal and Legrain, 1999), with the advantage that this system is amenable to high-throughput proteome-wide mapping projects (Walhout and Vidal, 2001). When generated with appropriate caution and necessary controls, Y2H datasets can be of high quality (Vidalain et al., 2004).
The C. elegans interactome mapping project was initiated in the context of specific biological processes such as vulval development, protein degradation, germline development, DNA damage response and dauer formation (Walhout et al., 2000; Davy et al., 2001; Boulton et al., 2002; Walhout et al., 2002; Tewari et al., 2004). One of the main lessons drawn from overlapping the datasets obtained up to that point was a much higher level of molecular connectivity between seemingly different pathways than previously imagined (Reboul et al., 2003). The project then moved to a "first-draft" proteome-scale C. elegans interactome map primarily focused on metazoan-specific proteins (Li et al., 2004). The current version of the global C. elegans Y2H-based interactome network contains ~5,500 potential interactions, comprising approximately 10% of the expected number of protein-protein interactions (Li et al., 2004). The validation rate of the interactome map was estimated at ~70% by retesting large numbers of interactions using a completely different interaction assay (Li et al., 2004), suggesting a high level of quality overall.
Like other interactome networks, the topology of the C. elegans interactome appears "scale-free" (a few proteins have many interacting partners whereas most proteins interact with only a few partners) and "small-world" (any two proteins can be connected through a chain of only a few interactions) (Barabasi and Albert, 1999; Goldberg and Roth, 2003; Li et al., 2004). These and other topological properties of global interactome networks might relate to biological properties such as robustness and plasticity (Jeong et al., 2001; Han et al., 2004).
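The two topological notions above can be made concrete with a minimal sketch. It assumes a tiny, entirely hypothetical interaction list (the protein names are placeholders, not data from the interactome map) and simply tabulates the degree distribution and the mean shortest-path length; it is not the analysis used in the cited studies.

```python
from collections import deque

# Hypothetical toy interaction list; names are placeholders, not real proteins.
EDGES = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4"),
         ("p4", "p5"), ("p5", "p6")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of binary interactions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_histogram(graph):
    """Map each degree k to the number of proteins with k partners."""
    hist = {}
    for neighbors in graph.values():
        k = len(neighbors)
        hist[k] = hist.get(k, 0) + 1
    return hist


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of interactions from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def mean_path_length(graph):
    """Average shortest-path length over all connected ordered pairs."""
    total, pairs = 0, 0
    for s in graph:
        for t, d in shortest_path_lengths(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


GRAPH = build_graph(EDGES)
print(degree_histogram(GRAPH))   # few high-degree nodes, many low-degree nodes
print(mean_path_length(GRAPH))   # short chains connect most pairs
```

In a scale-free network the histogram is dominated by low-degree proteins with a few hubs, and in a small-world network the mean path length stays small relative to the number of proteins.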
Completing the interactome map will require both a nearly complete C. elegans ORFeome collection and additional approaches that utilize modified versions of either Y2H or affinity-purification of protein complexes, since no single method for capturing protein-protein interactions is 100% effective (Walhout et al., 2000). As with efforts to accurately annotate the genome, interactome mapping projects will necessarily proceed iteratively through the interplay of direct experimentation and computational efforts (Vidal, 2005).

Network perturbations: phenome
While physical interaction maps help elucidate molecular events at the biochemical level, other functional assays can reveal logical connections between gene products that are required for the same set of biological processes. RNAi (Guo and Kemphues, 1995; Fire et al., 1998) is amenable to medium to high throughput approaches that can be used to systematically knock down large numbers of transcripts, thereby relating perturbations of molecular networks to phenotypic analyses and setting the stage for generating a global phenotypic or "phenome" map of C. elegans (reviewed in Gunsalus and Piano, 2005). The foundation for a phenome map of early embryogenesis has been laid by a host of recent large-scale RNAi studies, which have provided at least a first-pass phenotypic analysis for nearly every protein-coding gene in the genome (Fraser et al., 2000; Gönczy et al., 2000; Piano et al., 2000; Maeda et al., 2001; Piano et al., 2002; Kamath et al., 2003; Simmer et al., 2003; Rual et al., 2004; Fernandez et al., 2005; Sönnichsen et al., 2005). Results from various RNAi studies in C. elegans are available online through RNAiDB (Gunsalus et al., 2004), PhenoBank (Sönnichsen et al., 2005), and WormBase (see Table 1).
Questions of reliability and completeness naturally arise regarding genome-scale RNAi analyses. Comparisons with genetic analyses have found that around 25% of the loci for which genetic alleles confer embryonic phenotypes are typically not recovered in large-scale RNAi studies in the laboratory strain N2 (Kamath et al., 2003; Fernandez et al., 2005; Sönnichsen et al., 2005). In contrast, comparisons of embryonic lethality between genetic and high confidence RNAi results indicate a very low level (generally <1%) of false positives (Kamath et al., 2003; Simmer et al., 2003; Rual et al., 2004; Fernandez et al., 2005). Potential false positive results could arise from the depletion of multiple transcripts by a single RNAi construct that inadvertently targets multiple genes (Gunsalus et al., 2004; Qiu et al., 2005). The actual off-target error rate in vivo will depend on factors such as cooperativity between short interfering RNAs (siRNAs), mRNA abundances in different tissues, and saturation of the RNAi machinery. RNAiDB and WormBase both provide data on potential off-target effects for RNAi reagents used by the worm community, calculated using different methods.
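One simple way to flag potential off-target effects is to look for short stretches of identical sequence shared between an RNAi trigger and transcripts other than the intended target. The sketch below illustrates that idea only; it is not the method used by RNAiDB or WormBase, the function names are hypothetical, and the window length k (defaulting here to 21 nucleotides, roughly the length of an siRNA) is an assumption chosen for illustration.

```python
def kmers(seq, k):
    """Return the set of all contiguous windows of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def potential_off_targets(trigger, transcripts, intended, k=21):
    """List transcripts (other than the intended one) that share at least
    one identical k-nucleotide window with the RNAi trigger sequence.

    trigger     -- sequence of the RNAi construct (hypothetical input)
    transcripts -- dict mapping transcript name to sequence
    intended    -- name of the intended target, excluded from the result
    k           -- window length; an illustrative assumption
    """
    trigger_kmers = kmers(trigger.upper(), k)
    hits = []
    for name, seq in transcripts.items():
        if name == intended:
            continue
        if trigger_kmers & kmers(seq.upper(), k):
            hits.append(name)
    return sorted(hits)


# Toy sequences, invented for illustration; a short k keeps the example small.
TOY_TRANSCRIPTS = {
    "geneX": "TTTACGTACGGTATT",  # intended target
    "geneY": "GGGTACGGTCC",      # shares a short window with the trigger
    "geneZ": "AAAAAAAAAA",       # unrelated
}
print(potential_off_targets("ACGTACGGTA", TOY_TRANSCRIPTS, "geneX", k=5))
```

As the paragraph above notes, sequence identity alone does not determine the in vivo off-target rate, so a check like this can only nominate candidates for closer inspection.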
It is interesting to note that the proportion of essential genes appears to be in a similar range (~15-25%) in yeast (Winzeler et al., 1999; Giaever et al., 2002), worms (see Essential genes), and flies (Ashburner et al., 1999). This apparent similarity across phyla may reflect a universal property of networks, arising from a need to evolve network connectivity in order to maintain a balance between robustness (to survive perturbations) and plasticity (to explore new phenotypic space). An observation consistent with this idea is that the penetrance level observed in the RNAi analysis of essential genes is proportional to how ancient the targeted protein is (Fernandez et al., 2005).

Systematic phenomics
Global RNAi studies are valuable not only to generate hypotheses for individual proteins, analogous to conventional genetic screening strategies, but also to address higher-order questions about groups of functionally related proteins. A set of defined phenotypes resulting from perturbations can be systematically scored for both positive and negative effects, defining "phenotypic profiles". Gene products can then be grouped into "phenoclusters" based on similarity between phenotypic profiles (Vidal, 2001; Piano et al., 2002). For example, two distinct phenoclusters were found among two dozen proteins implicated in a DNA damage response interactome map (Boulton et al., 2002). One phenocluster predicted novel DNA repair proteins, while the other suggested novel proteins involved in the DNA damage induced checkpoint response.
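The idea of grouping genes by profile similarity can be sketched as follows. This is a minimal illustration under stated assumptions: the profiles are invented, each phenotype is scored as +1, 0 or -1, similarity is taken to be the fraction of phenotypes on which two profiles agree, and genes are joined by simple threshold-based single linkage. The published studies used their own scoring schemes and clustering methods, which are not reproduced here.

```python
# Hypothetical phenotypic profiles: one score per phenotype (+1, 0 or -1).
PROFILES = {
    "geneA": [1, 1, 0, -1],
    "geneB": [1, 1, 0, 0],
    "geneC": [-1, 0, 1, 1],
    "geneD": [-1, 0, 1, 1],
}


def similarity(a, b):
    """Fraction of scored phenotypes on which two profiles agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def phenoclusters(profiles, threshold=0.75):
    """Group genes whose profiles are linked by similarity >= threshold
    (single linkage, implemented with a small union-find structure)."""
    genes = sorted(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the cluster representative.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if similarity(profiles[g], profiles[h]) >= threshold:
                parent[find(h)] = find(g)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())


print(phenoclusters(PROFILES))
```

The threshold (0.75 here) is an arbitrary illustrative value; in practice the choice of similarity measure and cutoff determines how finely the phenoclusters are resolved.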
In a set of comprehensive studies, RNAi phenotypes in the early embryo were analyzed using a systematic approach to explicitly score over 40 specific cell biological features (Piano et al., 2002; Sönnichsen et al., 2005). The resulting "phenotypic signatures" were used to cluster genes into groups of similar function, such as chromosome segregation or nuclear import-export. In a single study analyzing RNAi phenotypes for 98% of C. elegans genes, nearly all of the 661 genes identified genome-wide to elicit an RNAi phenotype in the early embryo could be placed into 23 separable phenotypic classes (Sönnichsen et al., 2005). In addition, numerous hypotheses could be formulated as to how molecular networks are coordinated to drive early embryogenesis (Gunsalus et al., 2005).
The development of automated image analysis could alleviate what is currently a principal bottleneck in phenotypic analysis: the manual scoring of phenotypes by visual inspection. Recently a formal classification system linked to an automated image analysis system was developed that is capable of automatically gathering many behavioral and gross morphological phenotypic parameters (Geng et al., 2003; Feng et al., 2004; Geng et al., 2004). This system has been used to classify genotypes based on automatically extracted image features. Automated image analysis pipelines are also under development for analyzing cell lineage patterns (Yasuda et al., 1999; Bao et al., 2006) and high-content embryonic phenotypes (Ning et al., 2005).

Network dynamics: from transcriptome to localizome
The transcriptome and proteome at any given time and place constitute only a subset of all possible macromolecules potentially encoded in the genome: different combinations of expressed gene products are present in different cells at different times, move within and between cellular compartments, and interact with a variety of other molecules to carry out their functions. Thus, to relate C. elegans cellular and organismal biology to global properties of molecular networks, it will become increasingly important to establish when and where the complete repertoire of coding and noncoding transcripts and protein products are expressed and localized throughout development, both at a cellular and sub-cellular level. Such a "localizome" map would be extremely valuable in the context of C. elegans since a nearly perfect anatomical atlas has already been generated, providing a complete description of the cell lineage and a map of the neuronal network (Sulston and Horvitz, 1977; White et al., 1982; Sulston et al., 1983).

Dynamic transcriptome analyses
Technological innovations to parallelize studies of gene expression patterns have enabled genome-scale analyses of increasing sophistication to discover the spatiotemporal activity of promoters, to define sets of co-regulated genes, to identify shared cis-acting regulatory sites and corresponding trans-acting factors, and to use these data to build models of transcriptional networks and regulatory control mechanisms. Gene expression in C. elegans is being evaluated using a variety of methods, including microarrays (Hill et al., 2000; Reinke et al., 2000), serial analysis of gene expression (SAGE) (Jones et al., 2001; Pleasance et al., 2003), quantitative reverse-transcription with PCR (Q-RT-PCR) (reviewed in Bustin, 2000), and in situ analyses of mRNA expression patterns (Table 2). Tiling arrays, now in use for human (Bertone et al., 2004), Drosophila (Stolc et al., 2004), and Arabidopsis (Yamada et al., 2003), should soon reveal alternative splicing patterns and expressed non-coding regions of the genome, providing further experimental data for improving genome annotation and defining the repertoire of functional genetic elements. See Tables 1-3 for representative examples of reagents and datasets.
Specific developmental processes, temporal transitions, or environmental responses in C. elegans have been examined by measuring gene expression in wild type and genetically altered, RNAi-treated, or environmentally challenged animals using microarray and SAGE (Table 3 provides a sampling of these studies). Remarkably, it has been possible to isolate mRNA from single tissues or cell types such as muscle or neurons, using highly specific promoters to drive transgenic constructs expressing either poly-A binding protein followed by co-immunoprecipitation (co-IP) (Roy et al., 2002; Kunitomo et al., 2005) or a fluorescent marker followed by fluorescence activated cell sorting (FACS) (Zhang et al., 2002; Colosimo et al., 2004; Cinar et al., 2005; Fox et al., 2005).
To identify gene regulatory elements controlling expression and the logical circuitry driving transcriptional networks, both computational and experimental approaches are being pursued. Informatic analyses have provided insights into co-regulated groups of genes, either within C. elegans (Kim et al., 2001; Owen et al., 2003), or conserved across species (Stuart et al., 2003; McCarroll et al., 2004). Predictions of cis-regulatory elements based on conserved sequence motifs, either among co-expressed genes (GuhaThakurta et al., 2002; Zhang et al., 2002; Kwon et al., 2004; Cinar et al., 2005) or gene families (McCarroll et al., 2005), or in combination with positional and combinatorial constraints (Beer and Tavazoie, 2004), are generating testable hypotheses on gene regulatory circuitry and rules. Experimental efforts to decipher network logic are employing techniques such as yeast one-hybrid assays (Deplancke et al., 2004; Deplancke et al., 2006), in vivo analysis of green fluorescent protein (GFP) reporter expression (Chalfie et al., 1994), and chromatin immunoprecipitation (ChIP-chip technology; Hanlon and Lieb, 2004; Oh et al., 2006). The recent demonstration that functional GFP can be reconstituted from fragments driven by separate promoters (Zhang et al., 2004) opens the gate to large-scale systematic analysis to delineate combinatorial control in vivo.

Developmental transitions
Increasingly, combined experimental and informatic efforts seek to characterize higher-order features of transcriptional networks. These global approaches have discovered, for example, the chromosomal clustering of muscle-expressed genes (Roy et al., 2002), or different classes of radiation-modulated genes (Nelson et al., 2002). Finally, comparative studies are addressing questions such as the evolution of gene expression patterns (Denver et al., 2005) and cis-regulatory elements (Castillo-Davis et al., 2004). While these studies represent just an early foray into the analysis of gene regulatory networks, significant strides in this area can be anticipated in the near future.

In situ and in vivo analyses
High-resolution data on spatiotemporal expression patterns throughout development can be obtained using microscopic techniques. A large-scale in situ analysis of mRNA is being carried out using cDNAs (see Table 1) to identify when and where transcripts are expressed in the worm. In addition, C. elegans provides an unparalleled opportunity to map dynamic patterns of expression in vivo, due to its invariant and comprehensively identified cell lineage (Sulston and Horvitz, 1977; White et al., 1982; Sulston et al., 1983) and the fact that animals are optically transparent throughout the life cycle. Transgenic animals containing different tissue-specific promoters driving GFP provide surrogate markers for expression, and protein-GFP fusion reporter constructs allow the direct visualization of subcellular localization. Systematic in vivo analyses of expression patterns are being carried out for thousands of genes (see Table 1) (McKay et al., 2003; Hope et al., 2004; Zhao et al., 2004; Baillie and Moerman, 2005), using transgenic worms that harbor reporter constructs containing GFP fused downstream of cloned intergenic promoter regions (Hope et al., 1996; Dupuy et al., 2004). This approach requires the ability to efficiently generate large numbers of transgenic worm strains transformed with cloned DNA fusion constructs and, ideally, should faithfully report promoter activity in vivo. The systematic large-scale application of this approach will require relatively convenient scoring by microscopy in live animals and will rely heavily on ongoing efforts to develop sophisticated imaging technologies (Mohler and White, 1998; Mohler et al., 2003; Bao et al., 2006). Collections of transgenic strains (e.g., carrying GFP reporter constructs) are being made available for distribution (Table 2).

Network modeling: "omic" data integration
The idea that meaningful hypotheses of gene function and network properties may be obtained by combining two or more types of functional associations (reviewed in Vidal, 2001; Ge et al., 2003) is being explored using a variety of approaches. Global evidence suggests that genes with similar expression profiles are more likely to encode interacting proteins (Ge et al., 2001; Grigoriev, 2001; Jansen et al., 2002; Kemmeren et al., 2002). Correlations between protein-protein interactions and phenotypic data have also been observed, both in yeast (Said et al., 2004) and among C. elegans DNA damage response (DDR) (Boulton et al., 2002) and germline-enriched genes (Walhout et al., 2002).
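The relationship between expression similarity and physical interaction can be illustrated with a short sketch. It assumes invented expression profiles and invented pair lists (none of the values come from the cited datasets) and compares the mean Pearson correlation of interacting pairs with that of non-interacting pairs.

```python
import math

# Hypothetical expression profiles across four conditions (invented values).
EXPRESSION = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
INTERACTING = [("g1", "g2")]       # placeholder "interacting" pairs
NON_INTERACTING = [("g1", "g3")]   # placeholder "non-interacting" pairs


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pair_correlation(expression, pairs):
    """Average expression correlation over a list of gene pairs."""
    values = [pearson(expression[a], expression[b]) for a, b in pairs]
    return sum(values) / len(values)


print(mean_pair_correlation(EXPRESSION, INTERACTING))
print(mean_pair_correlation(EXPRESSION, NON_INTERACTING))
```

On real data the two distributions overlap considerably, so a difference in means of this kind indicates an enrichment rather than a rule that holds for every pair.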
Extending a genome-wide phenotypic analysis of genes required in the early embryo, a combined analysis of high-content phenotypic data (Sönnichsen et al., 2005), transcriptional correlations (Kim et al., 2001), and protein-protein interaction data (Li et al., 2004) has revealed correlations between all three data types (Gunsalus et al., 2005) (Figure 3). It appears that on a global scale, highly interconnected proteins give rise to the same range of phenotypes upon depletion by RNAi, and quantitatively similar phenotypes are predictive of protein-protein interactions. Extending this logic across the proteins found to be required for early embryonic events revealed assemblies of functionally linked components, generating testable functional predictions for numerous unknown genes. This analysis suggests that a limited number of interconnected "molecular machines" drive early embryogenesis and provides a global blueprint to explore how protein complexes are coordinated (Gunsalus et al., 2005).
Understanding how functional interactions between genes affect biological processes will ultimately require combinatorial in vivo perturbations. Since experimentally testing the 200 million possible pairwise combinations between 20,000 genes for each type of experimental perturbation (e.g., reduced or increased level of activity, or alteration of specific contacts) is currently intractable, computational methods are needed that can integrate heterogeneous data and help focus experimental efforts on pairs that may be more likely to be functionally related. A recent study has made genome-wide predictions of genetic interactions in C. elegans using a logistic regression approach to integrate data on protein-protein interactions, gene expression, and phenotypes from yeast, worm, and fly with available functional annotations (Zhong and Sternberg, 2006). Like the early embryonic network described above, the resulting network of over 18,000 potential genetic interactions between about 10% of genes in C. elegans suggests a modular organization of protein complexes and specific cellular processes. These predictions provide a step toward a global view of genetic interactions in a multicellular organism.
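The figure of roughly 200 million pairs follows from counting unordered pairs among 20,000 genes, n(n-1)/2. The sketch below checks that arithmetic and shows the general form of a logistic score that combines several evidence types for one gene pair. The feature encoding, weights and bias are hypothetical values chosen for illustration; they are not the fitted parameters of the cited study.

```python
import math


def n_pairs(n_genes):
    """Number of unordered gene pairs: n * (n - 1) / 2."""
    return math.comb(n_genes, 2)


def logistic_score(features, weights, bias):
    """Logistic combination of evidence features into a 0..1 score."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for three evidence types:
# [protein-protein interaction, co-expression, shared phenotype]
WEIGHTS = [2.0, 1.0, 1.5]
BIAS = -3.0

print(n_pairs(20000))
print(logistic_score([1, 1, 1], WEIGHTS, BIAS))  # pair supported by all evidence
print(logistic_score([0, 0, 0], WEIGHTS, BIAS))  # pair with no supporting evidence
```

Ranking all pairs by such a score is one way to focus combinatorial perturbation experiments on a tractable subset.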
Visualizing multidimensional maps is essential to navigate the vast amounts of data now accumulating. To make the results of integrated network analyses more easily accessible, a web-based interactive graphical tool called "N-Browse" has been implemented to allow browsing integrated networks of physical and logical functional connections between genes and their products in C. elegans and other species (Table 1).

Perspectives
The application of omic approaches to elucidate C. elegans network biology promises to extend our knowledge of cellular and organismal biology as more efforts are dedicated to the development of resources and the generation of improved interactome, phenome and localizome datasets. First, it will be important to improve and eventually complete genome-wide resources such as the ORFeome, the Promoterome, and other sets of important DNA sequences such as 5′ and 3′ untranslated regions (UTRs), miRNAs and other noncoding RNAs. Second, it is critical to continue improving the mapping of molecular networks by developing and implementing new protein-protein interaction assays (Vidal, 2005) and other types of methods to assay DNA-protein (Deplancke et al., 2004) and RNA-protein (SenGupta et al., 1996) interactions. Third, it will be increasingly important to expand the phenome map by increasing the number of discrete phenotypes scored systematically and by improving and extending flexible RNAi resources (Rual et al., 2004), phenotype scoring methodologies, and image analysis pipelines. Fourth, the localizome map is currently only in its infancy, and it will be interesting to see how its development will help bridge network biology with the nearly perfect anatomy atlas available for C. elegans. Fifth, because disruption of biologically relevant interactions may be expected to phenocopy either null or gain-of-function alleles, it will be useful to exploit strategies for generating and using mutations that specifically perturb individual interactions (Vidal et al., 1996; Vidal et al., 1996; Endoh et al., 2000; Endoh et al., 2002).
The combination of experimentation and informatic analysis in iterative cycles will take on a central role in the future of biological research. As more data from functional and proteomic mapping projects are generated and more sophisticated methods are applied to integrate these data on a genome/proteome scale, their synthesis should prove increasingly informative for the reconstruction of cellular networks and for generating new testable hypotheses of biological systems. The challenge is now to achieve a true synthesis of the data to obtain a system-level view of the underlying regulatory networks and the evolutionary forces that have shaped them. This will require the marriage of large datasets with computational models that can describe and predict the behavior of dynamic macromolecular networks and their relationship to the underlying biological processes they control.

Figure 1. Omic approaches to generate gradually improving models of the complete set of biologically relevant interactions. Genome- and proteome-scale studies have provided information on the topology, function and dynamics of C. elegans macromolecular networks. Expression profiling (transcriptome), DNA-protein, RNA-protein, and protein-protein interactions (interactome), phenotypic analyses (phenome), and protein localization (localizome) can be integrated, allowing the dynamic properties of the resulting network to be related to C. elegans biology. Adapted from Walhout et al. (1998) and Vidal (2001).

Figure 2. The C. elegans ORFeome. A) Schematic representation of predicted exon/intron structures for three C. elegans ORFs from WS9 of WormBase. Blue boxes correspond to predicted exons based on GeneFinder. ORF sequence tags (OSTs) obtained from cloned C. elegans ORFs are aligned to the genome (red boxes); based on these alignments, the structures of two ORFs match the GeneFinder prediction whereas one differs from the prediction. B) Agarose gels showing PCR products for ORFeome clones. The ~10,000 ORFs in C. elegans ORFeome Version 1.1 (Reboul et al., 2003) (left-hand panel, 200 gels) were supplemented by an additional ~2,500 ORFs in Version 3.1 (right-hand panel, 49 gels) based on improved annotations in WormBase WS100 (Lamesch et al., 2004).

Figure 3. Combined analysis of (i.) transcriptional correlations (Kim et al., 2001; reprinted with permission from Kim et al. (2001), copyright 2001 American Association for the Advancement of Science), (ii.) protein-protein interaction data (Li et al., 2004), and (iii.) high-content phenotypic data (Sönnichsen et al., 2005). The three datasets are integrated (iv.), revealing correlations between all three data types (Gunsalus et al., 2005). Extending this logic across all protein-coding genes found to be required for early embryonic events reveals assemblies of functionally linked components, allowing testable functional predictions for numerous proteins.

Table 2. Collections of reagents for functional genomics and proteomics in C. elegans.
http://www.cbs.umn.edu/CGC/ C. elegans strains (including gene knockout strains from the C. elegans