Mass spectrometry-based shotgun proteomic analysis of C. elegans protein complexes

Mass spectrometry (MS)-based shotgun proteomics is an enabling technology for the study of C. elegans proteins. When coupled with co-immunoprecipitation (CoIP), new interactions and functions among proteins can be discovered. We provide a general background on protein complexes and methods for their analysis,


Introduction
C. elegans is an excellent model system for basic biology and increasingly for the study of disease phenotypes (WormBook chapter Obesity and the regulation of fat metabolism; Rodriguez et al., 2013). A remarkable wealth of genetic reagents, resources, and information has been generated through these basic and translational studies. With the advent of numerous technologies, including mass spectrometry-based proteomics, studies of the protein-protein interactions within C. elegans are becoming commonplace (Tewari et al., 2004; Audhya and Desai, 2008; Biochemistry and molecular biology). These studies are beneficial since they can leverage the wide array of existing information and resources of the C. elegans community to quickly follow up on and understand protein-based discoveries.
There are some obvious advantages and challenges associated with using C. elegans as a biochemical tool, particularly for CoIP-MS of protein complexes, compared to more commonly used cell culture systems. First and foremost, novel protein-based findings can be quickly validated or rejected in a phenotypically testable system (WormBook chapter Reverse genetics; Dong et al., 2007; Moresco et al., 2010). Thus, the inferred functional relevance and cooperativity between a protein of interest (POI) and a candidate protein that is pulled down during a CoIP can be directly correlated. Second, tissue-specific protein expression can also be considered, facilitating and/or necessitating the differentiation of tissue-specific protein interactions from CoIP-MS results. That is, the analysis of different cell types from worms, instead of a homogeneous cell culture type, can provide information about the multiple functions and forms of a protein or protein complex. For instance, an integrated, tagged fusion protein can be characterized to ensure appropriate cellular and subcellular localization. Additionally, if the fusion protein is integrated into a null mutant, the rescue of associated phenotypes can be tested to ensure proper protein function. Once integrated and characterized, a tagged POI expressed in a specific tissue can be used to pull down tissue-specific binding partners. Identified candidate tissue-specific POI binding partners can be validated through cytological co-localization experiments. The cuticle helps to make C. elegans a robust and easily handled organism (WormBook chapter The cuticle), yet it presents a tough exterior that complicates protein complex-friendly lysis and separation of tissue types. Abundant proteins from large tissues are ideal CoIP-MS targets, but can obscure the POI if it is only expressed in a small number of cells. Reducing sample complexity by synchronizing worms or purifying specific cell types or stages, such as embryos, can alleviate some of these challenges. For worm CoIP methods, we point the reader to a number of excellent protocols on this topic (Biochemistry and molecular biology; Polanowska et al., 2004; Jedamzik and Eckmann, 2009), particularly to Zanin et al. (2011), which provides MS-compatible elution conditions. Throughout this chapter we will describe the methods that can be used to identify post-translational modifications, protein interactions, and complexes, enabling the researcher to exploit the advantages of C. elegans as a model organism to study basic biology and disease mechanisms.

Types of protein complexes and analysis methods
Proteins can be present in different types of complexes (Figure 1). The simplest is an obligate dimer where two proteins are always found together. Condition-specific complexes may only form in the presence of a post-translational modification (PTM), such as phosphorylation. Transient complexes, perhaps involving enzymes, may only interact briefly. The different time scales and stabilities of complex interactions can be defined by varying affinities of protein-protein interactions. Protein complexes that are part of larger macromolecular structures can co-purify other proteins and need to be isolated from the larger structure. Smaller, stable protein complexes can assemble as subunits to form larger "complex isoforms" or "complexoforms". These complexoforms can also contain protein isoforms or PTMs (Talavera et al., 2013), recently deemed "proteoforms" (Smith et al., 2013). Because of the complex combinatorial nature of proteoforms and complexoforms, it is important for the researcher to determine which proteoforms of their POI may be tagged and/or pulled down from a CoIP.
Global CoIP-MS studies have illustrated the concept of "complex isoforms" through the modularity of protein complex architecture. For instance, complexes averaged three to 12 protein components, depending on the study, ranging from two to 83 overall; some proteins were more promiscuous and were identified in many purifications and complexes. In general, these global CoIP-MS studies, performed originally in yeast and more recently with human cell lines (Ewing et al., 2007; Havugimana et al., 2012), have changed the way protein complexes are elucidated and understood. Thus, the researcher must be aware of these complexities and capabilities when designing an experiment and interpreting the results. Essentially, when a CoIP is performed on a single POI, all complexoforms in which the POI participates will be co-purified and co-identified. Complexoforms can be differentiated through multiple reciprocal purifications of different POI proteoforms and complex components (Gavin et al., 2006) and through intact complex analysis methods (Heck, 2008; Fonslow et al., 2010).
Figure 2. A preparative growth of animals is lysed in a protein complex-compatible buffer with a dounce on ice, a bead beater at 4°C, or in liquid nitrogen with a mortar and pestle. The POI is CoIPed and eluted for proteomic analysis. Quality control, gel-based analysis is performed on 10% of the CoIP (Figure 4). Shotgun proteomic analysis is performed on the remaining 90% of the CoIP. Proteins are precipitated with trichloroacetic acid (TCA)/acetone or methanol/chloroform to remove lipids and surfactants. Proteins are resolubilized, primarily with 8 M urea, then reduced, alkylated, and digested to peptides with trypsin. Tryptic peptides are analyzed using MudPIT, consisting of online nLC/LC-MS/MS. Peptides are fractionated by charge with strong cation exchange (SCX) chromatography and then further separated by hydrophobicity with reverse-phase high performance liquid chromatography (RP-HPLC). Separated peptides are transferred into the mass spectrometer by electrospray ionization (ESI), where their intact and fragment ion masses can be measured with either low or high mass accuracy and resolution using the LTQ or the Orbitrap, respectively. Bioinformatic analysis is performed on the MS/MS data. A SEQUEST protein database search is used to match experimental and theoretical peptide spectra and to localize PTMs. Spectra can be manually annotated to validate peptide identification or PTM localization. In this case, the blue and green annotations are sequence-informative ions while the red ions are phosphorylation site-specific ions that localize the PTMs. DTASelect is used to filter confident peptide matches and reassemble the peptides into proteins. Further analysis can be performed between replicate and comparative runs with Venn diagrams or more sophisticated statistical software (see Figure 5).
In addition to identification of POI binding partners, analysis of CoIPs by MS has other notable advantages. Numerous PTMs (Table 1) can also be readily identified during CoIP-MS experiments. PTMs can be identified on peptides by consecutive measurements of mass differences in both intact peptide and fragment ion masses. Condition-specific PTMs and interactions can be discovered through the purification of protein complexes from different genetic backgrounds and conditional treatments. Moreover, the identification of co-immunoprecipitated proteins with MS provides an unbiased, broad view of the interactions and functions of a POI, facilitating new and unexpected biological discoveries (Skop et al., 2004

Description of protein of interest (POI) case studies
The remaining sections will refer to actual CoIP-MS data we have generated and analyzed as case studies for this chapter. The first POI (POI #1) is ALP-1, an alpha-actinin-associated LIM-Enigma family protein that is required for maintenance of actin filament organization and for stabilizing muscle contraction. ALP-1 is a soluble, abundant protein, making CoIP-MS analysis relatively straightforward. ALP-1 falls into the category of macromolecular-bound since it associates with actin, so we will discuss the results in this context. We performed CoIPs from UU18, a strain with GFP fused to an ALP-1 exon that is expressed as three isoforms (proteoforms): ALP-1b, c, and d (McKeown et al., 2006). Therefore, we expect at least three complexoforms to exist. A GFP-expressing strain driven by the identical alp-1 promoter would have been ideal to serve as a negative control, but was not available. Instead we used a strain expressing GFP in the muscle, as the bulk of ALP-1 expression is also in the muscle. We used anti-GFP antibody to perform the CoIPs in this example. POI #2, a soluble protein of lower abundance than ALP-1, is from an ongoing CoIP-MS study. We included this case study since the experimental design, quality control experiments, and results were ideal to illustrate the successful execution of a CoIP-MS study using an antibody to an endogenous protein. We expect many complexoforms to be isolated, as this polyclonal antibody should recognize all isoforms of POI #2. A null mutant strain of POI #2 was used for negative control CoIPs.

Protein lifecycle considerations in protein complex analysis
When performing a CoIP we are usually attempting to isolate the POI at a time when it is performing its function. Co-isolated proteins are thought to be similarly involved in that function. However, every protein also interacts with a large number of regulatory or "housekeeping" proteins during its defined lifecycle (Figure 3). Thus, when performing and interpreting results from a CoIP-MS experiment it is important to consider that a POI can be purified from all points within its lifecycle, from synthesis to degradation. The co-immunoprecipitated lifecycle proteins will also be identified as binding partners although they may not be of interest to the researcher. Proteins involved in protein synthesis (initiation and elongation factors, ribosomal proteins, etc.), protein folding (chaperones and heat shock proteins), and degradation (ubiquitin-associated proteins, etc.) will likely be identified in most CoIPs. For instance, in our POI #1 case study (Table 2), we found many regulatory proteins (HSP-60, EFT-4, RPS-6, INF-1, and UBA-1) to be more abundant in the CoIP than in the negative control.
Figure 3. An epitope can be purified from the time it is made (A) to the time it is degraded (D). Although the researcher may only be interested in functional binding partners and complexes (C), other "housekeeping" proteins and complexes (A, B, D) will also be purified and identified.

Sources and considerations for background protein binding
The interpretation of CoIP-MS results can also be aided by an understanding of potential sources of background proteins. Background proteins can generally be classified into two categories: random or systematic. Random background proteins can be introduced through small variations in sample preparation or analysis and may often be due to non-specific interactions with abundant proteins within the lysate. Additionally, both random background proteins and bona fide binding partners may be identified pseudo-randomly in replicate runs. This is due to the sensitive, but pseudo-random, sampling of low abundance proteins/peptides using tandem MS in shotgun proteomic analysis of protein complexes (Liu et al., 2004). Generally, increasing the number of replicate runs allows for better separation of background noise and bona fide binding partners. This concept has been exploited in many computational analyses of protein complex data and is described in Section 8. Systematic background protein interactions present a greater, but surmountable, challenge since they are reproducibly identified in all runs. The proper use of negative controls can reduce systematic noise through identification of background proteins that bind to the antibody, CoIP support (agarose, sepharose, magnetic beads, etc.), or purification tag (FLAG, GFP, etc.). For instance, performing a CoIP on an untagged lysate identifies proteins that bind the support and/or antibody. These background proteins can also be removed through tandem purification (Biochemistry and molecular biology), although at the expense of recovery and yield. For larger purification tags, such as GFP, a strain with GFP expressed from a transcriptional fusion could be used as a negative control. For instance, in our case study of POI #1, tbb-2 was abundant in both GFP control and ALP-1::GFP runs, indicating that it bound to the beads, antibody, or GFP tag. In our case study of POI #2, proteins purified from a strain lacking POI #2 likely bound to the beads or antibody. Thus, the negative control runs should account for any background proteins from the CoIP support and antibody. Another source of systematic background proteins is co-purifying macromolecules. For instance, in our case study of POI #1, ALP-1 is a known muscle-associated protein (Han and Beckerle, 2009; McKeown et al., 2006). Consequently, from our POI #1 CoIPs we identified muscle proteins (ACT-1, ACT-2, ACT-3, ACT-4, UNC-52, NMY-1, and MYO-3) with higher spectral counts than ALP-1 (Table 2). Similar results could be expected for CoIP of DNA-binding proteins (e.g., transcription factors), which pull down an abundance of histones via DNA (Figure 1C). In the case of DNA-binding proteins, DNase or probe sonication can be used to remove DNA that would otherwise co-purify proteins through interactions that are not protein-protein interactions.
The combination of in-solution digestion and high-sensitivity MS analysis of protein complexes provides much greater sensitivity and comprehensiveness than traditional methods. These analytical characteristics provide the opportunity to identify weak, sub-stoichiometric, and transient protein interactions with a target protein, but also allow for identification of more background proteins. We will discuss statistical and comparative analyses that can aid in differentiating binding partners from background proteins in Section 8. Despite these bioinformatics methods to identify and remove background proteins, shotgun proteomic analysis will be more comprehensive and straightforward if protein complex purifications are relatively pure. As described in the next section, traditional biochemical and molecular biology methods can be used to evaluate and minimize background proteins in a CoIP preparation.

Co-immunoprecipitation quality control
Two traditional methods can be used to ensure the success of a CoIP-MS experiment and minimize the time and cost associated with MS-based shotgun proteomic analyses. Western blotting analysis should primarily be used for ensuring the expression of the POI and tracking the POI throughout the CoIP process, as in Figure 4A. Depletion of POI #2 from the lysate can be observed by comparison of whole worm lysate (lane 1) to the CoIP supernatant (lanes 3 and 5), while enrichment can be detected by comparison of whole worm lysate to CoIP eluate (lanes 7 and 9). The analogous gel-based protein-staining analysis is also beneficial. Thus, we also suggest visualizing all proteins from lysates, CoIP supernatants, and 5-10% of the CoIP on a Coomassie- or silver-stained gel to estimate the purification and enrichment of the POI and its binding partners (Figure 4B). In this case, side-by-side comparison of the CoIP to the lysate (Figure 4B, compare lanes 7 and 9 to lane 1) allows for evaluation of the protein complex purity and POI #2 abundance. One of the main requirements for a successful CoIP-MS analysis is that a band with the approximate mass of your POI should be present in a gel-based analysis and be one of the most abundant proteins from the CoIP. Having ample abundance of the POI means that many potential binding partners are also likely present in detectable amounts (Figure 4B, lanes 7 and 9). Note that the appropriate negative control runs have been performed to establish that purified proteins are specific to POI #2 (Figure 4B, lanes 8 and 10). Gel-based analyses can serve to evaluate purity and yield. If the control and purification lanes are very similar, then the CoIP is not pure, and there are multiple experimental variables that can be tested. Suggested points in the CoIP pipeline to consider include: the expression of the protein in relevant strains, particularly if a POI is transgenically tagged; the POI recovery from lysis; antibody integrity and conjugation to beads; and POI recovery from the CoIP.
With a relatively pure and abundant POI from a CoIP, MS-based shotgun proteomics can now be performed to identify proteins and PTMs within the sample. Note that shotgun proteomic analysis is more sensitive if proteins are not cut out of gels and instead analyzed entirely in-solution (Das et al., 2010). Thus, the remaining 90% of a CoIP sample (after gel-based quality control experiments) can be subjected to in-solution tryptic digestion and shotgun proteomics analysis (Figure 2). With many technological advances to mass spectrometers, MS analysis can be as sensitive as a Western blot for a POI. However, a common question we have encountered is "if my bait protein can be detected by Western, why is it not detected by shotgun proteomics?" There are two likely answers to this question. First, the main difference between a Western and shotgun proteomic analysis is detection selectivity. With a Western, as long as the antibody is specific, the other proteins present in the CoIP do not interfere with the detection of the POI or its binding partners. Thus a Western yields no relevant information about the purity of the CoIP. That is, Western results can appear essentially the same for a whole worm lysate (Figure 4A, lane 1) or for a highly pure protein complex (Figure 4A, lanes 7 and 9) since only POI #2 is detected. Shotgun proteomic analysis identifies peptides/proteins within a sample based on relative abundance. If the bait protein has not been purified and enriched from the other abundant proteins within the lysate, shotgun proteomic analysis will mostly identify the abundant background proteins and not the POI (Figure 4B, lane 1). Second, another major difference between Western and shotgun proteomic analysis is sensitivity. Although mass spectrometers are extremely sensitive, a peptide/protein signal is not amplifiable as in a Western analysis. Thus, even if a protein complex is pure and detectable by Western, it may not be detectable by MS.

Mass spectrometry methods for protein and PTM identification
Peptides are compatible with both nanoflow liquid chromatography (nLC) and tandem MS (MS/MS). Online multidimensional nLC/LC facilitates efficient separation and identification of modified and unmodified peptides. Mass spectrometers can be generally categorized as having either low or high resolution and mass accuracy. Resolution and accuracy define the capabilities to confidently measure the masses of peptides and their gas-phase fragments for sequencing. Currently utilized low-resolution instruments are primarily ion traps (Figure 2) and are regularly used for protein identification and some PTM identification. High-resolution instruments, such as the Orbitrap (Figure 2), can be used for all experiments. However, they are most beneficial in experiments to identify PTMs and in quantitative experiments with isotopic labeling (Krijgsveld et al., 2003), although isotopic methods have yet to be applied to C. elegans CoIPs. Peptides are identified by matching experimentally generated fragmentation spectra to theoretically generated fragmentation spectra from protein sequence databases (Eng et al., 1994). Reversed protein sequences are appended to the database, and peptide matches to these reversed sequences can be used to measure false matches and control protein, peptide, and spectrum false discovery rates (FDR usually ~1% for proteins and ~0.1% for peptides). Higher mass accuracy measurement of intact peptide masses can be used as an additional filter to improve sensitivity, specificity, and confidence in peptide identifications with database searches (Yates et al., 2006). Peptide identifications serve as representations for protein and PTM identifications; thus, higher mass accuracy and higher confidence in peptide identifications also improve the sensitivity, specificity, and confidence in both protein and PTM identifications. This is illustrated in Table 3, where we have displayed the results of a protein database search including three common PTMs (phosphorylation, acetylation, and ubiquitination) with low-resolution (LTQ) and high-resolution (Orbitrap) data from replicate POI #1 CoIPs. A cursory look at the results would indicate that low resolution provides better results. Indeed, the low-resolution MS sampling is faster and thus more comprehensive over the same analysis time. Conversely, the high-resolution results are smaller and more stringent, but are of much higher confidence. This is the reason the low-resolution MS instrument is often used for identifying proteins and the high-resolution instruments for confident PTM identification. This gap is narrowing as high-resolution MS instruments continue to improve in speed and sensitivity, approaching the comprehensiveness of low-resolution instruments (Olsen et al., 2009).
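The reversed-sequence (decoy) strategy described above can be illustrated with a short Python sketch. It is not the filtering performed by DTASelect or any other specific tool; it assumes each peptide-spectrum match has already been reduced to a score and a flag saying whether it matched a reversed sequence, and it uses the simple decoys/targets estimator, which is only one of several conventions in use. The function names and the example scores are hypothetical.

```python
from typing import List, Optional, Tuple

# A match is (score, is_decoy); is_decoy is True when the best-scoring
# sequence came from the reversed half of the database (assumed input format).
Match = Tuple[float, bool]


def decoy_fdr(matches: List[Match], threshold: float) -> float:
    """Estimate the FDR among matches scoring at or above the threshold.

    Uses the simple estimator decoys / targets (an assumption; other
    conventions exist, such as 2 * decoys / (targets + decoys)).
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


def threshold_for_fdr(matches: List[Match], max_fdr: float) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR is <= max_fdr.

    Every observed score is tried, because the estimate is not guaranteed
    to change monotonically with the threshold.
    """
    best = None
    for score in sorted({s for s, _ in matches}, reverse=True):
        if decoy_fdr(matches, score) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = [(5.0, False), (4.5, False), (4.0, False), (3.5, True),
               (3.0, False), (2.5, True), (2.0, False)]
    print(decoy_fdr(example, 3.0))
    print(threshold_for_fdr(example, 0.25))
```

In practice the same idea is applied separately at the spectrum, peptide, and protein levels, and a precursor mass tolerance can be applied before scoring when high mass accuracy data are available.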
Theoretically, 100% sequence coverage of a POI is possible since all possible peptide sequences are considered in the protein database search, including N-terminal, C-terminal, and internal peptides, along with multiple missed trypsin cleavage sites and sometimes non-tryptic peptides. Consideration of PTMs can also add to the overall protein sequence coverage. Practically, 50% protein sequence coverage is a reasonable result for a highly abundant protein in a CoIP sample, but is protein and sequence dependent. Protein sequence coverage is assigned by grouping the identified peptides to represent the minimum number of proteins that best explain the peptide sequences. Unique peptide sequences provide unambiguous evidence that a protein is present, while non-unique peptides contribute to protein sequence coverage and relative quantitation. Unique peptides are those that are not present in other proteins within the protein database. Conversely, non-unique peptides are those that are present within at least two proteins. Non-unique peptides often come from protein isoforms, but can also be present in redundant proteins. The number of unique peptides identified for a protein is dependent both on its sequence homology to other proteins within the protein database used and on the abundance of the protein within the sample. Since proteins identified by unique peptides can share non-unique peptides, the non-unique peptides are often replicated and grouped within all relevant proteins. The appropriate grouping of peptides in shotgun proteomics is called the "Protein Inference Problem" (Nesvizhskii and Aebersold, 2005) and is generally addressed computationally. Although this can present identification and quantification challenges, the sensitivity, comprehensiveness, and power of shotgun proteomics in biological discovery has far exceeded any shortcomings from the analysis of proteins as a mixture of peptides. Additionally, the "Protein Inference Problem" is less of a challenge for simple mixtures, such as the CoIPs described herein, than for whole cell lysates; fewer non-unique peptides are actually present and identified within a sample of fewer proteins, making assignment of the non-unique peptides more straightforward. Analysis of purified protein complexes of enough mass (high nanogram to low microgram) with MudPIT generally yields high protein sequence coverages (>50%) of the POI and its binding partners, although the ability to achieve high sequence coverages is POI and CoIP dependent. For instance, in both of our case studies we achieved approximately 30% sequence coverage (Figure 4B and Table 3). A benefit of high protein sequence coverage is a higher probability of identifying post-translationally modified peptides, even if substoichiometric. The most commonly searched and identified PTMs (Table 1) are phosphorylation, acetylation, ubiquitination, and methylation. Almost all PTMs can be identified using mass spectrometry with the appropriate sample preparation and analysis considerations (Zhang et al., 2013), but this is outside the scope of this chapter. The measurement of both peptide precursor and fragment ion masses allows for identification and localization of many types of PTMs on proteins. That is, the modification mass can typically be detected on both the intact peptide and on the fragments to confirm the PTM identity and site of modification. An example phosphopeptide with phosphorylation site localization is shown in Figure 2 (lower left). Trypsin is primarily used for proteolytic digestions of protein complexes and can provide excellent protein sequence coverages and identification of most PTMs. For regions of proteins that have too many or too few tryptic cleavage sites, other less-selective proteases, such as subtilisin and elastase, can be used to improve protein sequence and PTM coverage (MacCoss et al., 2002; Fonslow and Yates, 2012).
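To make the ideas of tryptic peptides, missed cleavages, and sequence coverage concrete, here is a minimal Python sketch. It assumes the common rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P), and it computes coverage as the fraction of residues covered by at least one identified peptide. The function names and the example sequence are hypothetical and are not taken from the case studies.

```python
from typing import List


def tryptic_peptides(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """In-silico digest: cleave after K or R, except when followed by P.

    Peptides spanning up to `missed_cleavages` uncleaved sites are included.
    """
    # Split the sequence into fully cleaved pieces.
    pieces = []
    start = 0
    for i, residue in enumerate(sequence):
        next_is_proline = i + 1 < len(sequence) and sequence[i + 1] == "P"
        if residue in "KR" and i + 1 < len(sequence) and not next_is_proline:
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])

    # Join neighbouring pieces to model missed cleavage sites.
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def sequence_coverage(protein: str, identified: List[str]) -> float:
    """Fraction of protein residues covered by at least one identified peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in identified:
        if not peptide:
            continue
        pos = protein.find(peptide)
        while pos != -1:
            for k in range(pos, pos + len(peptide)):
                covered[k] = True
            pos = protein.find(peptide, pos + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    protein = "MKTAYIAKQRPLSEEK"  # hypothetical sequence
    print(tryptic_peptides(protein))
    print(tryptic_peptides(protein, missed_cleavages=1))
    print(sequence_coverage(protein, ["TAYIAK"]))
```

Regions that yield very short or very long tryptic pieces in such a digest are the ones where a less-selective protease is most likely to add coverage.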

Identification of candidate and bona fide binding partners
Shotgun proteomics provides an effective means to identify proteins and PTMs within a sample from a CoIP. The most abundant and readily identified proteins from a database search of a CoIP-MS run should be from bona fide protein interactions. Thus, the quality of CoIP-MS results can, in part, be quickly and manually confirmed by the presence of known protein binding partners, if available. Generally these results are difficult to interpret unless compared to the appropriate negative control or conditional CoIP runs. There are both simple and complex methods for comparing CoIP-MS results, mostly dependent on the scale, depth, and complexity of analyses. The simplest analysis is a pairwise Venn comparison. This comparison may be most similar to a gel-based comparison since either the presence or absence of a protein is considered (Tabb et al., 2002; Carvalho et al., 2008); since shotgun proteomics is more sensitive than gel-based analysis, the depth of the comparison is greater and more informative. For instance, the number of proteins uniquely found using shotgun proteomic Venn comparison of POI #2 (Figure 5A) is much greater than the number visible using a gel-based analysis (lanes 7 and 9 in Figure 4B). Often, this simple Venn analysis easily reveals strongly associated, robust binding partners for further follow-up and characterization. This is illustrated by the two known binding partners that were only present in the CoIP (Figure 5A). The consideration of protein identification reproducibility, either by identification frequency (Figure 5B) or p-value (Figure 5C), between CoIP replicate analyses can also be included in a Venn comparison to further increase confidence in binding partner candidates (Carvalho et al., 2011). From these more stringent considerations, the same 15 proteins were only found in POI #2 CoIPs (Figures 5B and 5C) and could be considered for biological validation. Similarly, proteins that are identified in only the negative control experiments yield a rough estimate of the background noise in the analyses. The noise can be better modeled by high-level statistical methods, but by considering identification frequency (Figure 5B) and p-value (Figure 5C) the noise is reduced from 113 proteins (Figure 5A) to only three proteins. Ideally there will be more proteins unique to the POI CoIP than to the negative control CoIP. Similarly, these methods can be applied to find changes in binding partners between conditional treatments. However, subtle differences can be hard to detect with this method and generally require higher-level analyses. Additionally, these Venn analysis methods can break down if the POI or bona fide binding partners are abundant and are also detected to a limited extent in negative control experiments.
A few proteins from our POI #1 case study are shown in Table 2 as examples from a Venn comparison. As in our POI #2 case study (Figure 4B), we achieved a reasonable number of spectral counts (136 ± 5) and sequence coverage (33 ± 5%) for our POI #1, ALP-1. Additionally, we identified a number of expected binding partners with even higher spectral counts and sequence coverages, presumably due to their macromolecular, polymeric nature. The identified actin isoforms (ACT-1-4) highlight that homologous proteins can be a challenge to differentiate with shotgun proteomics since they are not analyzed in their intact form. Redundant peptides can indicate that at least one of the isoforms is present, but unique peptides must be identified to determine which isoform is present. If we were interested in quantifying the relative abundances of actin isoforms, the use of dNSAF instead of spectral counting would be ideally suited. Using this simple analysis method, we found a potential ALP-1 binding partner, TES-1, a LIM-domain binding protein, which was not present in the GFP control runs.
Adding another level of complexity to the analysis, the relative abundance of proteins can also be considered between samples instead of simply considering absence or presence in the CoIP. This type of analysis considers changes in proteins that are present reproducibly in both samples within the overlap of the Venn diagrams (Figures 5B and 5C) and can be used for comparisons with negative controls or conditional CoIPs. Protein abundances for comparison between samples are estimated using peptide spectral counts (Liu et al., 2004), the number of times spectra are identified for a protein, similar to transcript microarray abundance values (Pavelka et al., 2008) or mRNA reads in RNA-Seq, and can be normalized to protein length by a normalized spectral abundance factor (NSAF) (Zybailov et al., 2006). The use of NSAF facilitates comparison of relative abundances of different proteins within the same sample. Since many proteins share peptide sequences, distributing shared peptide spectral counts among redundant proteins yields a distributed NSAF (dNSAF) that provides further improvements to estimate relative abundances within a sample, particularly for different proteoforms (Zhang et al., 2010). In this case, comparison of protein relative abundances using spectral counting is used to find candidate binding partners. A Volcano plot considers both the fold-change of relative protein abundances and the significance of the fold-change as a p-value. By comparison of triplicate POI #2 CoIPs to negative control CoIPs with a Volcano plot (Figure 5D), the expected proteins (POI #2 and known binding partners) have the highest and most significant fold-changes, and other potential binding partners are revealed with high significance and lower fold-changes. Changes in protein binding partners could also be evaluated using this method by comparison of conditional treated and untreated POI CoIP results. Each level of comparison provides a means to narrow and refine candidate binding partners to testable, physiologically relevant, bona fide binding partners.
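The Venn comparison with an identification-frequency filter and the NSAF normalization described above can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the software used to generate Figure 5: each run is reduced to a set of protein identifiers, reproducibility is judged only by how many replicate runs contain a protein, and NSAF is computed as spectral count divided by protein length, normalized to the sum of those ratios over all proteins in the run. All names and numbers are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def reproducible(runs: Iterable[Set[str]], min_runs: int) -> Set[str]:
    """Proteins identified in at least `min_runs` of the replicate runs."""
    counts: Dict[str, int] = {}
    for run in runs:
        for protein in set(run):
            counts[protein] = counts.get(protein, 0) + 1
    return {protein for protein, n in counts.items() if n >= min_runs}


def venn_compare(coip_runs: List[Set[str]], control_runs: List[Set[str]],
                 min_runs: int = 1) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (CoIP only, shared, control only) after the frequency filter."""
    coip = reproducible(coip_runs, min_runs)
    control = reproducible(control_runs, min_runs)
    return coip - control, coip & control, control - coip


def nsaf(spectral_counts: Dict[str, int],
         lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor for each protein in one run."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {p: 0.0 for p in saf}
    return {p: value / total for p, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical replicate runs, each a set of identified proteins.
    coip_runs = [{"POI", "A", "B"}, {"POI", "A", "C"}, {"POI", "A"}]
    control_runs = [{"B", "D"}, {"B"}, {"C", "B"}]
    print(venn_compare(coip_runs, control_runs, min_runs=1))
    print(venn_compare(coip_runs, control_runs, min_runs=3))
    print(nsaf({"POI": 100, "A": 50}, {"POI": 200, "A": 100}))
```

Raising `min_runs` mirrors the frequency-based filtering of the Venn comparison: proteins seen in only one replicate drop out of both the candidate list and the background list. NSAF values from CoIP and control replicates could then feed a fold-change and p-value calculation for a Volcano-style comparison.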
Even more sophisticated analyses provide greater confidence in binding partners. Through statistical considerations of multiple experimental variables, bona fide binding partners are more easily differentiated from background noise (Krogan et al., 2006; Choi et al., 2011; Jager et al., 2012). Standard triplicate analyses facilitate statistical considerations of experimental error with the majority of these methods. However, power law global error modeling of spectral counting data has shown the largest gain in both accuracy and precision when an additional fourth replicate is performed, with diminishing returns from more replicates (Pavelka et al., 2008). We have analyzed our triplicate case-study POI # 2 CoIP data using CompPASS (Figure 5E). CompPASS considers the uniqueness, abundance, and reproducibility of proteins within parallel runs by establishing a threshold that is inherent to the sampling of mass spectrometry-based shotgun proteomic analysis (Sowa et al., 2009). Similar to the volcano plot comparison, POI # 2, known binding partners, and candidate binding partners cluster within the plot. These candidates can then be used for biological follow-up.

Conclusions and perspectives
Mass spectrometry-based proteomics provides a powerful tool to study C. elegans proteins. Remarkable results have routinely emerged from genetic and RNAi studies in C. elegans, particularly when combined with enabling technologies. Mass spectrometry-based proteomics is following a similar, enabling trend and we hope to continue facilitating these studies. Through our own work and collaborations with C. elegans biologists we have learned the power of these analyses, along with some of the common pitfalls, challenges, and misconceptions; we have explained and addressed these recurring topics to advance this area of research. We envision this chapter bridging a gap between C. elegans biologists and protein mass spectrometrists, acting as a guide for C. elegans protein complex preparation, shotgun proteomic analysis, data interpretation, and troubleshooting.

Glossary
CoIP-MS - co-immunoprecipitation followed by mass spectrometry
dNSAF - distributed normalized spectral abundance factor; a relative measure of protein abundance that considers the shared sequences between homologous and redundant proteins by distributing spectral counts.
MS/MS - tandem mass spectrometry; mass spectra are acquired on both intact and fragmented peptide ions to sequence the peptide.
MudPIT - Multidimensional Protein Identification Technology
NSAF - normalized spectral abundance factor; a normalized relative measure of protein abundance that considers the length of proteins by dividing the spectral counts by the protein sequence length.

Figure 1: Types of protein interactions and complexes: (A) obligate interactions, (B) condition-specific interactions, (C) transient complex between an enzyme and substrate representing transient interactions, (D) a DNA-bound protein representing protein complexes that are part of large macromolecular structures, and (E) "complex isoforms" or "complexoforms" containing cores, modules, and attachments (Adapted from Gavin et al., 2006) as combinations of obligate, condition-specific, and transient interactions.
While binary protein interactions can be determined with the yeast two-hybrid system (Boxem et al., 2008; Simonis et al., 2009) (Figure 1A), higher-order, in vivo protein-protein interactions are better elucidated using CoIP-MS. CoIP-MS experiments (Figure 2) generally involve the purification of a POI and its binding partners, followed by Multidimensional Protein Identification Technology (MudPIT) (Washburn et al., 2001), more generally known as shotgun or bottom-up proteomics (Zhang et al., 2013). Shotgun proteomics can identify hundreds of proteins from a CoIP within a few hours in an unbiased fashion. Putative interactions can be validated through reciprocal CoIPs with detection by shotgun proteomics or Western blotting. Global reciprocal CoIP-MS experiments performed in yeast (Gavin et al., 2002; Gavin et al., 2006; Krogan et al., 2006; Collins et al., 2007; Babu et al., 2012) have illustrated the concept of "complex isoforms" through the modularity of protein complex architecture. For instance, complexes averaged three to 12 protein components, depending on the study, ranging from two to 83 overall; some proteins were more promiscuous and were identified in many purifications and complexes. In general, these global CoIP-MS studies performed originally in yeast, and more recently with human cell lines (Ewing et al., 2007; Havugimana et al., 2012), have changed the way protein complexes are elucidated and understood. Thus, the researcher must be aware of these complexities and capabilities when designing an experiment and interpreting the results. Essentially, when a CoIP is performed on a single POI, all complexoforms in which the POI participates will be co-purified and co-identified. Complexoforms can be differentiated through multiple reciprocal purifications of different POI proteoforms and complex components (Gavin et al., 2006) and through intact complex analysis methods (Heck, 2008; Fonslow et al., 2010).

Figure 2: CoIP-MS schematic for analysis of C. elegans protein complexes. A preparative growth of animals is lysed in a protein complex-compatible buffer with a Dounce homogenizer on ice, a bead beater at 4°C, or in liquid nitrogen with a mortar and pestle. The POI is CoIPed and eluted for proteomic analysis. Quality control, gel-based analysis is performed on 10% of the CoIP (Figure 4). Shotgun proteomic analysis is performed on the remaining 90% of the CoIP. Proteins are precipitated with trichloroacetic acid (TCA)/acetone or methanol/chloroform to remove lipids and surfactants. Proteins are resolubilized, primarily with 8 M urea, then reduced, alkylated, and digested to peptides with trypsin. Tryptic peptides are analyzed using MudPIT, consisting of online nLC/LC-MS/MS. Peptides are fractionated by charge with strong cation exchange (SCX) chromatography and then further separated by hydrophobicity with reverse-phase high performance liquid chromatography (RP-HPLC). Separated peptides are transferred into the mass spectrometer by electrospray ionization (ESI) where their intact and fragment ion masses can be measured with either low or high mass accuracy and resolution using the LTQ or the Orbitrap, respectively. Bioinformatic analysis is performed on the MS/MS data. A SEQUEST protein database search is used to match experimental and theoretical peptide spectra and to localize PTMs. Spectra can be manually annotated to validate peptide identification or PTM localization. In this case, the blue and green annotations are sequence informative ions while the red ions are phosphorylation site-specific ions that localize the PTMs. DTASelect is used to filter confident peptide matches and reassemble the peptides into proteins. Further analysis can be performed between replicate and comparative runs with Venn diagrams or more sophisticated statistical software (see Figure 5).

Figure 3: The lifecycle of a protein determines protein interactions. A: Protein synthesized by ribosome; B: Protein modified and trafficked; C: Protein performing a function; D: Protein targeted for degradation. An epitope can be purified from the time it is made (A) to the time it is degraded (D). Although the researcher may only be interested in functional binding partners and complexes (C), other "housekeeping" proteins and complexes (A, B, D) will also be purified and identified.

Figure 5: Computational comparison of CoIP results for identification of candidate binding partners for biological follow-up. Venn diagrams of identified proteins from triplicate analysis of N2 and negative control CoIPs considering proteins identified in (A) only one replicate, (B) all three replicates, and (C) with statistical significance (p < 0.05) based on spectral counts and replicate frequency. *Denotes the location of the POI in the Venn diagram. Colored numbers indicate either the number of known binding partners (blue) or potential binding partners (green). (D) Comparison of the abundance fold-change in proteins based on spectral count ratio and statistical significance (p < 0.05) of the fold-change, calculated by dividing N2 protein spectral counts by negative control protein spectral counts. A spectral count of 1 is used when a protein has no spectral counts in a sample. Proteins with the highest positive fold-change are the POI (magenta), known binding partners (blue), and potential binding partners (green) above background binding partners (black). (E) Comparison of identified protein confidence values from CompPASS. Protein confidence scores from negative control runs are subtracted from N2 protein confidence scores. The POI (magenta) and known binding partners (blue) cluster in the upper right portion of the graph to reveal potential binding partners (green) over background binding partners (black).
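The comparisons behind panels A, B, and D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and protein labels are hypothetical, the replicate filter uses an "identified in at least a minimum number of runs" rule, and the fold-change replaces missing spectral counts with 1 as described in the caption. The p-value calculation is not shown.

```python
import math


def reproducible(replicate_runs, min_runs):
    """Proteins identified in at least min_runs of the replicate runs."""
    seen = {}
    for run in replicate_runs:
        for protein in set(run):
            seen[protein] = seen.get(protein, 0) + 1
    return {p for p, n in seen.items() if n >= min_runs}


def venn_regions(poi_runs, control_runs, min_runs):
    """Return (POI-only, shared, control-only) protein sets."""
    poi = reproducible(poi_runs, min_runs)
    control = reproducible(control_runs, min_runs)
    return poi - control, poi & control, control - poi


def log2_fold_change(poi_count, control_count, floor=1.0):
    """log2 ratio of spectral counts; counts below the floor (including
    zero) are raised to it so that absent proteins can still be plotted.
    Applying the floor to averaged counts is an assumption."""
    return math.log2(max(poi_count, floor) / max(control_count, floor))


# Hypothetical triplicate identifications (protein labels are made up).
poi_runs = [
    {"POI", "partner_1", "background_1", "background_2"},
    {"POI", "partner_1", "background_1"},
    {"POI", "partner_1", "background_1", "background_3"},
]
control_runs = [
    {"background_1", "background_2"},
    {"background_1"},
    {"background_1", "background_4"},
]

# Requiring all three replicates removes sporadic background proteins.
poi_only, shared, control_only = venn_regions(poi_runs, control_runs, 3)
```

With these made-up inputs, relaxing `min_runs` to 1 adds the sporadically identified `background_3` to the POI-only region, which mirrors how the replicate-frequency requirement reduces noise between panels A and B.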

Table 1 : List of common MS-identifiable PTMs.
As methods have improved, mostly through advances in mass spectrometer speed and sensitivity, a greater number of putative protein interactions are found. Thus, these larger, more comprehensive data sets have become less straightforward to interpret manually. Bioinformatic methods have been introduced to provide confidence values to CoIPed proteins from these larger data sets (Krogan et al., 2006; Sowa et al., 2009; Choi et al., 2011; Jager et al., 2012). These analyses often allow for differentiation of protein complex components from background proteins. We will illustrate how one such program, the Comparative Proteomic Analysis Software Suite (CompPASS) (Sowa et al., 2009), can be used on nematode CoIP-MS data to find validated protein interactions and new candidate binding partners. Collectively, the combination of cross-disciplinary methodologies can elucidate different types of biologically-relevant protein complex interactions from C. elegans.

Table 2 : List of identified proteins, spectral counts, and sequence coverages from GFP and ALP-1::GFP CoIP-MS runs as the POI # 1 case study.
Error shown as standard deviations of spectral counts and sequence coverages from duplicate runs.

Table 3 : List of identified peptides, proteins, and modified peptides from replicate runs of the same sample with low and high resolution mass spectrometers.
Database searches were controlled to the same peptide FDR (0.1%) with different peptides-per-protein requirements.