Worm Breeder's Gazette 11(2): 77
These abstracts should not be cited in bibliographies. Material contained herein should be treated as personal communication and should be cited as such only with the consent of the author.
The C. elegans DNA sequence database is now large enough to permit preliminary compilations and predictions that should be useful for analyzing new sequences. We extracted the DNA sequence flanking the ATG initiation codons from 54 C. elegans genes (34 were available in GenBank, Release 61.0); after alignment they formed the matrix shown below, which yielded the consensus (A/c)A(a/c)(A/C)ATG (lower case denotes a weakly conserved base). Although the sample is small, the result is significantly different from the vertebrate consensus derived by Kozak (1) (N=699). The total information content of the two matrices (2, 3) differs by 1 bit (C. elegans ~7.6 bits, genome = 36% G+C; vertebrates ~8.6 bits, genome = 40% G+C), although the two distributions are roughly similar (see graph below; baseline ~0.06 bits). Because random sequences will contain many spurious matches, this consensus will be most useful for evaluating suspected initiation codons.

The following sequences were used to generate the C. elegans matrix: act-1, act-3, ama-1, cal-1, col-2, col-7, col-14, deb-1, pd-2, gpd-3, gpd-4, glp-1, gyt-1, his-1, his-3, his-9, his-10, hts-11, hts-12, hsp-1, hsp-6, hsp16-1, hsp16-2, hsp16-41, hsp16-48, mec-3, lin-12, msp-74, myo-1, myo-3, unc-54, vit-2, vit-5; not all msp sequences were incorporated.

If you are interested in increasing the utility of this 'ribosome binding site' (RBS) consensus, please send us your sequences; we will incorporate them into the matrix and redistribute the results. [See Figure 1]
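For readers who want to repeat this kind of calculation on their own aligned start sites, the following Python sketch shows one way to build a count matrix and estimate the information content at each position. It is an illustration only and not the procedure used for the values quoted above: it assumes the information at a position is the genome entropy (computed from the G+C fraction, with A=T and G=C) minus the entropy of the observed base frequencies, and it omits any small-sample correction. The function names and the toy sequences in the usage note are hypothetical.

```python
import math

BASES = "ACGT"


def background(gc_fraction):
    """Base frequencies for a genome with the given G+C fraction,
    assuming A=T and G=C (an assumption of this sketch)."""
    at = (1.0 - gc_fraction) / 2.0
    gc = gc_fraction / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def count_matrix(sites):
    """Count each base at each position of equal-length aligned sites.
    Any character other than A, C, G or T raises KeyError."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = [dict.fromkeys(BASES, 0) for _ in range(length)]
    for site in sites:
        for position, base in enumerate(site.upper()):
            counts[position][base] += 1
    return counts


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def information_content(counts, gc_fraction):
    """Bits per position: genome entropy minus observed entropy.
    No small-sample correction is applied."""
    h_genome = entropy(background(gc_fraction).values())
    per_position = []
    for column in counts:
        total = sum(column.values())
        h_site = entropy(n / total for n in column.values())
        per_position.append(h_genome - h_site)
    return per_position
```

As a usage example, `information_content(count_matrix(["AATG", "CATG"]), 0.5)` gives 1 bit for the first position and 2 bits for each fully conserved position. With a G+C fraction of 0.36 the genome entropy is slightly below 2 bits, so a fully conserved position contributes slightly less than 2 bits. Summing the per-position values gives a total comparable in kind to the totals quoted above, although a small-sample correction would lower it somewhat.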