Gene 345 (2005) 127–138

Codon bias as a factor in regulating expression via translation rate in the human genome

Yizhar Lavner, Daniel Kotlar*

Department of Computer Science, Tel Hai Academic College, Upper Galilee 12210, Israel

Received 13 September 2004; received in revised form 10 November 2004; accepted 11 November 2004
Available online 24 December 2004
Received by H.E. Roman

We study the interrelations between tRNA gene copy numbers, gene expression levels and measures of codon bias in the human genome.
First, we show that isoaccepting tRNA gene copy numbers correlate positively with expression-weighted frequencies of amino acids and codons. Using expression data of more than 14,000 human genes, we show a weak positive correlation between gene expression level and frequency of optimal codons (codons with highest tRNA gene copy number). Interestingly, contrary to non-mammalian eukaryotes, codon bias tends to be high in both highly expressed genes and lowly expressed genes. We suggest that selection may act on codon bias, not only to increase elongation rate by favoring optimal codons in highly expressed genes, but also to reduce elongation rate by favoring non-optimal codons in lowly expressed genes. We also show that the frequency of optimal codons is in positive correlation with estimates of protein biosynthetic cost, and suggest another possible action of selection on codon bias: preference of optimal codons as production cost rises, to reduce the rate of amino acid misincorporation. In the analyses of this work, we introduce a new measure of frequency of optimal codons (FOP′), which is unaffected by amino acid composition and is corrected for background nucleotide content; we also introduce a new method for computing expected codon frequencies, based on the dinucleotide composition of the introns and the non-coding regions surrounding a gene.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Homo sapiens; Codon bias; Gene expression; Translation efficiency; Optimal codon; Biosynthetic cost

Abbreviations: CB, codon bias; ENC, effective number of codons; FOP, frequency of optimal codons; MCB, maximum likelihood codon bias.
* Corresponding author. Tel.: +972 4 6952965; fax: +972 4 6952899.
E-mail address: [email protected] (D. Kotlar).
0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.gene.2004.11.035

1. Introduction

Codon bias, the unequal use of synonymous codons for encoding amino acids (… 2003), has been found in many organisms, both prokaryotes and eukaryotes. This bias varies considerably among organisms and even within the genes of the same organism. The bias was found to be in relation with many genomic factors, such as gene length, GC-content, recombination rate, gene expression level, and density of genes (… Mouchiroud, 1999; Kreitman and Comeron, 1999; Duret, 2000; Marais et al., 2001; Urrutia and Hurst, 2001, 2003; Hey and Kliman, 2002; Versteeg et al., 2003), or with other regularities in the genetic code (…). In different species, codon bias was found to be in weak correlation with gene expression level (… et al., 1986; Duret and Mouchiroud, 1999; Urrutia and Hurst, 2003).

Two main processes were proposed to explain codon bias: natural selection acting on silent changes in DNA, mutational bias, or both. In unicellular organisms, such as E. coli and S. cerevisiae, it was found that the codons translated by the most abundant tRNA are the most frequently used (…). In multicellular organisms, such as C. elegans (…) and Drosophila (Moriyama and Powell, 1997), similar findings were reported, namely, that codon bias favoring codons with high tRNA gene copy number rises with expression level, thus supporting the action of selection on codon bias to improve translation efficiency. This idea has not been confirmed in mammals (…). Although a weak correlation between gene expression level and codon bias has been observed in the human genome (…), this relation has not been linked to tRNA abundance. Recently, … showed that in the human genome, for the majority of amino acids with degeneracy greater than one, the codons with the most abundant tRNA gene copy numbers also exhibit an increase in frequency in highly expressed genes compared to lowly expressed genes.

In this study, we introduce new methods for computing the frequency of optimal codons (FOP) and for correcting codon bias for background nucleotide content. Using these methods, we show evidence indicating that in the human genome translation efficiency, as estimated using tRNA gene copy numbers, is in weak positive correlation with expression level, and that codon bias has a role in this relation, although not the simple role it has in the model described above: on the one hand, we found that codon bias favors codons with high tRNA gene copy number in highly expressed genes, and on the other hand, based on the evidence presented here, we suggest that codon bias may act as a gene expression regulator by favoring codons with low tRNA gene copy numbers in lowly expressed genes. This supports a mechanism proposed by … (1979) and supported by … for rare codons in regulatory genes of E. coli. … (1991) also proposed this regulatory mechanism for several organisms, including primates. In addition, we present evidence that selection might act on codon bias to prefer optimal codons, possibly to reduce the rate of amino acid misincorporation as protein production cost rises.

2. Materials and methods

2.1. Frequency weighted by expression

The count c_a of each amino acid a is calculated as

$$c_a = \sum_{g} c_a(g)\,E(g)$$

where c_a(g) is the count of a in the gene g, E(g) is the expression level of g (average of expression; see below), and the sum is taken over all the relevant genes (either the highly expressed genes or all expressed genes). The expression-weighted frequency f^ex_a of the amino acid a is

$$f^{ex}_a = \frac{c_a}{\sum_{a'} c_{a'}}$$

where the sum in the denominator is over all the amino acids. This calculation is similar to the one performed by … for C. elegans. In a similar manner, we compute the expression-weighted frequency of a codon.

2.2. Estimating translation efficiency

2.2.1. Gene copy numbers data

Gene copy number data was taken from … (2001) and from the tRNA-scan site (… edu/GtRDB/Hs/Hs-summary.html). In these data, pseudogenes have already been removed. We use tRNA gene copy numbers as an assumed estimate of cellular tRNA abundance (see the explanation for this at the beginning of the Results).

2.2.2. Frequency of optimal codons (FOP)

The optimal codon of an amino acid is defined here as the codon with the highest number of tRNA genes for its anticodon, among its synonymous codons. The simplest way to compute the frequency of optimal codons (FOP) of a gene is to count the number of appearances of optimal codons in the gene, and divide it by the total number of codons in the gene (excluding the stop codon):

$$\mathrm{FOP}_s(g) = \frac{1}{N}\sum_{i} n_i(g) \tag{2}$$

where n_i(g) is the count of the codon i in the gene g, N is the total number of codons in g, and the sum is taken over all the optimal codons. The subscript s stands for "simple". This FOP measure is affected by amino acid usage. If synonymous codon usage is random, a gene composed only of amino acids of degeneracy two would have FOP of 0.5, whereas a gene composed of amino acids of degeneracy four would have FOP of 0.25. In order to obtain a measure which is independent of amino acid composition, we multiply each codon count in Eq. (2) by the corresponding amino acid degeneracy:

$$\mathrm{FOP}(g) = \frac{1}{N}\sum_{i} \mathrm{syn}(i)\,n_i(g) \tag{3}$$

Here, syn(i) is the degeneracy of the amino acid coded by i. This way a gene with close to random synonymous codon usage will have FOP value close to 1, regardless of its amino acid composition. To see that this is a sensible measure, we write Eq. (3) in a slightly different way:

$$\mathrm{FOP}(g) = \frac{1}{N}\sum_{i} n_{aa(i)}(g)\,\frac{n_i(g)/n_{aa(i)}(g)}{1/\mathrm{syn}(i)} \tag{4}$$

where n_aa(i)(g) is the count of the amino acid coded by i in g. Assigning f_i(g) = n_i(g)/n_aa(i)(g) and f_aa(i)(g) = n_aa(i)(g)/N, we get

$$\mathrm{FOP}(g) = \sum_{i} f_{aa(i)}(g)\,\frac{f_i(g)}{1/\mathrm{syn}(i)} \tag{5}$$

Now, the second multiplier is just the relative synonymous codon usage, or RSCU, of the codon i in the gene g
(…). Hence, the FOP measure is a weighted mean of the RSCUs of the optimal codons, where the weights are the corresponding amino acid frequencies:

$$\mathrm{FOP}(g) = \sum_{i} f_{aa(i)}(g)\,\mathrm{RSCU}_i(g) \tag{6}$$

This measure does not take into account the background nucleotide composition. In order to correct FOP for background nucleotide content, we replace 1/syn(i) in Eq. (5) by E^nc_i(g), the expected proportion of the codon i among its synonyms, based on the non-coding region surrounding the gene (see below for the way to compute E^nc_i(g)). Assigning RSCU′_i(g) = f_i(g)/E^nc_i(g) in Eq. (6), we get

$$\mathrm{FOP}'(g) = \sum_{i} f_{aa(i)}(g)\,\mathrm{RSCU}'_i(g) \tag{7}$$

Replacing syn(i) in Eq. (3) with 1/E^nc_i(g), we get a simpler way to compute FOP′(g):

$$\mathrm{FOP}'(g) = \frac{1}{N}\sum_{i} \frac{n_i(g)}{E^{nc}_i(g)} \tag{8}$$

where the sum is taken over all optimal codons for which …

2.3. Computation of gene expression levels

Expression levels for individual genes were taken from SAGE (… version of July 21, 2003). Only tags that matched a named gene were taken into account. Expression values were calculated by counting the tags for each gene in each library, normalized per 200,000 tags, and combined over 43 libraries representing 18 normal tissues: brain (7 libraries, 311,726 tags), breast (7 libraries, 310,477 tags), colon (2 libraries, 76,954 tags), heart (1 library, 71,926 tags), kidney (1 library, 30,721 tags), liver (1 library, 58,467 tags), lung (1 library, 77,024 tags), muscle (2 libraries, 88,332 tags), ovary (2 libraries, 81,270 tags), pancreas (2 libraries, 54,673 tags), peritoneum (1 library, 53,527 tags), placenta (2 libraries, 207,348 tags), prostate (4 libraries, 232,573 tags), retina (4 libraries, 239,211 tags), spinal cord (1 library, 45,109 tags), stomach (1 library, 18,193 tags), vascular (2 libraries, 91,131 tags), white blood cells (2 libraries, 67,177 tags).

We combined the expression levels in the libraries in three ways: (a) breadth of expression, defined here as the number of libraries in which the gene was expressed; (b) average over the libraries; and (c) maximum over the libraries. The correlation values among the three methods are listed in Table 1. All correlations are highly significant (p < 10^-100). Average is the method that correlates the best with the two other methods.

[Our definition of Breadth is different from the usual definition (which is the number of tissues in which a gene is expressed; …). However, calculating breadth in both ways yielded two sets of values which are highly correlated (R>0.96). As for Average expression, it may seem more accurate to average first among the libraries of each tissue and then to average over the tissues, as in (…). We observed that the values computed this way and those computed by simply averaging over all available libraries are highly correlated (R>0.98)].

Table 1. Correlation coefficients between different methods of combining values in SAGE libraries

2.4. Codon bias corrected for background nucleotide content

We used four methods to compute codon bias, corrected for background nucleotide content:

Effective number of codons corrected for background nucleotide content, or ENC′ (…).

B measure (…), which is applied as

$$B(g) = \sum_{i} f_{aa(i)}(g)\,\left| f_i(g) - E^{nc}_i(g) \right| \tag{9}$$

where f_i(g) is the proportion of the codon i in the gene g among its synonymous codons; E^nc_i(g) is the expected proportion of i in g (see below); f_aa(i)(g) is the frequency of the amino acid coded by i in g; and the sum is over all codons.

HK measure: computing the uncorrected codon bias

$$CB(g) = \sum_{i} f_{aa(i)}(g)\,\left| f_i(g) - \frac{1}{\mathrm{syn}(i)} \right|$$

where syn(i) is the degeneracy of the amino acid coded by i (the number of synonymous codons for i). Then computing the regression line of CB(g) versus non-coding GC-content, from the non-coding regions surrounding g, and subtracting the regression line from the CB measure (as done by …; thus we shall denote this method HK). This is based on the known observation that codon bias is positively correlated with both non-coding GC-content and expression level in some eukaryotic genomes, including the human genome (see below in the next subsection).

Maximum-likelihood codon bias, or MCB (… Hurst, 2001).

2.5. Computing expected values

For the first three of the four methods described above, we need non-coding sequences neighboring a given gene (the fourth method, MCB, uses the coding sequence itself).
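As a rough illustration of the measures defined in Sections 2.2.2 and 2.4, the following Python sketch computes the simple FOP, the degeneracy-weighted FOP, the background-corrected FOP′ and a B-type bias for a single coding sequence. It is not the authors' code. The function names, the toy sequences and the set of optimal codons are hypothetical; real optimal codons would come from tRNA gene copy numbers, and real expected proportions from the non-coding surroundings of the gene. The sketch also assumes that Met and Trp codons are left out of N, and it splits sixfold amino acids into a fourfold and a twofold family.

```python
from collections import Counter
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (third position varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def family(codon):
    """Synonymous family: same amino acid and same first two bases.
    This splits each sixfold amino acid into a fourfold and a twofold family."""
    return CODE[codon], codon[:2]


FAMILY_SIZE = Counter(family(c) for c, aa in CODE.items() if aa != "*")


def syn(codon):
    """Degeneracy syn(i) of the family that the codon belongs to."""
    return FAMILY_SIZE[family(codon)]


def usable_codons(cds):
    """Codons of a CDS without stop codons, unknown triplets and (an
    assumption of this sketch) the single-codon amino acids Met and Trp."""
    triplets = (cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    return [c for c in triplets if CODE.get(c, "*") != "*" and syn(c) > 1]


def fop_simple(cds, optimal):
    """Simple FOP: fraction of codons that are optimal."""
    codons = usable_codons(cds)
    return sum(c in optimal for c in codons) / len(codons)


def fop(cds, optimal):
    """FOP with each optimal-codon count weighted by its degeneracy."""
    codons = usable_codons(cds)
    return sum(syn(c) for c in codons if c in optimal) / len(codons)


def fop_prime(cds, optimal, expected):
    """FOP': syn(i) replaced by 1 / expected proportion of codon i."""
    codons = usable_codons(cds)
    return sum(1 / expected[c] for c in codons
               if c in optimal and expected.get(c, 0) > 0) / len(codons)


def bias(cds, expected=None):
    """Sum over codons of f_aa(i) * |f_i - E_i|. With expected=None the
    uniform value 1/syn(i) is used, which gives the uncorrected CB."""
    codons = usable_codons(cds)
    n_codon = Counter(codons)
    n_family = Counter(family(c) for c in codons)
    total = 0.0
    for codon, aa in CODE.items():
        if aa == "*" or n_family[family(codon)] == 0:
            continue
        fam = family(codon)
        f_i = n_codon[codon] / n_family[fam]
        e_i = expected[codon] if expected is not None else 1 / syn(codon)
        total += n_family[fam] / len(codons) * abs(f_i - e_i)
    return total


# Toy input: the optimal codons below are hypothetical, for illustration only.
optimal = {"GCC", "AAG"}
uniform = "GCTGCCGCAGCGAAAAAG"   # every synonymous codon used once
biased = "GCCGCCGCCGCCAAGAAG"    # only the "optimal" codons used
print(fop_simple(uniform, optimal), fop(uniform, optimal), bias(uniform))
print(fop_simple(biased, optimal), fop(biased, optimal), bias(biased))
```

In the uniform toy sequence the weighted FOP comes out as 1 and the bias as 0, matching the statement that a gene with random synonymous usage has an FOP close to 1 whatever its amino acid composition. Passing a dictionary of background-derived proportions as `expected` turns the same functions into the corrected variants.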
We used the sequence consisting of the introns of the gene, the 1000 nucleotides immediately preceding the coding area of the gene, and similarly, those 1000 nucleotides immediately succeeding it (or truncated, as necessary, in the case that genes were less than 1000 nucleotides apart; see also …). If an intron is longer than 2000 bp, only the 1000 nucleotides on each of the intron's ends were taken. By taking 1000 flanking bases, we ensure that regions that may be under selective constraints, both in flanking regions and introns, constitute only a small portion of the strands that are used as control. On the other hand, regions of large introns that are far from any coding sequence may not represent the mutational bias that acts on the nearby exons, and thus introns were truncated to 1000 bases on each end. We masked repetitive elements using RepeatMasker (…).

In warm-blooded vertebrates, including human, it is well known that "isochores" structure (…) correlates with both expression level and codon bias (…). Although correction of codon bias for the influence of isochore structure is essential, it is not enough. It has been observed that the G and C compositions (and similarly the A and T compositions), in both coding and non-coding sequences, in the human genome, are not equal, and the differences (termed GC-skew and AT-skew, respectively) correlate with expression level (…). Thus, the correction of codon bias should consider the base composition of the background in a more differential manner. Here we introduce two methods for computing E^nc_i(g) (see Eqs. (7)–(9)).

We treated the amino acids with six synonymous codons as two independent amino acids each, one with four codons, and one with two (as in …), so that any two synonymous codons differ only in the third codon position.

The first and simpler method applies the base proportions in the non-coding surrounding of the gene to the third codon position. For example, if the base A appears 21% of the times in the non-coding surrounding of a given gene, then in a fourfold amino acid, the codon ending with an A will have E^nc_i(g) value of 0.21.

However, since it is known that there are dinucleotides that are in excess or are avoided in the genome, for example, the dinucleotide CG is depleted by its tendency to mutate and to disappear from genomic sequences (… 2000), it may not be enough to consider single nucleotide frequencies for correction. The second method incorporates the dinucleotide composition of the non-coding surrounding of the gene in the following manner: We count the number of appearances of each triplet in all three reading frames of the non-coding surrounding of a gene. For each triplet XYZ, we denote this number by #XYZ. For a codon XYZ, we denote by S(XYZ) the set of bases that when replacing Z would yield a synonymous codon (including the base Z itself). For example, S(GCA)={A,C,G,T}, S(AGC)={C,T}, and S(ATT)={A,C,T} (recall that the sixfold amino acids are split into a twofold amino acid and a fourfold one). It is clear that the frequency of a codon XYZ among its synonyms in a gene could be affected by the background frequencies of both the YZ dinucleotide and the ZN dinucleotide, where N is any of the four bases. Calculating the expected relative frequencies of XYZ at the non-coding surrounding is done as follows:

…

where aa(XYZ) is the amino acid coded by XYZ, and the term indexed by aa(XYZ) and N is the number of triplets that code for aa(XYZ) and are followed by the base N, divided by the total number of triplets that code for aa(XYZ). All triplets are counted in the non-coding surrounding of g in all three reading frames.

We calculated FOP′ and B using both methods. The B (Eq. (9)) values for all genes in the study, with E^nc calculated in both methods, are highly correlated (R=0.96), and similarly for FOP′ in Eqs. (7) and (8) (R=0.93). Thus, the results involving these measures, with E^nc_i(g) calculated in both ways, are almost identical. In this paper, we included only the results where E^nc_i(g) was computed in the second method, which seems more accurate, as it considers …

2.6. Protein biosynthetic cost measures

We used the size/complexity score of … for amino acids (…). To evaluate the biosynthetic cost of a protein, encoded by a given gene, we used two measures:

2.6.1. Average size/complexity score

Each codon was given the score of the amino acid it encodes. The size/complexity score was averaged over the codons of a gene.

2.6.2. Frequency of expensive amino acids

This is the relative frequency of amino acids with a size/complexity score greater than 40 (Arg, Cys, His, Phe, and Tyr). We excluded the single-codon amino acids Met and Trp, since they do not contribute to the FOP′ or to the codon …

2.7. Sequence data

Gene and intron sequences were downloaded from NCBI GenBank, Build 33 (… H_sapiens/). We included only CDSs that start with a start codon, end with a stop codon, have a length that is a multiple of three, and have no unidentified bases. For genes with more than one CDS, we took the longest CDS. Thus, between 5% and 10% of the genes were excluded. In addition, less than half the genes were expressed in the 43 SAGE libraries (see below). After a further removal of genes, required by the computation of MCB (… Hurst, 2001), 14,131 genes were left for the analysis.

3. Results

In some unicellular organisms, a positive relation between cellular tRNA and tRNA gene copy number was found (… et al., 1999). As in C. elegans and other eukaryotes (… 2001), in the human genome there is also a redundancy in the set of tRNA genes (see also …). The number of tRNA genes varies from 7 (Trp) to 44 (Val). Although a variation between different tRNA genes could be in the transcription level, a positive correlation between intracellular tRNA and tRNA gene copy number is expected (…). We assume such correlation and use the number of tRNA genes as a measure for the amount of intracellular tRNAs. As done by … for C. elegans, we measured the correlation between the isoaccepting tRNA genes and the expression-weighted frequencies of the 20 amino acids, and those of the individual codons (see … and Materials and methods). When amino acid frequencies were considered (Fig. 1a), a significant correlation was found for highly expressed genes (R=0.585, p=0.007, N=4320). For codon frequencies (excluding two outliers; Fig. 1b), we obtained a highly significant positive correlation (R=0.654, p<0.001, and R=0.56, p<0.001 when including the outliers). Similar results were obtained when all expressed genes were taken, and slightly lower correlations were obtained when frequencies were not weighted (R=0.565, p=0.009 for amino acids; R=0.62, p<0.001 for codons). These correlations do not necessarily prove a correlation between cellular tRNA abundance and the number of tRNA genes, but, as indicated by …, if such relation were not present, we would not expect any correlation between the frequencies of amino acids and codons and the number of tRNA genes.

Table 2. Amino acids in the human genome: frequency weighted by expression in highly expressed genes and in all expressed genes, isoaccepting tRNA gene copy numbers, and size/complexity score (as in …)

Table 3. Codons in the human genome: tRNA gene copy numbers and frequencies weighted by expression in highly expressed genes and in all expressed genes

Fig. 1. Isoaccepting tRNA gene copy number vs. frequency of amino acids and codons, weighted by expression, in highly expressed genes, in the human genome. Frequencies were computed and weighted for the top 30% of all expressed genes (a total of 4320 genes). (a) Amino acid frequencies, R=0.585, p=0.007. (b) Codon frequencies, R=0.654, p<0.001. Codons translated by the same anticodon (the wobble effect) were regarded as one point. The data appear in … Expression values were calculated by averaging over 43 SAGE libraries (see Materials and methods).

3.1. The relation between gene copy numbers and gene …

Under the assumption that gene copy numbers can be used as an indication for cellular tRNA abundance (see also …), we used frequency of optimal codons (FOP′, see Materials and methods) as a measure of translation efficiency. As explained above, this measure is independent of the amino acid frequencies and is corrected for background nucleotide content. Here, the term optimal codon denotes the codon with the highest tRNA gene copy number, for each amino acid (also termed major codon and translationally superior codon; see …). We calculated the correlation between FOP′ and expression level in over 14,000 genes in all human chromosomes. We combined the expression levels in 43 libraries in three ways: (a) breadth of expression; (b) average over the libraries; and (c) maximum over the libraries (see Materials and methods). The correlations are very weak (R=0.075,
0.08, and 0.06, respectively) but highly significant (p<0.0001). The error-bar graphs in Fig. 2 illustrate the relation between the FOP′ and the average expression (see Materials and methods). Although the correlations are weak, the graph indicates that the FOP′ values rise as expression level rises.

Fig. 2. Frequency of optimal codons (FOP′) vs. average expression. 14,131 genes expressed in 43 SAGE libraries were divided, according to expression level, into ten categories of approximately equal size. Circles represent the mean value. Error bars show 95% confidence. R=0.08, p<0.0001.

3.2. The role of codon bias

Since codon bias is the unequal use of synonymous codons, we expect high codon bias in genes with stronger preference for optimal codons, and thus we expect codon bias to rise with frequency of optimal codons. Interestingly, we found that genes with the lowest values of FOP′ also tend to have high codon bias, even higher than in genes with the highest value of FOP′. This is illustrated in the graphs in Fig. 3. The high values of codon bias in genes with low frequency of optimal codons indicate that the bias tends to be for codons that correspond to low tRNA gene copy numbers, and this in turn may suggest that the bias is for codons with low translational values. The figure shows the relation between FOP′ and four measures of codon bias: Effective number of codons, or ENC′ (…), B measure (…), HK measure (… 2002), and maximum-likelihood codon bias, or MCB (…). ENC′ and B are corrected for background nucleotide composition by considering dinucleotides in the background (see Materials and methods).

Fig. 3. Codon bias vs. frequency of optimal codons (FOP′). (a) Effective number of codons (ENC′); (b) B; (c) Regression line subtraction (HK, see Materials and methods); (d) Maximum-likelihood codon bias (MCB). A total of 14,131 genes was divided, according to FOP′, into ten categories of approximately equal size. Circles represent the mean value. Error bars show 95% confidence.

When considering the relation between codon bias and gene expression level, we encounter another unexpected result. Instead of the expected positive correlation between codon bias and expression level, based on studies in different organisms (see the Introduction), we observed that the average codon bias is highest both in the classes of genes with the highest as well as with the lowest expression levels. This is shown in the graphs of Fig. 4. Similar graphs were obtained when using breadth of expression or maximum expression (see Materials and methods, graphs available upon request).

Fig. 4. Codon bias vs. average expression level. (a) Effective number of codons (ENC′); (b) B; (c) Regression line subtraction (HK, see Materials and methods); (d) Maximum-likelihood codon bias (MCB). A total of 14,131 genes was divided, according to expression level, into ten categories of approximately equal size. Circles represent the mean value. Error bars show 95% confidence.

As indicated above, a rise in codon bias as expression level drops has not been observed in lower organisms. As it appears, in addition to the role associated with codon bias in enhancing the expression of certain genes by preferring codons with high cellular levels of isoaccepting tRNA (as was found for unicellular organisms; … et al., 1999) or with high tRNA gene copy number (… 2000), here we found that in the human genome, codon bias may have an additional role: controlling the expression of certain genes by preferring codons with small tRNA gene copy number.

3.3. Translation efficiency and amino acid biosynthetic cost

Translation efficiency is also affected by the size, the structure, and the production cost of the amino acids incorporated in the protein. To further analyze these factors, we hypothesized that proteins coded by highly expressed genes are composed of smaller and biosynthetically cheaper amino acids.

The relation between expression levels and biosynthetic cost is shown in Fig. 5. We use the size/complexity quotient of … as a measure of biosynthetic cost (Table 2). A clear monotonic relation between expression levels and biosynthetic cost is evident. Genes with higher average expression (see Materials and methods) tend to code less for "expensive" amino acids, as estimated by the size/complexity measure. Similar graphs were obtained when using breadth of expression or maximum expression.

Fig. 5. Biosynthetic cost vs. average expression. (a) Average size/complexity score; (b) Frequency of occurrence of the most expensive amino acids (see Materials and methods). A total of 14,131 genes expressed in 43 SAGE libraries was divided, according to expression level, into ten categories of approximately equal size. Circles represent mean values. Error bars show 95% confidence. Correlation coefficients are (a) R=−0.071 (p<0.0001), (b) R=−0.127 (p<0.0001).

Interestingly, we found that the frequency of optimal codons is in significant positive correlation with the measures of biosynthetic cost (R=0.18, p<0.0001), as indicated in Fig. 6. This result seems counterintuitive, since it was shown above that the size/complexity score of genes tends to decrease with the average expression, which points to a possible preference for cheap and smaller amino acids in highly expressed genes. However, the correlations between the FOP and expression level, and between expression level and size/complexity score, are too weak to allow any inference about the relation between the FOP and size/complexity score. The graph indicates that genes that encode for more expensive amino acids tend to have more tRNA genes for their anticodons. This suggests the possibility of a mechanism that compensates high cost of production of a protein by codon bias favoring optimal codons, enabling faster and more accurate translation.

Fig. 6. Frequency of optimal codons vs. average size/complexity score. R=0.18, p<0.0001. A total of 14,131 genes expressed in 43 SAGE libraries was divided according to average size/complexity into ten categories of approximately equal size. Circles represent the mean values. Error bars show 95% confidence.

4. Discussion

In this work, we studied the relation between translation efficiency (as estimated by the number of tRNA genes) and gene expression level (as estimated by measures derived from SAGE/EST data) in the human genome, and the interrelations between these two factors and codon bias. For this purpose, we introduced two methods: a method for computing the frequency of optimal codons, which is independent of amino acid composition and with correction for background nucleotide content, and a method for computing expected values of codon frequency, based on dinucleotide composition of the background. We showed that amino acid and codon frequencies, weighted by expression, correlate positively with tRNA gene copy number, thus possibly indicating a relation between the number of tRNA genes and tRNA abundance. We showed that expression level is in weak, highly significant, positive correlation with frequency of optimal codons (which is assumed to be a measure of translation efficiency). This, in turn, shows that codon choice, or codon bias, relates to expression level. A caveat must be admitted here, since we used measures of the transcriptome, indicating numbers of mRNAs, and not of proteins, but due to the lack of data on protein levels in human, we assume that the former can be induced by the mRNA levels as used here. In addition, we obtained a surprising result not observed in previously studied organisms, namely, that the average codon bias is high both in weakly expressed genes and in highly expressed genes. This result was obtained with all four measures of codon bias used in this study. Previous analysis of the human genome (…) did not show high codon bias for the weakly expressed genes. This may be explained by the fact that in that study an older version of the genome was used and that a large part of the weakly expressed genes was deliberately excluded from the analysis.

In highly expressed genes, we show a tendency for high codon bias, and also for higher frequency of optimal codons. A possible explanation for this is that in these genes, the high codon bias is a consequence of more codons with high tRNA gene copy number, increasing the translation elongation rate. The finding that the average codon bias is high for lowly expressed genes, together with the result that these genes tend to have low frequency of optimal codons (and therefore their high bias is probably the consequence of favoring non-optimal codons in terms of translation efficiency), suggests that some lowly expressed genes may also experience the effect of natural selection against optimal codons. … showed strong evidence for a transcription-associated bias for higher G and T content on the coding strand of introns. This bias is observed in coding regions in the third codon position as well (…). Computing codon bias and RSCU values relative to expected values, derived from introns on the coding strand only, as done in this study, avoids the difference between the coding and the non-coding strands, as observed in … Since only half of the optimal codons end with a G or a T, it is unlikely that the transcription-associated bias for G and T in both introns and coding regions can account for the correlation between FOP and expression level (…) and the relation between codon bias and expression level (…).

We hypothesize that the translation efficiency of proteins, which can be a disadvantage in high levels, is controlled by this mechanism of preferring codons with low tRNA abundance, and thus regulating the elongation rate in these proteins. Such a mechanism was suggested by … Grosjean (1979), and supporting evidence was provided by … The latter found that some E. coli regulatory genes contain an unusually high number of codons that are not frequently used in most E. coli genes, and therefore suggested that this could be part of a mechanism that helps to keep a low expression level in some regulatory genes. In another study, … (1991) showed that in E. coli, S. cerevisiae, D. melanogaster, and primates (mainly Homo sapiens) proteins containing a high percentage of low-usage codons can be characterized as cases where an excess of the protein could be detrimental. Another indication of this mechanism in bacteria is provided by …, who showed evidence suggesting that the translation of proteins involved in various specialized functions may be regulated by using rare codons. … and … presented experimental evidence which also indicates that the presence of non-optimal codons can reduce translation efficiency. Support for the notion of an expression regulation mechanism in lowly expressed genes in mammals can also be found in the work of … and … (1997), who showed that modifying the codon composition of non-mammalian genes to resemble that of mammalian genes can significantly enhance their translation in mammalian cells, where the translation of the original genes is limited. This model was challenged by arguing that the presence of rare codons in lowly expressed genes is due to mutational drift randomizing codon usage (… 1986). However, it is hard to see how random drift can account for the high level of codon bias, favoring rare and non-optimal codons, in lowly expressed genes, as observed in the human genome, in this study. More research is needed to show whether this mechanism is used to regulate translation rate. Since the former hypothesis leads to an experimental prediction, namely, that proteins that could be of disadvantage in excess indeed contain significantly more non-optimal codons, it is of high value.

Using the size/complexity index developed by … (1997, see Materials and methods) as an estimate of the amino acid cost, we found that there is a negative correlation (R=−0.071) between expression level and the size/complexity score. This result is expected if amino acid usage is shaped by selective forces to optimize translation efficiency.

… gene expression, to assess its association with amino acid usage, is average expression or breadth of expression (see Materials and methods). These measures reflect more accurately the total activity of genes in different tissues.

We showed that frequency of optimal codons correlates positively with protein production cost. We suggest that this may be an indication of the action of selection on codon bias to reduce error rate in the production of costly proteins. This mechanism was suggested by …, who showed evidence that natural selection acts on synonymous codon usage to enhance the accuracy of protein synthesis in Drosophila, based on association between synonymous codon usage and amino acid constraint. In the study of … a negative correlation between metabolic costs of amino acids and codon bias was shown. This relation seems to contradict the result presented here. However, apart from the fact that their study deals with bacteria, and that both selective forces and regulation mechanism may be different in higher organisms, there are other factors that may explain the difference: as was mentioned above, the metabolic cost calculation does not take into account the size and the complexity of amino acids, and besides, in other organisms other factors such as
biosynthetic pathways and dietary conditions may contrib- A more pronounced effect was demonstrated when the ute differently to the amino acid composition. In addition, correlation between the frequency of expensive amino acids the MCU measure in that study is different from the FOPV within genes and the expression level is considered measure as was defined here, and also the former is not (R=0.127, see also used corrected for background composition and for amino acid Dufton's index to examine its association with gene usage composition, as in the FOPV.
expression level, and found a similar tendency to avoid Two points of caution should be emphasized here: First, the use of complex amino acid in highly expressed genes.
as done in previous studies ( Another study (conducted on we assumed a correspondence between tRNA gene copy B. subtilis and E. coli, and concentrated on the relation numbers and tRNA cellular abundance. As far as we know, between metabolic costs of amino acid biosynthesis and this relation, however proved for several organisms, has not patterns of amino acid composition, shows an increased been substantiated for humans. As noted above, the results of usage of less energetically costly amino acids in highly this study may suggest this relation, since without such expressed genes in both cells, and thus support the action of correspondence one could not expect the correlations, selection on amino acid usage to increase metabolic observed here, between gene copy numbers and amino acid efficiency. It should be noted however, that Dufton's index, and codon frequencies. The fact that, in 14 out of 18 amino although an indirect measure, takes into account the size and acids, the codons with the highest tRNA gene copy numbers the structural complexity of amino acids, factors that have also exhibit an increase in their frequency when comparing an influence on the rate of incorporating amino acids in the between lowly and highly expressed genes ( elongation process.
support this assumption. However, this is not a proof of such Contrary to these results, using micro- a relation. Thus, some of the results suggested in the above array data, reports that there is no detectable influence of discussion, concerning the action of selection on codon bias expression on amino acid usage. One reason to this for translation efficiency, rely partly on an assumption, that, discrepancy could be the utilization of two different although reasonable, has not been firmly substantiated yet.
methods for the estimation of the expression level. Another The second point is that the correlations between FOPV possible explanation could be the fact that Comeron and the different expression level measures are very weak.
analyzed the expression of each tissue separately FOPV was calculated by controlling the effect of amino acid (dexpression levelT), which can be problematic. Genes composition and that of background nucleotide content. It is which are highly expressed in one tissue can be of poor not unconceivable that another third effect may account for expression in another, or not expressed at all. Since it is the observed correlations.
more reasonable to assume that if there is a selection for In summary, based on the evidence presented, we suggest translational efficiency, it will be detected in genes with high three possible ways in which selection may act on codon activity in many tissues, the more appropriate measure for bias in the human genome: (1) Increasing translation Y. Lavner, D. Kotlar / Gene 345 (2005) 127–138 efficiency in highly expressed genes; (2) regulating trans- Bacillus subtilis tRNAs: gene expression level and species-specific lation efficiency of some proteins that can be a disadvantage diversity of codon usage based on multivariate analysis. Gene 238,143 – 155.
at high levels; and (3) improving translation efficiency and Kanaya, S., Yamada, Y., Kinouchi, M., Kudo, Y., Ikemura, T., 2001. Codon reducing the rate of amino acid misincorporation in the usage and tRNA genes in eukaryotes: correlation of codon usage production of biosynthetically expensive proteins.
diversity with translation efficiency and with CG-dinucleotide usage asassessed by multivariate analysis. J. Mol. Evol. 53, 290 – 298.
Karlin, S., Mrazek, J., 1996. What drives codon choices in human genes? J. Mol. Biol. 262, 459 – 472.
Karlin, S., Mrazek, J., Campbell, A.M., 1998. Codon usages in different gene classes of the Escherichia coli genome. Mol. Microbiol. 29 (6), We thank Edward Trifonov, Laurent Duret and Giuseppe 1341 – 1355.
D'Onofrio for valuable discussions that contributed to this Knight, R.D., Freeland, S.J., Landweber, L.F., 2001. A simple model based manuscript. We also thank Yefim Yakir for preparing part of on mutation and selection explains trends in codon and amino-acid the figures and for technical support and Nurit Carmi for usage and GC composition within and across genomes. Genome Biol. 2 Konigsberg, W., Godson, N., 1983. Evidence for use of rare codons in the dnaG gene and other regulatory genes of Escherichia coli. Proc. Natl.
Acad. Sci. U. S. A. 80 (3), 687 – 691.
Kreitman, M., Comeron, J.M., 1999. Coding sequence evolution. Curr.
Opin. Genet. Dev. 9 (6), 637 – 641.
Akashi, H., 1994. Synonymous codon usage in Drosophila mela- Lander, E.S., et al. International Human Genome Sequencing Consortium, nogaster: natural selection and translational accuracy. Genetics 136 2001. Initial sequencing and analysis of the human genome. Nature (3), 927 – 935.
409, 860 – 921.
Akashi, H., 1995. Inferring weak selection from patterns of polymorphism Marais, G., Mouchiroud, D., Duret, L., 2001. Does recombination improve and divergence at bsilentQ sites in Drosophila DNA. Genetics 139, selection on codon usage? Lessons from nematode and fly complete 1076 – 1677.
genomes. Proc. Natl. Acad. Sci. U. S. A. 98 (10), 5688 – 5692.
Akashi, H., Gojobori, T., 2002. Metabolic efficiency and amino acid Moriyama, E.N., 2003. Codon usage, Encyclopedia of the human genome.
composition in the proteomes of Escherichia coli and Bacillus subtilis.
Macmillan Publishers, Nature Publishing Group. Proc. Natl. Acad. Sci. U. S. A. 99 (6), 3695 – 3700.
Moriyama, E.N., Powel, J.R., 1997. Codon usage bias and tRNA Bernardi, G., et al., 1985. The mosaic genome of warm-blooded vertebrates.
abundance in Drosophila. J. Mol. Evol. 45, 514 – 523.
Science 228 (4702), 953 – 958.
Novembre, J.A., 2002. Accounting for background nucleotide composition Comeron, J.M., 2004. Selective and mutational patterns associated with when measuring codon usage bias. Mol. Biol. Evol. 19 (8), 1390 – 1394.
gene expression in humans: influences on synonymous composition and Pedersen, S., 1984. Escherichia coli ribosomes translate in vivo with intron presence. Genetics 167, 1293 – 1304.
variable rate. EMBO J. 3 (12), 2895 – 2898.
Dong, H., Nilsson, L., Kurland, C.G., 1996. Co-variation of tRNA Percudani, R., Pavesi, A., Ottonello, S., 1997. Transfer RNA gene abundance and codon usage in Escherichia coli at different growth redundancy and translational selection in Saccharomyces cerevisiae.
rates. J. Mol. Biol. 260, 649 – 663.
J. Mol. Biol. 268, 322 – 330.
Dufton, M.J., 1997. Genetic code synonym quotas and amino acid Robinson, M., et al. , 1984. Codon usage can affect efficiency of translation complexity: cutting the cost of proteins? J. Theor. Biol. 187, 165 – 173.
of genes in Escherichia coli. Nucleic Acids Res. 12 (17), 6663 – 6671.
Duret, L., 2000. tRNA gene number and codon usage in the C. elegans Saier, M.J., 1995. Differential codon usage: a safe guard against genome are co-adapted for optimal translation of highly expressed inappropriate gene expression of specialized genes. FEBS 362, 1 – 4.
genes. Trends Genet. 16 (7), 287 – 289.
Sharp, P.M., Li, W.-H., 1986. Codon usage in regulatory genes in Duret, L., 2002. Evolution of synonymous codon usage in metazoans. Curr.
Escherichia coli does not reflect selection for rare codons. Nucleic Opin. Genet. Dev. 12, 640 – 649.
Acids Res. 14, 7737 – 7749.
Duret, L., Mouchiroud, D., 1999. Expression pattern and, surprisingly, gene Sharp, P.M., Tuohy, T.M., Mosurski, K.R., 1986. Codon usage in yeast: length shape codon usage in Caenorhabditis, Drosophila, and cluster analysis clearly differentiates highly and lowly expressed genes.
Arabidopsis. Proc. Natl. Acad. Sci. U. S. A. 96 (8), 4482 – 4487.
Nucleic Acids Res. 14, 5125 – 5143.
Fiers, W., Grosjean, H., 1979. On codon usage. Nature 277 (5694), 328.
Urrutia, A.O., Hurst, L.D., 2001. Codon usage bias covaries with Grantham, R., Gautier, C., Gouy, M., Mercier, R., Pave, A., 1980. Codon expression breadth and the rate of synonymous evolution in humans, catalog usage and the genome hypothesis. Nucleic Acids Res. (8), but this is not evidence for selection. Genetics 159, 1191 – 1199.
r49 – r62.
Urrutia, A.O., Hurst, L.D., 2003. The signature of selection mediated by Graur, D., Li, W.-H., 2000. Fundamentals of Molecular Evolution, 2nd ed.
expression on human genes. Genome Res. 13 (10), 2260 – 2264.
Mass, Sinauer, Sunderland.
Versteeg, R., et al. , 2003. The human transcriptome map reveals extremes Hey, J., Kliman, R.M., 2002. Interactions between natural selection, in gene density, intron length, GC content, and repeat pattern for recombination and gene density in the genes of Drosophila. Genetics domains of highly and weakly expressed genes. Genome Res. 13 (9), 160, 595 – 608.
1998 – 2004.
Ikemura, T., 1981. Correlation between the abundance of Escherichia coli Wells, K.D., Foster, J.A., Moore, K., Pursel, V.G., Wall, R.J., 1997. Codon transfer RNAs and the occurrence of the respective codons in its protein optimization, genetic insulation, and an rtTA reporter improve perform- genes. J. Mol. Biol. 146 (1), 1 – 21.
ance of the tetracycline switch. Transgenic Res. 8 (5), 371 – 381.
Ikemura, T., 1982. Correlation between the abundance of yeast transfer Wright, F., 1990. The deffective number of codonsT used in a gene. Gene 87, RNAs and the occurrence of the respective codons in protein genes.
Differences in synonymous codon choice patterns of yeast and Zhang, S., Zubay, G., Goldman, E., 1991. Low usage codons in Escherichia Escherichia coli with reference to the abundance of isoaccepting coli, yeast, fruit fly, and primates. Gene 105, 61 – 72.
transfer RNAs. J. Mol. Biol. 158 (4), 573 – 597.
Zhou, J., Liu, W-J., Peng, S.W., Sun, X.Y., Frazer, I., 1999. Papillomavirus Kanaya, S., Yamada, Y., Kudo, Y., Ikemura, T., 1999. Studies of codon capsid protein expression level depends on the match between codon usage and tRNA genes of 18 unicellular organisms and quantification of usage and tRNA availability. J. Virol. 73, 4972 – 4982.


