Doi:10.1016/j.gene.2004.11.035
Gene 345 (2005) 127 – 138
Codon bias as a factor in regulating expression via translation rate
in the human genome
Yizhar Lavner, Daniel Kotlar*
Department of Computer Science, Tel Hai Academic College, Upper Galilee 12210, Israel
Received 13 September 2004; received in revised form 10 November 2004; accepted 11 November 2004
Available online 24 December 2004
Received by H.E. Roman
We study the interrelations between tRNA gene copy numbers, gene expression levels and measures of codon bias in the human genome.
First, we show that isoaccepting tRNA gene copy numbers correlate positively with expression-weighted frequencies of amino acids and codons. Using expression data of more than 14,000 human genes, we show a weak positive correlation between gene expression level and frequency of optimal codons (codons with highest tRNA gene copy number). Interestingly, contrary to non-mammalian eukaryotes, codon bias tends to be high in both highly expressed genes and lowly expressed genes. We suggest that selection may act on codon bias, not only to increase elongation rate by favoring optimal codons in highly expressed genes, but also to reduce elongation rate by favoring non-optimal codons in lowly expressed genes. We also show that the frequency of optimal codons is in positive correlation with estimates of protein biosynthetic cost, and suggest another possible action of selection on codon bias: preference of optimal codons as production cost rises, to reduce the rate of amino acid misincorporation. In the analyses of this work, we introduce a new measure of frequency of optimal codons (FOP′), which is unaffected by amino acid composition and is corrected for background nucleotide content; we also introduce a new method for computing expected codon frequencies, based on the dinucleotide composition of the introns and the non-coding regions surrounding a gene.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Homo sapiens; Codon bias; Gene expression; Translation efficiency; Optimal codon; Biosynthetic cost
Y. Lavner, D. Kotlar / Gene 345 (2005) 127–138
Abbreviations: CB, codon bias; ENC, effective number of codons; FOP, frequency of optimal codons; MCB, maximum likelihood codon bias.
* Corresponding author. Tel.: +972 4 6952965; fax: +972 4 6952899. E-mail address: [email protected] (D. Kotlar).
0378-1119/$ - see front matter © 2004 Elsevier B.V. All rights reserved.

1. Introduction

Codon bias, the unequal use of synonymous codons for encoding amino acids (... 2003), has been found in many organisms, both prokaryotes and eukaryotes. This bias varies considerably among organisms and even within the genes of the same organism. The bias was found to be in relation with many genomic factors, such as gene length, GC-content, recombination rate, gene expression level, and density of genes (... Mouchiroud, 1999; Kreitman and Comeron, 1999; Duret, 2000; Marais et al., 2001; Urrutia and Hurst, 2001, 2003; Hey and Kliman, 2002; Versteeg et al., 2003), or with other regularities in the genetic code (...). In different species, codon bias was found to be in weak correlation with gene expression level (... et al., 1986; Duret and Mouchiroud, 1999; Urrutia and Hurst, 2003). Two main processes were proposed to explain codon bias: natural selection acting on silent changes in DNA, mutational bias, or both. In unicellular organisms, such as E. coli and S. cerevisiae, it was found that the codons translated by the most abundant tRNA are the most frequently used (...). In multicellular organisms, such as C. elegans (...) and Drosophila (Moriyama and Powell, 1997), similar findings were reported, namely, that codon bias favoring codons with high tRNA gene copy number rises with expression level, thus supporting the action of selection on codon bias to improve translation efficiency. This idea has not been confirmed in mammals (...). Although a weak correlation
between gene expression level and codon bias has been observed in the human genome (...), this relation has not been linked to tRNA abundance. Recently, ... showed that in the human genome, in the majority of amino acids with degeneracy greater than one, the codons with the most abundant tRNA gene copy numbers also exhibit an increase in frequency in highly expressed genes compared to lowly expressed genes.

In this study, we introduce new methods for computing the frequency of optimal codons (FOP) and for correcting codon bias for background nucleotide content. Using these methods, we show evidence indicating that in the human genome translation efficiency, as estimated using tRNA gene copy numbers, is in weak positive correlation with expression level, and that codon bias has a role in this relation, although not the simple role it has in the model described above: on the one hand, we found that codon bias favors codons with high tRNA gene copy number in highly expressed genes, and on the other hand, based on the evidence presented here, we suggest that codon bias may act as a gene expression regulator by favoring codons with low tRNA gene copy numbers in lowly expressed genes. This supports a mechanism proposed by ... (1979) and supported by ... for rare codons in regulatory genes of E. coli. ... (1991) also proposed this regulatory mechanism for several organisms, including primates. In addition, we present evidence that selection might act on codon bias to prefer optimal codons, possibly to reduce the rate of amino acid misincorporation as protein production cost rises.

2. Materials and methods

2.1. Frequency weighted by expression

The count $c_a$ of each amino acid $a$ is calculated as

$$c_a = \sum_g c_a(g)\,E(g),$$

where $c_a(g)$ is the count of $a$ in the gene $g$, $E(g)$ is the expression level of $g$ (average of expression; see below), and the sum is taken over all the relevant genes (either the highly expressed genes or all expressed genes). The expression-weighted frequency $f^{ex}_a$ of the amino acid $a$ is

$$f^{ex}_a = \frac{c_a}{\sum_b c_b},$$

where the sum in the denominator is over all the amino acids. This calculation is similar to the one performed by ... for C. elegans. In a similar manner, we compute the expression-weighted frequency of a codon.

2.2. Estimating translation efficiency

2.2.1. Gene copy numbers data

Gene copy number data was taken from ... (2001) and from the tRNA-scan site (...edu/GtRDB/Hs/Hs-summary.html). In these data, pseudogenes have already been removed. We use tRNA gene copy numbers as an assumed estimate of cellular tRNA abundance (see explanation for this at the beginning of the Results section).

2.2.2. Frequency of optimal codons (FOP)

The optimal codon of an amino acid is defined here as the codon with the highest number of tRNA genes for its anticodon, among its synonymous codons. The simplest way to compute the frequency of optimal codons (FOP) of a gene is to count the number of appearances of optimal codons in the gene, and divide it by the total number of codons in the gene (excluding the stop codon):

$$\mathrm{FOP}_s(g) = \frac{1}{N}\sum_i n_i(g), \qquad (2)$$

where $n_i(g)$ is the count of the codon $i$ in the gene $g$, $N$ is the total number of codons in $g$, and the sum is taken over all the optimal codons. The subscript $s$ stands for "simple". This FOP measure is affected by amino acid usage. If synonymous codon usage is random, a gene composed only of amino acids of degeneracy two would have FOP of 0.5, whereas a gene composed of amino acids of degeneracy four would have FOP of 0.25. In order to obtain a measure which is independent of amino acid composition, we multiply each codon count in Eq. (2) by the corresponding amino acid degeneracy:

$$\mathrm{FOP}(g) = \frac{1}{N}\sum_i n_i(g)\,\mathrm{syn}(i). \qquad (3)$$

Here, $\mathrm{syn}(i)$ is the degeneracy of the amino acid coded by $i$. This way a gene with close to random synonymous codon usage will have FOP value close to 1, regardless of its amino acid composition. To see that this is a sensible measure, we write Eq. (3) in a slightly different way:

$$\mathrm{FOP}(g) = \sum_i \frac{n_{aa(i)}(g)}{N}\cdot\frac{n_i(g)/n_{aa(i)}(g)}{1/\mathrm{syn}(i)}, \qquad (4)$$

where $n_{aa(i)}(g)$ is the count of the amino acid coded by $i$ in $g$. Assigning $f_i(g)=n_i(g)/n_{aa(i)}(g)$ and $f_{aa(i)}(g)=n_{aa(i)}(g)/N$, we get

$$\mathrm{FOP}(g) = \sum_i f_{aa(i)}(g)\,\frac{f_i(g)}{1/\mathrm{syn}(i)}. \qquad (5)$$

Now, the second multiplier is just the relative synonymous codon usage, or RSCU, of the codon $i$ in the gene $g$ (...). Hence, the FOP measure is a weighted
mean of the RSCUs of the optimal codons, where the weights are the corresponding amino acid frequencies:

$$\mathrm{FOP}(g) = \sum_i f_{aa(i)}(g)\,\mathrm{RSCU}_i(g). \qquad (6)$$

This measure does not take into account the background nucleotide composition. In order to correct FOP for background nucleotide content, we replace $1/\mathrm{syn}(i)$ in Eq. (5) by $E^{nc}_i(g)$, the expected proportion of the codon $i$ among its synonyms, based on the non-coding region surrounding the gene (see below for the way to compute $E^{nc}_i(g)$). Assigning $\mathrm{RSCU}'_i(g) = f_i(g)/E^{nc}_i(g)$ in Eq. (6), we get

$$\mathrm{FOP}'(g) = \sum_i f_{aa(i)}(g)\,\mathrm{RSCU}'_i(g). \qquad (7)$$

Replacing $\mathrm{syn}(i)$ in Eq. (3) with $1/E^{nc}_i$, we get a simpler way to compute FOP′(g):

$$\mathrm{FOP}'(g) = \frac{1}{N}\sum_i \frac{n_i(g)}{E^{nc}_i(g)}, \qquad (8)$$

where the sum is taken over all optimal codons for which $E^{nc}_i(g) \neq 0$.

2.3. Computation of gene expression levels

Expression levels for individual genes were taken from SAGE (... version of July 21, 2003). Only tags that matched a named gene were taken into account. Expression values were calculated by counting the tags for each gene in each library, normalized per 200,000 tags, and combined over 43 libraries representing 18 normal tissues: brain (7 libraries, 311,726 tags), breast (7 libraries, 310,477 tags), colon (2 libraries, 76,954 tags), heart (1 library, 71,926 tags), kidney (1 library, 30,721 tags), liver (1 library, 58,467 tags), lung (1 library, 77,024 tags), muscle (2 libraries, 88,332 tags), ovary (2 libraries, 81,270 tags), pancreas (2 libraries, 54,673 tags), peritoneum (1 library, 53,527 tags), placenta (2 libraries, 207,348 tags), prostate (4 libraries, 232,573 tags), retina (4 libraries, 239,211 tags), spinal cord (1 library, 45,109 tags), stomach (1 library, 18,193 tags), vascular (2 libraries, 91,131 tags), white blood cells (2 libraries, 67,177 tags).

We combined the expression levels in the libraries in three ways: (a) breadth of expression, defined here as the number of libraries in which the gene was expressed; (b) average over the libraries; and (c) maximum over the libraries. The correlation values among the three methods are listed in Table 1. All correlations are highly significant ($p<10^{-100}$). Average is the method that correlates the best with the two other methods. [Our definition of Breadth is different from the usual definition (...), which is the number of tissues in which a gene is expressed. However, calculating breadth in both ways yielded two sets of values which are highly correlated (R>0.96). As for Average expression, it may seem more accurate to average first among the libraries of each tissue and then to average over the tissues, as in (...). We observed that the values computed this way and those computed by simply averaging over all available libraries are highly correlated (R>0.98)].

Table 1. Correlation coefficients between different methods of combining values in SAGE libraries

2.4. Codon bias corrected for background nucleotide content

We used four methods to compute codon bias, corrected for background nucleotide content:

Effective number of codons corrected for background nucleotide content, or ENC′ (...).

B measure (...), which is applied as

$$B(g) = \sum_i f_{aa(i)}(g)\,\bigl|f_i(g) - E^{nc}_i(g)\bigr|, \qquad (9)$$

where $f_i(g)$ is the proportion of the codon $i$ in the gene $g$ among its synonymous codons; $E^{nc}_i(g)$ is the expected proportion of $i$ in $g$ (see below); $f_{aa(i)}(g)$ is the frequency of the amino acid coded by $i$ in $g$; and the sum is over all codons.

HK measure: computing the uncorrected codon bias

$$CB(g) = \sum_i f_{aa(i)}(g)\,\bigl|f_i(g) - 1/\mathrm{syn}(i)\bigr|,$$

where $\mathrm{syn}(i)$ is the degeneracy of the amino acid coded by $i$ (the number of synonymous codons for $i$), then computing the regression line of CB(g) versus non-coding GC-content, from the non-coding regions surrounding $g$, and subtracting the regression line from the CB measure (as done by ...; thus we shall denote this method HK). This is based on the known observation that codon bias is positively correlated with both non-coding GC-content and expression level in some eukaryotic genomes, including the human genome (see below in the next subsection).

Maximum-likelihood codon bias, or MCB (... Hurst, 2001).

2.5. Computing expected values

For the first three of the four methods described above, we need non-coding sequences neighboring a given gene (the fourth method, MCB, uses the coding sequence itself). We used the sequence consisting of the introns of the gene, the 1000 nucleotides immediately preceding the coding area
of the gene, and similarly, those 1000 nucleotides immediately succeeding it (or truncated, as necessary, in the case that genes were less than 1000 nucleotides apart; see also ...). If an intron is longer than 2000 bp, only the 1000 nucleotides on each of the intron's ends were taken. By taking 1000 flanking bases, we ensure that regions that may be under selective constraints, both in flanking regions and introns, constitute only a small portion of the strands that are used as control. On the other hand, regions of large introns that are far from any coding sequence may not represent the mutational bias that acts on the nearby exons, and thus introns were truncated to 1000 bases on each end. We masked repetitive elements using RepeatMasker (...).

In warm-blooded vertebrates, including human, it is well known that 'isochores' structure (...) correlates with both expression level and codon bias (...). Although correction of codon bias for the influence of isochore structure is essential, it is not enough. It has been observed that the G and C compositions (and similarly the A and T compositions), in both coding and non-coding sequences, in the human genome, are not equal, and the differences (termed GC-skew and AT-skew, respectively) correlate with expression level (...). Thus, the correction of codon bias should consider the base composition of the background in a more differential manner. Here we introduce two methods for computing $E^{nc}_i(g)$ (see Eqs. (7)–(9)).

We treated the amino acids with six synonymous codons as two independent amino acids each, one with four codons, and one with two (as in ...), so that any two synonymous codons differ only in the third codon position.

The first and simpler method applies the base proportions in the non-coding surrounding of the gene to the third codon position. For example, if the base A appears 21% of the times in the non-coding surrounding of a given gene, then in a fourfold amino acid, the codon ending with an A will have an $E^{nc}_i(g)$ value of 0.21.

However, since it is known that there are dinucleotides that are in excess or are avoided in the genome, for example, the dinucleotide CG is depleted by its tendency to mutate and to disappear from genomic sequences (... 2000), it may not be enough to consider single nucleotide frequencies for correction. The second method incorporates the dinucleotide composition of the non-coding surrounding of the gene in the following manner:

We count the number of appearances of each triplet in all three reading frames of the non-coding surrounding of a gene. For each triplet XYZ, we denote this number by #XYZ. For a codon XYZ, we denote by S(XYZ) the set of bases that when replacing Z would yield a synonymous codon (including the base Z itself). For example, S(GCA)={A,C,G,T}, S(AGC)={C,T}, and S(ATT)={A,C,T} (recall that the sixfold amino acids are split into a twofold amino acid and fourfold one). It is clear that the frequency of a codon XYZ among its synonyms in a gene could be affected by the background frequencies of both the YZ dinucleotide and the ZN dinucleotide, where N is any of the four bases. Calculating the expected relative frequencies of XYZ at the non-coding surrounding is done as follows:

where aa(XYZ) is the amino acid coded by XYZ and aa(XYZ) is the number of triplets that code for aa(XYZ), that are followed by the base N, divided by the total number of triplets that code for aa(XYZ). All triplets are counted in the non-coding surrounding of $g$ in all three reading frames.

We calculated FOP′ and B using both methods. The B (Eq. (9)) values for all genes in the study, with $E^{nc}_i(g)$ calculated in both methods, are highly correlated (R=0.96), and similarly for FOP′ in Eqs. (7) and (8) (R=0.93). Thus, the results involving these measures, with $E^{nc}_i(g)$ calculated in both ways, are almost identical. In this paper, we included only the results where $E^{nc}_i(g)$ was computed in the second method, which seems more accurate, as it considers dinucleotide composition.

2.6. Protein biosynthetic cost measures

We used the size/complexity score of ... for amino acids (Table 2). To evaluate the biosynthetic cost of a protein, encoded by a given gene, we used two measures:

2.6.1. Average size/complexity score

Each codon was given the score of the amino acid it encodes. The size/complexity score was averaged over the codons of a gene.

2.6.2. Frequency of expensive amino acids

This is the relative frequency of amino acids with a size/complexity score greater than 40 (Arg, Cys, His, Phe, and Tyr). We excluded the single-codon amino acids Met and Trp, since they do not contribute to the FOP′ or to the codon bias.

2.7. Sequence data

Gene and intron sequences were downloaded from NCBI GenBank, Build 33 (... H_sapiens/). We included only CDSs that start with a start codon, end with a stop codon, have a length that is a multiple of three, and have no unidentified bases. For genes
Table 2. Amino acids in the human genome: frequency weighted by expression in highly expressed genes and in all expressed genes, isoaccepting tRNA gene copy numbers, and size/complexity score (as in ...)
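The expression-weighted frequencies tabulated here follow the definition in Section 2.1: each amino acid count in a gene is weighted by that gene's expression level, summed over the relevant genes, and normalised over all amino acids. A minimal sketch of that calculation, assuming the input is a list of (protein sequence, expression level) pairs; the function name, the input layout, and the toy sequences are hypothetical and not taken from the paper's data:

```python
from collections import Counter

def expression_weighted_frequencies(genes):
    """Return {amino_acid: expression-weighted frequency}.

    `genes` is an iterable of (protein_sequence, expression_level) pairs;
    this input layout is an assumption made for the example.
    """
    weighted_counts = Counter()
    for protein, expression in genes:
        for amino_acid, count in Counter(protein).items():
            # Accumulate c_a(g) * E(g) over the genes.
            weighted_counts[amino_acid] += count * expression
    total = sum(weighted_counts.values())  # sum over all amino acids
    return {aa: c / total for aa, c in weighted_counts.items()}

# Toy input: hypothetical sequences and expression levels.
toy_genes = [("MKKL", 10.0), ("MAAL", 1.0)]
toy_freqs = expression_weighted_frequencies(toy_genes)
```

The same function works for codons if each gene is passed as a list of codon strings instead of a protein string, since `Counter` accepts any iterable of hashable items.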
with more than one CDS, we took the longest CDS. Thus, between 5% and 10% of the genes were excluded. In addition, less than half the genes were expressed in the 43 SAGE libraries (see below). After a further removal of genes, required by the computation of MCB (... Hurst, 2001), 14,131 genes were left for the analysis.

3. Results

In some unicellular organisms, a positive relation between cellular tRNA and tRNA gene copy number was found (... et al., 1999). As in C. elegans and other eukaryotes (... 2001), in the human genome there is also a redundancy in the set of tRNA genes (see also ...). The number of tRNA genes varies from 7 (Trp) to 44 (Val). Although a variation between different tRNA genes could be in the transcription level, a positive correlation between intracellular tRNA and tRNA gene copy number is expected (...). We assume such correlation and use the number of tRNA genes as a measure for the amount of intracellular tRNAs. As done by ... for C. elegans, we measured the correlation between the isoaccepting tRNA genes and the expression-weighted frequencies of the 20 amino acids, and those of the individual codons (see Tables 2 and 3 and Materials and methods). When amino acid frequencies were considered (Fig. 1a), a significant correlation was found for highly expressed genes (R=0.585, p=0.007, N=4320). For codon frequencies (excluding two outliers, Fig. 1b), we obtained a highly significant positive correlation (R=0.654, p<0.001, and R=0.56, p<0.001 when including the outliers). Similar results were obtained when all expressed genes were taken, and slightly lower correlations were obtained when frequencies were not weighted (R=0.565, p=0.009 for amino acids; R=0.62, p<0.001 for codons). These correlations do not necessarily prove a correlation between cellular tRNA abundance and the number of tRNA genes, but, as indicated by ..., if such a relation were not present, we would not expect any correlation between the frequencies of amino acids and codons and the number of tRNA genes.

3.1. The relation between gene copy numbers and gene expression level

Under the assumption that gene copy numbers can be used as an indication for cellular tRNA abundance (see also ...), we used frequency of optimal codons (FOP′, see Materials and methods) as a measure of translation efficiency. As explained above, this measure is independent of the amino acid frequencies and is corrected for background nucleotide content. Here, the term optimal codon denotes the codon with the highest tRNA gene copy number, for each amino acid (also termed major codon and translationally superior codon; see ...). We calculated the correlation between FOP′ and expression level in over 14,000 genes in all the human chromosomes. We combined the expression levels in 43 libraries in three ways: (a) breadth of expression; (b) average over the libraries; and (c) maximum over the libraries (see Materials and methods). The correlations are very weak (R=0.075,
Table 3. Codons in the human genome: isoaccepting tRNA gene copy numbers and frequencies weighted by expression in highly expressed genes and in all expressed genes
Fig. 1. Isoaccepting tRNA gene copy number vs. frequency of amino acids and codons, weighted by expression, in highly expressed genes, in the human genome. Frequencies were computed and weighted for the top 30% of all expressed genes (a total of 4320 genes). (a) Amino acid frequencies, R=0.585, p=0.007. (b) Codon frequencies, R=0.654, p<0.001. Codons translated by the same anticodon (the wobble effect) were regarded as one point. The data appear in Tables 2 and 3. Expression values were calculated by averaging over 43 SAGE libraries (see Materials and methods).
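The two steps behind this figure, selecting the most highly expressed genes and correlating tRNA gene copy numbers with frequencies, can be illustrated as follows. This is a minimal sketch, assuming genes are given as (name, average expression) pairs and that "top 30%" means ranking by average expression; the helper names `top_fraction` and `pearson_r` and all toy numbers are hypothetical and are not the paper's data. The caption does not say how ties or rounding at the 30% cut were handled, so the rounding here is an assumption.

```python
from math import sqrt

def top_fraction(genes, fraction=0.3):
    """Keep the most highly expressed share of (name, expression) pairs."""
    ranked = sorted(genes, key=lambda gene: gene[1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))  # rounding rule is an assumption
    return ranked[:keep]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Toy values only: one point per amino acid (or per anticodon for codons).
toy_copy_numbers = [5, 10, 20, 40]
toy_frequencies = [0.02, 0.03, 0.06, 0.07]
toy_r = pearson_r(toy_copy_numbers, toy_frequencies)
```

For the codon panel, codons read by the same anticodon would first be merged into a single point, as the caption states; that grouping is left out of the sketch because it depends on the anticodon assignments in Table 3.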
0.08, and 0.06, respectively) but highly significant (p<0.0001). The error-bar graphs in Fig. 2 illustrate the relation between the FOP′ and the average expression (see Materials and methods). Although the correlations are weak, the graph indicates that the FOP′ values rise as expression level rises.

Fig. 2. Frequency of optimal codons (FOP′) vs. average expression. 14,131 genes expressed in 43 SAGE libraries were divided, according to expression level, into ten categories of approximately equal size. Circles represent the mean value. Error bars show 95% confidence. R=0.08, p<0.0001.

3.2. The role of codon bias

Since codon bias is the unequal use of synonymous codons, we expect high codon bias in genes with stronger preference for optimal codons, and thus we expect codon bias to rise with frequency of optimal codons. Interestingly, we found that genes with the lowest values of FOP′ also tend to have high codon bias, even higher than in genes with the highest value of FOP′. This is illustrated in the graphs in Fig. 3. The high values of codon bias in genes with low frequency of optimal codons indicate that the bias tends to be for codons that correspond to low tRNA gene copy numbers, and this in turn may suggest that the bias is for codons with low translational values. The figure shows the relation between FOP′ and four measures of codon bias: effective number of codons, or ENC′ (...), B measure (...), HK measure (... 2002), and maximum-likelihood codon bias, or MCB (...). ENC′ and B are corrected for background nucleotide composition by considering dinucleotides in the background (see Materials and methods).

When considering the relation between codon bias and gene expression level, we encounter another unexpected result. Instead of the expected positive correlation between codon bias and expression level, based on studies in different organisms (see the Introduction), we observed that the average codon bias is highest in the classes of genes with the highest and with the lowest expression levels. This is shown in the graphs of Fig. 4. Similar graphs were obtained when using breadth of expression or maximum expression (see Materials and methods, graphs available upon request).

As indicated above, a rise in codon bias as expression level drops has not been observed in lower organisms. As it appears, in addition to the role associated with codon bias in enhancing the expression of certain genes by preferring codons with high cellular levels of isoaccepting tRNA (as was found for unicellular organisms; ... et al., 1999) or with high tRNA gene copy number (...
Fig. 3. Codon bias vs. frequency of optimal codons (FOP′). (a) Effective number of codons (ENC′); (b) B; (c) Regression line subtraction (HK, see Materials and methods); (d) Maximum-likelihood codon bias (MCB). A total of 14,131 genes was divided, according to FOP′, into ten categories of approximately equal size. Circles represent the mean value. Error bars show 95% confidence.
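The horizontal axis of this figure and its ten-category summary can be sketched in code. The block below is a hedged illustration, not the authors' implementation: `fop_measures` follows the simple, degeneracy-weighted, and background-corrected FOP definitions of Section 2.2.2 as they read in the text, and `decile_summary` splits genes into ten groups of roughly equal size and reports a mean with a normal-approximation 95% half-width. The function names, the input layout, the choice of 1.96 standard errors for the error bars, the skipping of codons whose expected proportion is zero, and the toy codon data (including which codon is treated as optimal) are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def fop_measures(codons, optimal, degeneracy, expected=None):
    """Return (simple FOP, degeneracy-weighted FOP, background-corrected FOP).

    codons:     sense codons of one gene, stop codon excluded
    optimal:    set of optimal codons
    degeneracy: {codon: number of synonymous codons of its amino acid}
    expected:   optional {codon: expected proportion among its synonyms}
    """
    n = len(codons)
    hits = [c for c in codons if c in optimal]
    fop_simple = len(hits) / n
    fop = sum(degeneracy[c] for c in hits) / n
    fop_prime = None
    if expected is not None:
        # Codons with zero expected proportion are skipped (assumption).
        fop_prime = sum(1.0 / expected[c] for c in hits if expected[c] > 0) / n
    return fop_simple, fop, fop_prime

def decile_summary(xs, ys, n_bins=10):
    """Sort by x, cut into n_bins groups, return (mean x, mean y, 95% half-width).

    Assumes at least n_bins data points so that no group is empty.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        x_vals = [x for x, _ in chunk]
        y_vals = [y for _, y in chunk]
        half_width = 1.96 * stdev(y_vals) / sqrt(len(y_vals)) if len(y_vals) > 1 else 0.0
        rows.append((mean(x_vals), mean(y_vals), half_width))
    return rows

# Toy gene made of one twofold amino acid; "AAG" is treated as optimal
# purely for illustration.
toy_fops = fop_measures(
    ["AAG", "AAA", "AAG", "AAA"],
    optimal={"AAG"},
    degeneracy={"AAG": 2, "AAA": 2},
    expected={"AAG": 0.4, "AAA": 0.6},
)
```

With random synonymous usage the degeneracy-weighted value sits near 1, which is the property the text gives as the reason for weighting by degeneracy; the toy gene above uses the optimal codon exactly half the time and illustrates that case.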
2000), here we found that in the human genome, codon bias may have an additional role: controlling the expression of certain genes by preferring codons with small tRNA gene copy number.

Fig. 4. Codon bias vs. average expression level. (a) Effective number of codons (ENC′); (b) B; (c) Regression line subtraction (HK, see Materials and methods); (d) Maximum-likelihood codon bias (MCB). A total of 14,131 genes was divided, according to expression level, into ten categories of approximately equal size. Circles represent the mean value. Error bars show 95% confidence.

3.3. Translation efficiency and amino acid biosynthetic cost

Translation efficiency is also affected by the size, the structure, and the production cost of the amino acids incorporated in the protein. To further analyze these factors, we hypothesized that proteins coded by highly expressed genes are composed of smaller and biosynthetically cheaper amino acids.

The relation between expression levels and biosynthetic cost is shown in Fig. 5. We use the size/complexity quotient of ... as a measure of biosynthetic cost (... 2). A clear monotonic relation between expression levels and biosynthetic cost is evident. Genes with higher average expression (see Materials and methods) tend to code less for "expensive" amino acids, as estimated by the size/complexity measure. Similar graphs were obtained when using breadth of expression or maximum expression.

Fig. 5. Biosynthetic cost vs. average expression. (a) Average size/complexity score; (b) Frequency of occurrence of the most expensive amino acids (see Materials and methods). A total of 14,131 genes expressed in 43 SAGE libraries was divided, according to expression level, into ten categories of approximately equal size. Circles represent mean values. Error bars show 95% confidence. Correlation coefficients are (a) R=0.071 (p<0.0001); (b) R=0.127 (p<0.0001).

Interestingly, we found that the frequency of optimal codons is in significant positive correlation with the measures of biosynthetic cost (R=0.18, p<0.0001), as indicated in Fig. 6. This result may seem counterintuitive, since it was shown above that the size/complexity score of genes tends to decrease with average expression, which suggests a preference for cheaper and smaller amino acids in highly expressed genes. However, the correlations between FOP and expression level, and between expression level and size/complexity score, are too weak to allow any inference about the relation between FOP and size/complexity score. The graph indicates that genes that encode more expensive amino acids tend to have more tRNA genes for their anticodons. This suggests the possibility of a mechanism that compensates for the high cost of production of a protein by codon bias favoring optimal codons, enabling faster and more accurate translation.

Fig. 6. Frequency of optimal codons vs. average size/complexity score. R=0.18, p<0.0001. A total of 14,131 genes expressed in 43 SAGE libraries was divided according to average size/complexity into ten categories of approximately equal size. Circles represent the mean values. Error bars show 95% confidence.

4. Discussion

In this work, we studied the relation between translation efficiency (as estimated by the number of tRNA genes) and gene expression level (as estimated by measures derived from SAGE/EST data) in the human genome, and the interrelations between these two factors and codon bias. For this purpose, we introduced two methods: a method for computing the frequency of optimal codons, which is independent of amino acid composition and with correction for background nucleotide content, and a method for computing expected values of codon frequency, based on dinucleotide composition of the background. We showed that amino acid and codon frequencies, weighted by expression, correlate positively with tRNA gene copy number, thus possibly indicating a relation between the number of tRNA genes and tRNA abundance. We showed that expression level is in weak, highly significant, positive correlation with frequency of optimal codons (which is assumed to be a measure of translation efficiency). This, in turn, shows that codon choice, or codon bias, relates to expression level. A caveat must be admitted here: we used measures of the transcriptome, indicating numbers of mRNAs and not of proteins, but due to the lack of data on protein levels in human, we assume that protein levels can be inferred from the mRNA levels used here. In addition, we obtained a surprising result not observed in previously studied organisms, namely, that the average codon bias is high both in weakly expressed genes and in highly expressed genes. This result was obtained with all four measures of codon bias used in this study. Previous analysis of the human genome (...) did not show high codon bias for the weakly expressed genes. This may be explained by the fact that in that study an older version of the genome was used and that a large part of the weakly expressed genes was deliberately excluded from the analysis.

In highly expressed genes, we show a tendency for high codon bias, and also for higher frequency of optimal codons. A possible explanation for this is that in these genes, the high codon bias is a consequence of more codons with high tRNA gene copy number, increasing the translation elongation rate. The finding that the average codon bias is high for lowly expressed genes, together with the result that these genes tend to have low frequency of optimal codons (and therefore their high bias is probably the consequence of favoring non-optimal codons in terms of translation efficiency), suggests that some lowly expressed genes may also experience the effect of natural selection against optimal codons. ... showed strong evidence for a transcription-associated bias for higher G and T content on the coding strand of introns. This bias is observed in coding regions in the third codon position as well (...). Computing codon bias and RSCU values relative to expected values, derived from introns on the coding strand only, as done in this study, avoids the difference between the coding and the non-coding strands, as observed in .... Since only half of the optimal codons end with a G or a T, it is unlikely that the transcription-associated bias for G and T in both introns and coding regions can account for the correlation between FOP and expression level (Fig. 2) and the relation between codon bias and expression level (Fig. 4).

We hypothesize that the translation efficiency of proteins, which can be a disadvantage in high levels, is controlled by this mechanism of preferring codons with low tRNA abundance, and thus regulating the elongation rate in these proteins. Such a mechanism was suggested by ... Grosjean (1979), and supporting evidence was provided by .... The latter found that some E. coli regulatory genes contain an unusually high number of codons that are not frequently used in most E. coli genes, and therefore suggested that this could be part of a mechanism that helps to keep a low expression level in some regulatory genes. In another study, ... (1991) showed that in E. coli, S. cerevisiae, D. melanogaster, and primates (mainly Homo sapiens) proteins containing a high percentage of low-usage codons can be characterized as cases where an excess of the protein could be detrimental. Another indication of this mechanism in bacteria is provided by ..., who showed evidence suggesting that the translation of proteins involved in various specialized functions may be regulated by using rare codons. ... and ... presented experimental evidence which also indicates that the presence of non-optimal codons can reduce translation efficiency. Support for the notion of an expression regulation
mechanism in lowly expressed genes in mammals can
also be found in the work of and
(1997), who showed that modifying the codon composition
of non-mammalian genes to resemble that of mammalian
genes can significantly enhance their translation in mammalian
cells, where the translation of the original genes is
limited. This model was challenged by arguing that the
presence of rare codons in lowly expressed genes is due to
mutational drift randomizing codon usage (
1986). However, it is hard to see how random drift can
account for the high level of codon bias, favoring rare and
non-optimal codons, in lowly expressed genes, as observed
in the human genome in this study. More research is needed
to show whether this mechanism is used to regulate
translation rate. Since the former hypothesis leads to an
experimental prediction, namely, that proteins that could be
disadvantageous in excess indeed contain significantly more
non-optimal codons, it is of high value.

Using the size/complexity index developed by
(1997, see Materials and methods) as an estimate of the
amino acid cost, we found that there is a negative correlation
(R=0.071) between expression level and the size/complexity
score. This result is expected if amino acid usage is
shaped by selective forces to optimize translation efficiency.
A more pronounced effect was demonstrated when the
correlation between the frequency of expensive amino acids
within genes and the expression level is considered
(R=0.127, see also used
Dufton's index to examine its association with gene
expression level, and found a similar tendency to avoid
the use of complex amino acids in highly expressed genes.
Another study (conducted on
B. subtilis and E. coli, and concentrated on the relation
between metabolic costs of amino acid biosynthesis and
patterns of amino acid composition, shows an increased
usage of less energetically costly amino acids in highly
expressed genes in both organisms, and thus supports the action of
selection on amino acid usage to increase metabolic
efficiency. It should be noted, however, that Dufton's index,
although an indirect measure, takes into account the size and
the structural complexity of amino acids, factors that have
an influence on the rate of incorporating amino acids in the
elongation process.

Contrary to these results, using microarray
data, reports that there is no detectable influence of
expression on amino acid usage. One reason for this
discrepancy could be the utilization of two different
methods for the estimation of the expression level. Another
possible explanation could be the fact that Comeron
analyzed the expression of each tissue separately
('expression level'), which can be problematic. Genes
which are highly expressed in one tissue can be poorly
expressed in another, or not expressed at all. Since it is
more reasonable to assume that if there is a selection for
translational efficiency, it will be detected in genes with high
activity in many tissues, the more appropriate measure for
gene expression, to assess its association with amino acid
usage, is average expression or breadth of expression (see
Materials and methods). These measures reflect more
accurately the total activity of genes in different tissues.

We showed that frequency of optimal codons correlates
positively with protein production cost. We suggest that this
may be an indication of the action of selection on codon bias
to reduce error rate in the production of costly proteins. This
mechanism was suggested by who showed
evidence that natural selection acts on synonymous codon
usage to enhance the accuracy of protein synthesis in
Drosophila, based on association between synonymous
codon usage and amino acid constraint. In the study of
a negative correlation between
metabolic costs of amino acids and codon bias was shown.
This relation seems to contradict the result presented here.
However, apart from the fact that their study deals with
bacteria, and that both selective forces and regulation
mechanisms may be different in higher organisms, there
are other factors that may explain the difference: as was
mentioned above, the metabolic cost calculation does not
take into account the size and the complexity of amino
acids, and besides, in other organisms other factors such as
biosynthetic pathways and dietary conditions may contribute
differently to the amino acid composition. In addition,
the MCU measure in that study is different from the FOPV
measure as defined here, and the former is not
corrected for background composition and for amino acid
usage composition, as the FOPV is.

Two points of caution should be emphasized here: First,
as done in previous studies (
we assumed a correspondence between tRNA gene copy
numbers and tRNA cellular abundance. As far as we know,
this relation, although demonstrated for several organisms, has not
been substantiated for humans. As noted above, the results of
this study may suggest this relation, since without such
correspondence one could not expect the correlations,
observed here, between gene copy numbers and amino acid
and codon frequencies. The fact that, in 14 out of 18 amino
acids, the codons with the highest tRNA gene copy numbers
also exhibit an increase in their frequency when comparing
between lowly and highly expressed genes (
supports this assumption. However, this is not a proof of such
a relation. Thus, some of the results suggested in the above
discussion, concerning the action of selection on codon bias
for translation efficiency, rely partly on an assumption that,
although reasonable, has not been firmly substantiated yet.

The second point is that the correlations between FOPV
and the different expression level measures are very weak.
FOPV was calculated by controlling the effect of amino acid
composition and that of background nucleotide content. It is
not inconceivable that a third effect may account for
the observed correlations.

In summary, based on the evidence presented, we suggest
three possible ways in which selection may act on codon
bias in the human genome: (1) increasing translation
efficiency in highly expressed genes; (2) regulating translation
efficiency of some proteins that can be a disadvantage
at high levels; and (3) improving translation efficiency and
reducing the rate of amino acid misincorporation in the
production of biosynthetically expensive proteins.

Acknowledgements

We thank Edward Trifonov, Laurent Duret and Giuseppe
D'Onofrio for valuable discussions that contributed to this
manuscript. We also thank Yefim Yakir for preparing part of
the figures and for technical support and Nurit Carmi for

References

Akashi, H., 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136 (3), 927–935.
Akashi, H., 1995. Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139, 1076–1677.
Akashi, H., Gojobori, T., 2002. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl. Acad. Sci. U. S. A. 99 (6), 3695–3700.
Bernardi, G., et al., 1985. The mosaic genome of warm-blooded vertebrates. Science 228 (4702), 953–958.
Comeron, J.M., 2004. Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence. Genetics 167, 1293–1304.
Dong, H., Nilsson, L., Kurland, C.G., 1996. Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J. Mol. Biol. 260, 649–663.
Dufton, M.J., 1997. Genetic code synonym quotas and amino acid complexity: cutting the cost of proteins? J. Theor. Biol. 187, 165–173.
Duret, L., 2000. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 16 (7), 287–289.
Duret, L., 2002. Evolution of synonymous codon usage in metazoans. Curr. Opin. Genet. Dev. 12, 640–649.
Duret, L., Mouchiroud, D., 1999. Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc. Natl. Acad. Sci. U. S. A. 96 (8), 4482–4487.
Fiers, W., Grosjean, H., 1979. On codon usage. Nature 277 (5694), 328.
Grantham, R., Gautier, C., Gouy, M., Mercier, R., Pave, A., 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. (8), r49–r62.
Graur, D., Li, W.-H., 2000. Fundamentals of Molecular Evolution, 2nd ed. Sinauer, Sunderland, Mass.
Hey, J., Kliman, R.M., 2002. Interactions between natural selection, recombination and gene density in the genes of Drosophila. Genetics 160, 595–608.
Ikemura, T., 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146 (1), 1–21.
Ikemura, T., 1982. Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J. Mol. Biol. 158 (4), 573–597.
Kanaya, S., Yamada, Y., Kudo, Y., Ikemura, T., 1999. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238, 143–155.
Kanaya, S., Yamada, Y., Kinouchi, M., Kudo, Y., Ikemura, T., 2001. Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J. Mol. Evol. 53, 290–298.
Karlin, S., Mrazek, J., 1996. What drives codon choices in human genes? J. Mol. Biol. 262, 459–472.
Karlin, S., Mrazek, J., Campbell, A.M., 1998. Codon usages in different gene classes of the Escherichia coli genome. Mol. Microbiol. 29 (6), 1341–1355.
Knight, R.D., Freeland, S.J., Landweber, L.F., 2001. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol. 2
Konigsberg, W., Godson, N., 1983. Evidence for use of rare codons in the dnaG gene and other regulatory genes of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 80 (3), 687–691.
Kreitman, M., Comeron, J.M., 1999. Coding sequence evolution. Curr. Opin. Genet. Dev. 9 (6), 637–641.
Lander, E.S., et al. International Human Genome Sequencing Consortium, 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Marais, G., Mouchiroud, D., Duret, L., 2001. Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc. Natl. Acad. Sci. U. S. A. 98 (10), 5688–5692.
Moriyama, E.N., 2003. Codon usage, Encyclopedia of the human genome. Macmillan Publishers, Nature Publishing Group.
Moriyama, E.N., Powell, J.R., 1997. Codon usage bias and tRNA abundance in Drosophila. J. Mol. Evol. 45, 514–523.
Novembre, J.A., 2002. Accounting for background nucleotide composition when measuring codon usage bias. Mol. Biol. Evol. 19 (8), 1390–1394.
Pedersen, S., 1984. Escherichia coli ribosomes translate in vivo with variable rate. EMBO J. 3 (12), 2895–2898.
Percudani, R., Pavesi, A., Ottonello, S., 1997. Transfer RNA gene redundancy and translational selection in Saccharomyces cerevisiae. J. Mol. Biol. 268, 322–330.
Robinson, M., et al., 1984. Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res. 12 (17), 6663–6671.
Saier, M.J., 1995. Differential codon usage: a safe guard against inappropriate gene expression of specialized genes. FEBS 362, 1–4.
Sharp, P.M., Li, W.-H., 1986. Codon usage in regulatory genes in Escherichia coli does not reflect selection for rare codons. Nucleic Acids Res. 14, 7737–7749.
Sharp, P.M., Tuohy, T.M., Mosurski, K.R., 1986. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 14, 5125–5143.
Urrutia, A.O., Hurst, L.D., 2001. Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics 159, 1191–1199.
Urrutia, A.O., Hurst, L.D., 2003. The signature of selection mediated by expression on human genes. Genome Res. 13 (10), 2260–2264.
Versteeg, R., et al., 2003. The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 13 (9), 1998–2004.
Wells, K.D., Foster, J.A., Moore, K., Pursel, V.G., Wall, R.J., 1997. Codon optimization, genetic insulation, and an rtTA reporter improve performance of the tetracycline switch. Transgenic Res. 8 (5), 371–381.
Wright, F., 1990. The 'effective number of codons' used in a gene. Gene 87,
Zhang, S., Zubay, G., Goldman, E., 1991. Low usage codons in Escherichia coli, yeast, fruit fly, and primates. Gene 105, 61–72.
Zhou, J., Liu, W-J., Peng, S.W., Sun, X.Y., Frazer, I., 1999. Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability. J. Virol. 73, 4972–4982.
Source: http://spl.telhai.ac.il/speech/pub/Lavner_Kotlar_Gene_345.pdf