post n. 27 English
The origin of the genetic code has been defined: the universal enigma.
For readers who have not followed the previous articles, and so as not to lose the thread, it helps to restate what is meant by the genetic code: the law of correspondence between mRNA and amino acids in the synthesis of proteins.
The mRNA is the nucleic acid messenger that carries the information for the synthesis of proteins and consists of nucleotides.
The nucleotide constituents are:
the phosphate group: (H2PO4)-.
A sugar, ribose, which exists in two forms, dextro (D) and levo (L), each the mirror image of the other. Only the dextro form is part of nucleotides: D-ribose.
Four nitrogenous bases: A (adenine) and G (guanine), which belong to the purine family, and U (uracil) and C (cytosine), which belong to the pyrimidine family.
The bonding of a phosphate group, a ribose molecule and a molecule of any one of the four bases gives rise to four different compounds, which are called nucleotides. Of the four nucleotides, adenosine 5'-phosphate is shown as an example.
Linking together several hundred nucleotides yields a macromolecule: mRNA.
(Figures from "Lessons of Biophysics" by Mario Ageno)
In mRNA, three adjacent nucleotides are called a triplet and are indicated by the letters of their bases. For example, in the figure the three nucleotides that expose the bases UAC make up a triplet. If they were followed by GUA, we would have another triplet, and so on. Starting from this RNA macromolecule, proteins are assembled through a process that is today quite complex. Each triplet (also called a codon) corresponds to one specific amino acid, and only one. This law of correspondence, written 3:1, is called the genetic code. Although some exceptions have been discovered in recent decades, it can be said that all living organisms on our planet use the same genetic code; it is therefore universal.
With four nucleotides available, the total number of triplets obtained by arranging them three by three is 4^3 = 64. Three of these triplets are used as end signals (t), so in theory the mRNA could carry the information for 61 amino acids. Since only 20 amino acids are available to living organisms, the genetic code is degenerate, in the sense that several triplets code for the same amino acid.
For example, the triplets GUU, GUC, GUA and GUG, which differ only in the 3rd position, all encode the same amino acid: valine (Val).
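The counting above can be checked with a short sketch. It assumes the standard genetic code table, written in the usual compact order (bases U, C, A, G in each position); the names BASES, AMINO and CODE are our own illustrative choices and do not come from the original articles.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks an end signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4^3 triplets to its amino acid (one-letter symbol).
CODE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO)}

stop_codons = sorted(c for c, aa in CODE.items() if aa == "*")
sense_codons = [c for c, aa in CODE.items() if aa != "*"]
amino_acids = {aa for aa in CODE.values() if aa != "*"}
valine_codons = sorted(c for c, aa in CODE.items() if aa == "V")

print(len(CODE), len(stop_codons), len(sense_codons), len(amino_acids))
print(valine_codons)
```

The four counts printed are the number of triplets, of end signals, of coding triplets and of distinct amino acids; the last line lists the triplets that encode valine.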
Let's see now, very briefly, how the synthesis of proteins in living organisms works today.
(Figure adapted from "Lessons of Biophysics" by Mario Ageno)
A particular sequence of DNA nucleotides (a gene), containing the information for the synthesis of a specific protein, is transcribed into mRNA. This molecule, like the punched tape of an old computer, slides through an organelle, the ribosome. The ribosome reads the information contained in the mRNA and matches each triplet of consecutive mRNA bases (codon) to its specific amino acid. The amino acid, however, does not enter the ribosome directly; it is carried by a particular type of nucleic acid, the tRNA. The tRNA has at one end a triplet of bases (anticodon) complementary to the codon, and at the other end the specific amino acid. With the participation of enzymes, the amino acids Pro, Phe, Ala, Ser and so on are thus joined in the right order to form the protein (as in the figure).
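The reading of the "punched tape" can be sketched in a few lines. This is only a toy model under stated assumptions: the message is a hypothetical one chosen to give Pro, Phe, Ala, Ser (it is not read off the figure), the table holds just the four codons needed, and the anticodon is computed by plain base complementarity, ignoring the wobble pairing and modified bases of real tRNA.

```python
# Partial codon table: only the four codons needed for this example.
CODON_TO_AMINO = {"CCU": "Pro", "UUU": "Phe", "GCU": "Ala", "UCU": "Ser"}
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon):
    """Return the complementary tRNA triplet, written 5'->3' (hence reversed)."""
    return "".join(PAIR[b] for b in reversed(codon))


def translate(mrna):
    """Read the mRNA three bases at a time and return the amino acid chain."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        chain.append(CODON_TO_AMINO[codon])
    return chain


mrna = "CCUUUUGCUUCU"  # hypothetical message, not taken from the figure
print(translate(mrna))
print([anticodon(mrna[i:i + 3]) for i in range(0, len(mrna), 3)])
```

The first line printed is the chain of amino acids in reading order; the second is the list of anticodons that would pair with each codon.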
As we said, this is an extremely simplified representation of the process. Just consider the bacterial ribosome, which consists of two subunits: the first is associated with thirty-four proteins and the second with twenty-one; both contain nucleic acids. Such a complicated structure was certainly not present in the prebiotic era. Moreover, in the cell each amino acid has its own adapter, a tRNA, with a specific enzyme, which means another 40 molecules; adding the other enzymes that take part in the whole process, we reach a total of 50 compounds. Such a complex system is unimaginable at the dawn of life. All scientists who deal with the problem believe that in the beginning protein synthesis must have existed in a much simpler and more rudimentary form.
The first to develop a theory on the origin of the genetic code was George Gamow in 1954, a year after the discovery of the DNA double helix. He proposed a direct interaction between the nucleic acid triplet and the amino acid; at that time the role of mRNA had not yet been discovered. Gamow did not propose any chemical-physical mechanism for the law of correspondence between triplet and amino acid, and his theory was therefore easy to demolish.
In fact, if we consider the sequence of four bases UUCG, UUC encodes one amino acid while UCG encodes another, depending on which triplet is chosen. Without a chemical-physical mechanism, given four bases one can jump from one triplet to another and give rise to completely different proteins. Gamow's theory, however, remained an attractive idea.
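The ambiguity of the UUCG example can be made concrete with a minimal sketch. The function name triplets_from is our own, and the table contains only the two entries of the standard code needed here (UUC for Phe, UCG for Ser).

```python
CODON_TO_AMINO = {"UUC": "Phe", "UCG": "Ser"}  # two entries of the standard code


def triplets_from(sequence, start):
    """Return the triplets read from `start`, three bases at a time."""
    return [sequence[i:i + 3] for i in range(start, len(sequence) - 2, 3)]


sequence = "UUCG"
for start in (0, 1):
    for triplet in triplets_from(sequence, start):
        print(start, triplet, CODON_TO_AMINO[triplet])
```

Shifting the starting point by a single base changes the triplet that is read, and with it the amino acid.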
In 1966 Woese, with other researchers, published "The molecular basis for the genetic code". The work uses paper chromatography to study the interaction between triplets and amino acids. Unable to use either the trinucleotides or the bases directly, Woese and colleagues chose pyridine as a solvent, a compound similar to pyrimidine, the parent compound of the bases uracil and cytosine. The authors conclude that there is a hierarchy among the bases of the triplet in the choice of amino acids, defined in terms of polar or nonpolar interactions. In particular, the choice of amino acid is determined mainly by the base in the 2nd position. The base in the 1st position acts as a perturbation that selects among similar amino acids, while the 3rd position has only a weak influence on the choice and therefore plays a minor role.
Analyzing the issue, Jacques Monod in "Chance and necessity" in 1970 concludes with this alternative:
«A) The structure of the genetic code can be explained in chemical or, more exactly, stereochemical terms; if a certain codon was chosen to represent a given amino acid, it means that a certain stereochemical affinity existed between them;
B) The structure of the code is arbitrary from the chemical point of view; the code as we know it today comes from a series of random choices that gradually enriched it.
The first hypothesis seems by far the most attractive, first because it would explain the universality of the code and then because it would allow us to imagine a primitive translation mechanism in which the sequential alignment of the amino acids in the polypeptide structure would be due to a direct interaction between the amino acids and the replicated structure itself».
Monod refers to the conclusion of F. Crick in 1968: «Several attempts in this direction have actually been made but have resulted in more negative results than positive ones». Crick had long proposed the "frozen accident" hypothesis. According to it, the origin of the genetic code was a random event that, once it had happened, was frozen and could no longer be reversed.
In "Lezioni di Biofisica" (1984), Mario Ageno examines the formal structure of the genetic code, which shows, as we shall see, a certain centrality of the bases in the first and second position in the assignment of the amino acid. He also reports a proposal by Orgel: «So, U in the 2nd position would have meant hydrophobic amino acid, A is a hydrophilic amino acid, while C and G always in the 2nd position would have meant intermediate hydrophobic amino acids between the first two groups». However, after examining the work done up to that time, he concludes that no steps forward had been made since Crick's work of 1968.
In research after 1984, preference was given to metabolic or coevolutionary processes, which still require the presence of other molecules, mainly tRNA as adapters. Among these studies, some initial interest was raised by the works of Yarus M. (RNA-ligand chemistry: a testable source for the genetic code, 2000) and Yarus M, Caporaso JG, Knight (Origins of the genetic code: the escaped triplet theory, 2005). They assume that, at least for some amino acids, there was some direct stereochemical interaction between codon and amino acid, and that subsequent evolutionary processes shaped the code so as to minimize translation errors. It seems that misreading errors in translation occur in the ratio 10:1:100 for the 1st, 2nd and 3rd position respectively. The hypothesis, already advanced by C. R. Woese in 1965 for the UUU triplet, was extended by the authors to all triplets containing a pyrimidine in the second position.
It is accepted that, once the evolution of the cell had begun, the genetic code underwent some changes. As reported by Eugene V. Koonin and Artem S. Novozhilov in "Origin and evolution of the genetic code: the universal enigma", 2009: «Today, there is ample evidence that the standard code is not literally universal, but is subject to significant changes, without changes to its basic organization».
The issue is that evolutionary processes presuppose the existence of cellular life. If, on the contrary, evolution had contributed to the origin of the genetic code once cells were present, every living species would have developed its own genetic code and the code would not be universal. The origin of the code must necessarily precede cellular life and Darwinian evolution. The era of the origin of the genetic code is the prebiotic era. Ultimately, after more than sixty years of research, the centrality of the bases in the 1st and 2nd position in the selection of amino acids, with the 2nd base predominant, seems proven, but we do not yet know the origin of the genetic code.
How was it possible?
Perhaps behind it all there is a simple mistake.
As we have seen, the term "triplet" illustrates the structure of the genetic code very well, but it hides a fundamental fact overlooked by all: we know the properties of the amino acids, but we do not know the properties of the triplet. So in the end we found ourselves comparing the chemical-physical properties of the amino acids with a few letters of the alphabet (U, A, C, G). But there is more: each base of the triplet is linked (see picture above) to a ribose, and the ribose to a phosphate group, to form three nucleotides. The bonds through the phosphate groups of the three nucleotides form a trinucleotide that presents three bases, that is, the triplet. So the properties attributed to the triplet are not those of the three bases but those of the whole trinucleotide. We can put it this way: the trinucleotide has specific properties that vary with the triplet. It is the whole trinucleotide that specifies the amino acid, not the triplet alone. The ratio 3:1, three bases per amino acid, is conceptually incorrect. The right representation would be one trinucleotide per amino acid, 1:1. For convenience we can keep the representation 3:1, but it must be understood in this way: the trinucleotide that presents the three bases codes for a specific amino acid. It is also wrong to say that the constituents of nucleic acids are nucleotides, because the nucleotide alone does not represent anything; the constituents of nucleic acids are the trinucleotides. The trinucleotide can be considered a separate entity, which interacts with the other trinucleotides of the mRNA and which should already show its own peculiarity.
Do we at least have some indication that this peculiarity exists?
As we have seen elsewhere, with reference to the law of correspondence, Mario Ageno wonders whether the genetic code has been 3:1 from the beginning: is it possible that in primitive times it was different, for example 2:1? In "Lessons of Biophysics" (1984) he excludes this possibility because, in that case, all the metabolic processes built with a 2:1 code would have been lost in the transition to a 3:1 code and evolution would have had to start again. But he adds: «however, it is possible that at the beginning not all three positions were read: maybe the first two and the third had the spacing function».
If only the first two positions were read, leaving the third as a spacer, that is, if the four nucleotides are arranged two by two, only 4^2 = 16 amino acids could have been encoded. According to Paul Davies (Da dove viene la vita, 2000), grouping the four nucleotides in pairs rather than triplets and using 16 amino acids would have been much easier for the origin of life.
Life could have worked just as well with fewer than 20 amino acids. It would probably not have reached today's level of complexity, but it would have worked.
Why was this choice not made?
We can also ask: having 4 bases, why not choose a 4:1 code? Certainly, with such a code one could encode 4^4 = 256 amino acids, with an increasing risk of translation errors; but how could the molecules know about the risk? On the other hand, the choice of code cannot have been an evolutionary process, because life did not yet exist and therefore neither did evolution.
So there were three possible codes, 2:1, 3:1 or 4:1, and the 3:1 triplet code prevailed. But this means that the trinucleotide that exposes the triplet must have its own peculiarities: it must have at least one property that distinguishes it from the other codes.
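The sizes of the three candidate codes follow from simple enumeration. The sketch below lists all words of length 2, 3 and 4 over the four bases; the function name codewords is our own.

```python
from itertools import product

BASES = "UCAG"


def codewords(length):
    """All words of the given length that can be written with the four bases."""
    return ["".join(w) for w in product(BASES, repeat=length)]


# Number of distinct words available to a 2:1, 3:1 and 4:1 code.
sizes = {f"{n}:1": len(codewords(n)) for n in (2, 3, 4)}
print(sizes)
```

A 2:1 code falls short of 20 amino acids plus an end signal, while a 4:1 code offers far more words than are needed.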
Having said that, if in the beginning there were no evolutionary processes, and if there was no system of tRNA adapters with a specific enzyme for each amino acid because it is too complex, then there must be a stereochemical affinity, not between triplets (codons) and amino acids but between trinucleotides and amino acids: chemical-physical interactions that directly encode the information of the nucleic acid.
So let us first look for some indication of such a stereochemical affinity and then for the property that distinguishes the trinucleotide from the other codes.
As we have already amply illustrated in other articles, crystalline quartz in contact with solutions gives rise to double electrical layers on its surface, comparable to micro-capacitors. Measurements of the streaming potential show that amino acids accumulate on the quartz surface at a well-defined potential, the specific potential.
Well, the structure of the genetic code and the specific potential of amino acids seem to have a mutual correspondence, in particular:
1) Eight of the twenty amino acids are already encoded by the first two letters (the third is indicated with a dot); that is, as the table of the genetic code shows, each of them is already specified by a single pair of bases in the 1st and 2nd position.
Leu – CU∙, Val – GU∙, Ser – UC∙, Pro – CC∙, Thr – AC∙, Ala – GC∙, Arg – CG∙, Gly – GG∙
Of these amino acids, each of the three at our disposal has its own distinct specific potential:
Pro 10.10 mV, Val 9.90 mV, Ala 9.70 mV
2) Eight pairs of amino acids are encoded by the same first two letters; that is, in eight cases two amino acids share the same pair of bases in the 1st and 2nd position.
They are: (Phe, Leu) – UU∙, (Ileu, Met) – AU∙, (His, Gln) – CA∙, (Asn, Lys) – AA∙, (Asp, Glu) – GA∙, (Cys, Trp) – UG∙, (Ser, Arg) – AG∙, (Tyr, t) – UA∙,
with (t) as the end signal.
Of these amino acids, for the four at our disposal:
two amino acids, Phe and Leu, have the same specific potential of 9.50 mV;
and the other two, Ileu and Met, have the same specific potential of 9.30 mV.
3) In addition, two pairs of bases, UU∙ and CU∙, encode Leu, and Leu produces two specific potentials, 9.50 mV and 8.10 mV.
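The grouping described in points 1) and 2) can be reproduced from the standard genetic code table. The sketch below groups the 64 codons by their first two bases and separates the pairs that specify a single amino acid from those that are shared; the variable names are our own, amino acids appear in one-letter notation, and "*" stands for the end signal (t). In the standard table the UG∙ group also contains the end signal UGA, alongside Cys and Trp.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks an end signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO)}

# Collect the meanings of the four codons that share the same first two bases.
boxes = defaultdict(set)
for codon, aa in CODE.items():
    boxes[codon[:2]].add(aa)

# Pairs of bases that already fix one amino acid, and pairs that are shared.
single = {pair: next(iter(m)) for pair, m in boxes.items() if len(m) == 1}
shared = {pair: m for pair, m in boxes.items() if len(m) > 1}

print(len(single), sorted(single.items()))
print(len(shared), {p: sorted(m) for p, m in sorted(shared.items())})
```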
We reproduce the chart already shown in post no. 19, "The origins of proteins: Part three", which shows the specific potentials of the amino acids at our disposal:
The correspondence between the pair of bases in 1st and 2nd position of the triplet and the specific potential of the amino acids is clear.
A single base pair recognizes an amino acid: only one potential for the amino acid.
A single base pair recognizes two amino acids: only one potential for the two amino acids.
Two pairs of bases recognize one amino acid: two potentials for that amino acid.
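The three rules can be laid side by side with the measured values quoted in this post. The sketch below counts, for each of the seven amino acids available, how many pairs of bases (1st and 2nd position) encode it in the standard table and how many specific potentials are reported for it. It is only a bookkeeping check under stated assumptions: it does not say which potential belongs to which pair, and seven amino acids are too few for any statistical statement. The names POTENTIALS, ONE_LETTER and base_pairs are our own.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks an end signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
ONE_LETTER = {"Pro": "P", "Val": "V", "Ala": "A", "Phe": "F",
              "Leu": "L", "Ileu": "I", "Met": "M"}

# Specific potentials in mV, as reported in the text above.
POTENTIALS = {
    "Pro": [10.10], "Val": [9.90], "Ala": [9.70],
    "Phe": [9.50], "Leu": [9.50, 8.10],
    "Ileu": [9.30], "Met": [9.30],
}


def base_pairs(aa):
    """First-two-base pairs of all codons for a one-letter amino acid."""
    codons = zip(product(BASES, repeat=3), AMINO)
    return sorted({"".join(t[:2]) for t, a in codons if a == aa})


for name, values in POTENTIALS.items():
    pairs = base_pairs(ONE_LETTER[name])
    # Last column: does the number of pairs equal the number of potentials?
    print(name, pairs, values, len(pairs) == len(values))
```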
This correspondence is reciprocal: the potentials generated by the double electric layers confirm the central role of the bases in the first and second position, and the centrality of these bases confirms their connection with the double electric layers. The centrality of these bases should therefore be of the same nature, specifically electrochemical; otherwise we would once again find ourselves comparing properties of amino acids with letters of the alphabet. There is therefore an indication of a stereochemical affinity, which could be the sign of a rudimentary, "fossilized" mechanism of protein synthesis that has been preserved from the prebiotic era to our time.
How can we explain this stereochemical affinity?
We have already said that quartz in contact with a solution gives rise, on its surface, to a double electric layer, which has allowed us to measure the specific potentials. This mechanism extends to colloidal silica. Then, remembering that it is not the bases but the trinucleotides that have properties, we can extend these concepts to an RNA molecule of the prebiotic era.
RNA is a large molecule which, in contact with a solution, gives rise to double electrical layers on its surface. Each trinucleotide, represented in the end by a triplet, has the property of generating its own specific electric field. Within this electric field the lines of force must have a helical shape, determined by the presence of the right-handed form of ribose, D-ribose, and comparable to the hole for a screw. If the potential of this electric field is specific to an amino acid, then that amino acid, which has a left-handed structure and a molecular dipole whose electric field resembles a screw, recognizes it; being complementary to the electric field, the amino acid attaches to the trinucleotide, lowering the energy of the system.
Within this double electric layer, as we have amply illustrated in the previous article (The origin of the proteins: the synthesis of polypeptides), amino acids dragged by the "arrow of time" found the conditions necessary to be assembled into proteins. There is therefore a law of correspondence between a trinucleotide and a specific amino acid, a system of chemical-physical recognition and complementarity. This kind of direct electrochemical recognition, represented by the ratio 3:1, one triplet per amino acid, might have worked in the prebiotic era. Through evolutionary processes it may then have been transferred into the current mechanism, which is very complicated and involves transfer RNA, ribosomes and enzymes.
It is likely that every trinucleotide, represented by a 3:1 code, had the property that sets it apart from the other codes: that of delimiting its own helical electric field, comparable to the hole for a screw. This electric field is determined mainly by the nucleotides that expose the bases in the 1st and 2nd position, with the 2nd position dominant, and to a lesser degree by the nucleotide that exposes the base in the 3rd position. This means that it is not possible to take part of one trinucleotide and part of another to make up a new trinucleotide and encode a new amino acid. The electric field of a trinucleotide cannot be skipped.
From these considerations it emerges that the third base does not merely have the spacing function suggested by Ageno. As explained above, the four triplets with UU as the first two letters encode Leu and Phe, and the 3rd base distinguishes between them. Similarly, the four triplets starting with AU encode Ileu and Met, and the 3rd base distinguishes between them. Some studies, as we have already explained, show that the 3rd base plays a minor role in the genetic code. Probably the nucleotide containing the 3rd base completes and refines the electric field of the trinucleotide, but its contribution to the potential is weak and cannot be detected directly by the instruments.
Now, is it possible that the assumptions presented above have left crystallized traces in the current molecular structures of the mRNA and proteins, traces from the distant past?
Are there data that can be taken as firm evidence in support of this hypothesis?
We must start from this premise: if an amino acid recognizes a trinucleotide in the mRNA molecule, a trinucleotide recognizes an amino acid in the α-helix structure of the protein.
As we have noted in previous articles, colloidal silica rotates the plane of polarized light and appears to give rise to structures of the levo quartz type. But X-ray analysis has shown that levo quartz structures are right-handed helices, so the structures of colloidal silica must also be right-handed helices. We also suggested elsewhere that colloidal silica retained on its surface the levo amino acids from which polypeptides can be synthesised. But if colloidal silica has right-handed helices, the polypeptides formed on its surface must also be right-handed.
In fact, one of the secondary structures of proteins is the right-handed α-helix.
Now, if a trinucleotide recognizes an amino acid on the right-handed α-helix, and we imagine that RNA was synthesized on it, the RNA too will have a right-handed helix. In fact, the RNA helix is right-handed. This analogy might suggest that RNA is right-handed because the right-handed α-helix was used as a mould, and that the α-helix is right-handed because it used right-handed colloidal silica as a template. The clockwise sense of these molecules could represent a crystallized trace in their structures, a sign of their connections back in time.
A famous detective would say: the first clue is just chance.
Is there some evidence to suggest that in the prebiotic era RNA used the right-handed α-helix as a mould?
The helical structure of proteins, the right-handed α-helix, is a periodic structure made of amino acids which, after one turn, returns to the line tangent to its initial position. Each turn of the helix contains 3.6 amino acids. To recognize 3.6 amino acids, 3.6 trinucleotides are needed. Each trinucleotide is formed from three nucleotides, so 3.6 × 3 = 10.8, that is, about 11. To recognize the 3.6 amino acids of one turn of the α-helix it takes about 11 nucleotides.
The helical structure of RNA is also a periodic structure, made of nucleotides, which after one turn returns to the line tangent to its initial position. And how many nucleotides are there in one turn of the RNA helix? 11 nucleotides, that is, about 3.6 trinucleotides, which serve to recognize 3.6 amino acids.
Perhaps a geometric illustration is more effective. If we project one turn of the α-helix onto a plane we get a circle. Dividing the full angle by the number of amino acids per turn of the helix we obtain: 360° : 3.6 = 100°. Each amino acid covers an arc of 100°.
If we project one turn of the RNA helix onto a plane we also get a circle. One turn of the RNA helix contains 11 nucleotides, that is, about 3.6 trinucleotides. Dividing the full angle by the number of trinucleotides per turn we obtain: 360° : 3.6 = 100°. Each trinucleotide covers an arc of 100°, within which the properties of the trinucleotide are enclosed as a separate entity.
So 3.6 trinucleotides and 3.6 amino acids, each in their own helix, cover equal arcs of 100°. Can this still be considered chance?
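The arithmetic of the two helices can be laid out explicitly. The sketch below uses only the two figures quoted in the text (3.6 amino acids and 11 nucleotides per turn) and also shows the values before rounding: 3.6 × 3 gives 10.8 rather than exactly 11, and 11 : 3 gives about 3.67 rather than exactly 3.6, so the two arcs agree to within about two degrees. The constant names are our own.

```python
RESIDUES_PER_ALPHA_TURN = 3.6   # amino acids per turn, as stated in the text
NUCLEOTIDES_PER_RNA_TURN = 11   # nucleotides per turn, as stated in the text

# Nucleotides needed to match one turn of the alpha-helix,
# at one trinucleotide (three nucleotides) per amino acid.
nucleotides_needed = RESIDUES_PER_ALPHA_TURN * 3

# Arc covered by one amino acid in the projected circle.
arc_per_amino_acid = 360 / RESIDUES_PER_ALPHA_TURN

# Trinucleotides in one RNA turn and the arc each one covers, without rounding.
trinucleotides_per_turn = NUCLEOTIDES_PER_RNA_TURN / 3
arc_per_trinucleotide = 360 / trinucleotides_per_turn

print(round(nucleotides_needed, 1), round(arc_per_amino_acid, 1))
print(round(trinucleotides_per_turn, 2), round(arc_per_trinucleotide, 1))
```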
Our famous detective would have added: The second clue is only coincidence.
All amino acids contain a carbon atom linked to an H atom, an NH2 group, a carboxyl group -COOH and a side chain R. What distinguishes one amino acid from another is precisely this side chain R. The α-helix is a structure stabilized by hydrogen bonds that make it compact, with no free space inside. All the side chains R, which distinguish the amino acids, are arranged outside the helix, that is, on the convex side (see α-helix image).
The helical structure of RNA is also stabilized by hydrogen bonds, between the nitrogenous bases. In this helix, however, the bases are inside, that is, on the concave side (see the picture of stranded RNA). This means that the arc of RNA that carries the trinucleotide, and therefore the triplet, on its concave side can be superimposed on the arc of the α-helix, which carries the amino acids on its convex side. Trinucleotides and amino acids are therefore in a geometry that allows them to interact: the two helices can be superimposed.
And what would our detective say of the third clue? The answer is left to you.
Imagine a 100° arc of RNA that carries a trinucleotide on the concave side of the helix.
Let us place this 100° arc against the convex side of an α-helix arc that carries the side chain (R) of an amino acid. Trinucleotides and amino acids can interact, but their atoms cannot touch; they are at an average distance of about 4 Å (angstrom).
So the radius (in red) of the 100° RNA arc must necessarily be greater than the radius (in black) of the 100° α-helix arc. And so it is: the RNA helix has a radius of about 10 Å, while the α-helix has a radius of about 6 Å. This means that the RNA helix can wrap around the α-helix, matching each trinucleotide to an amino acid.
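The geometry of the two arcs reduces to a little arithmetic. The sketch below treats the two helices, in projection, as concentric circles (an assumption of ours for the sake of illustration) and uses only the radii and the 100° arc quoted in the text. It computes the radial gap between the two circles and the length of each arc from the usual relation between arc, radius and angle in radians.

```python
import math

RNA_RADIUS = 10.0     # Å, value given in the text
ALPHA_RADIUS = 6.0    # Å, value given in the text
ARC_DEGREES = 100.0   # arc per trinucleotide or amino acid, from the text

# Radial distance between the two projected circles, assumed concentric.
gap = RNA_RADIUS - ALPHA_RADIUS


def arc_length(radius, degrees):
    """Length of a circular arc: radius times the angle in radians."""
    return radius * math.radians(degrees)


print(gap)
print(round(arc_length(RNA_RADIUS, ARC_DEGREES), 2),
      round(arc_length(ALPHA_RADIUS, ARC_DEGREES), 2))
```

The gap can be compared directly with the average distance of about 4 Å between trinucleotide and amino acid mentioned above.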
Three clues would have been enough for our detective, but we have provided a fourth one.
The theory above shows a correlation between experimental data, the structure of the genetic code and the molecular structure of proteins and RNA.
In physical chemistry, a phase is defined as a homogeneous portion of matter bounded by separation surfaces. So, for example, ice in water, fat particles in milk and sand in water consist of two phases, solid and liquid; they are therefore biphasic systems.
Colloidal solutions consist of a liquid phase in which particles ranging in size from 10 Å to 1000 Å are dispersed. They are characterized by a large surface area and are therefore biphasic systems.
The fundamental macromolecules of life, such as proteins and nucleic acids, have the characteristics of colloids and therefore give rise to biphasic systems in solution.
In 1879 H. v. Helmholtz proposed that a double electric layer always forms at the surface of separation between two phases. The phenomena observed as a result of the double electrical layer were called, at that time, electrokinetic effects. Since the theory proposed above is based on the properties of the double electrical layer on the surface of macromolecules, we can call it the electrokinetic origin of the genetic code.
The electrokinetic origin of the genetic code postulates a chemical-physical interaction between the amino acids of the α-helix and trinucleotides, and between amino acids and the trinucleotides of mRNA. It is a primitive translation mechanism that links the genetic code to the principles of physics and biology, and it is through these principles that the universality of the genetic code is explained.
Translated by: Sydney Isae Lukee