Concept 31 Some DNA does not encode protein.

factoid Did you know?

Repetitive DNA sequences like LINEs seem to have been present in the earliest known mammals and even before.

Hmmm...

Can you think of possible reasons why organisms have repetitive DNA?

I’m Roy Britten. In the '60s, David Kohne and I found that mouse cells contain multiple copies of very similar DNA sequences. We did this by looking at the reassociation rates of DNA strands. Let me explain.

We already knew that DNA encodes proteins. Genetic information is stored in the sequence of nucleotides of a DNA strand. Nucleotides on opposite strands pair through hydrogen bonds (A with T, and G with C), holding two complementary strands together. This is the DNA double helix.

DNA can be extracted from prokaryotes and sheared into smaller fragments. When the DNA is heated to near boiling, the hydrogen bonds between the complementary base pairs are disrupted, and the double-stranded DNA dissociates into single strands. When the temperature drops to about 25 °C below the dissociation temperature, the DNA strands reassociate. Random collisions bring complementary strands back together, and the hydrogen bonds re-form.

Perfect matches don't always occur. The faster the temperature is lowered, the less time there is for complementary strands to "find" each other, so strands may pair through local areas of complementarity instead.

We measured the DNA reassociation time by taking advantage of differences between double- and single-stranded DNA. Double-stranded DNA has a higher affinity for a crystalline form of calcium phosphate called hydroxyapatite. A column filled with hydroxyapatite traps double-stranded DNA but allows single-stranded DNA to flow through. Conditions can be set so that the column retains double-stranded DNA with some mismatches. All the DNA can then be washed off the column, and the amounts of both single- and double-stranded DNA can be measured.

In 1962, reassociation reactions were done using eukaryotic DNA. Compared to bacteria like E. coli, eukaryotes like mice have about 1,000 times more DNA in their cells. With so much more DNA, each strand should take longer to find its partner, so we were surprised that some eukaryotic DNA reassociated faster than E. coli DNA. We decided to analyze the data a bit further.
We plotted the fraction of reassociated DNA against the log of the product of the starting DNA concentration and time (C0t). We then compared the reassociation rates between organisms with different genome sizes.

As a control, we first plotted the C0t curve for polyU and polyA strands, which were synthesized in the lab. On the C0t graph for E. coli DNA, notice that the curve was displaced to the right of the control. The reassociation reaction takes longer to complete because the E. coli DNA is far more complex. Unlike polyU and polyA strands, the DNA strands of E. coli take longer to find the right match.

Next we plotted the reassociation curve of a portion of mouse DNA called satellite DNA. Notice how the mouse satellite DNA reassociated faster than E. coli DNA. It turns out that mouse satellite DNA contains lots of repeated sequences. These sequences are so similar that they reassociate easily; there are no unique sequences that need to hunt for their partners.

We found that the reassociation curve of an average eukaryotic genome has three parts. The first part of the curve is the fast component and represents highly repetitive DNA that reassociates very quickly. Highly repetitive DNA can make up about 25% of the genome. The second part of the curve is the intermediate component, where the moderately repetitive DNA reassociates. This can represent about 30% of the DNA in the eukaryotic genome. The third part of the curve is the slow component; there is no repetitive DNA in this fraction. The slow component can make up to 45% of the DNA in the genome.

We tested these DNA types to see which fraction coded for protein. We added radioactive mRNA as a tracer at the beginning of a reassociation reaction. As the temperature dropped, the mRNA hybridized with its template DNA. No mRNA hybridized to the highly repetitive DNA fraction. Very little of the radioactive mRNA hybridized with the moderately repetitive DNA fraction.
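The three-part reassociation curve described above can be sketched numerically. The snippet below is a minimal illustration, assuming each component follows ideal second-order kinetics, in which the fraction of DNA still single-stranded is 1 / (1 + k·C0t). The 25%, 30%, and 45% proportions come from the narration; the rate constants `k` and all function names are hypothetical values chosen only to separate the three components, not measured data.

```python
def fraction_reassociated(c0t, k):
    """Fraction of one DNA component that is double-stranded at a given C0t,
    assuming ideal second-order kinetics with rate constant k."""
    return k * c0t / (1.0 + k * c0t)


def half_c0t(k):
    """C0t value at which half of a component has reassociated (C0t 1/2 = 1/k)."""
    return 1.0 / k


# (fraction of genome, rate constant k). The fractions follow the narration;
# the rate constants are HYPOTHETICAL, picked only to spread the components out.
COMPONENTS = [
    (0.25, 1e3),   # fast component: highly repetitive DNA
    (0.30, 1e0),   # intermediate component: moderately repetitive DNA
    (0.45, 1e-3),  # slow component: no repetitive DNA
]


def genome_curve(c0t, components=COMPONENTS):
    """Total fraction of the genome reassociated at a given C0t."""
    return sum(share * fraction_reassociated(c0t, k) for share, k in components)


if __name__ == "__main__":
    # Step through C0t on a log scale, as on the plotted curves.
    for exponent in range(-5, 7):
        c0t = 10.0 ** exponent
        print(f"C0t = 1e{exponent:+d}   reassociated = {genome_curve(c0t):.3f}")
```

In this model a larger `k` means a smaller C0t 1/2, so repeated sequences, which find partners quickly, sit to the left of the curve, and unique sequences sit to the right.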
Most of the radioactive mRNA hybridized with the slow DNA fraction, giving us a rough approximation of the fraction of a genome that encodes protein. The green curve represents the hybridization of the radioactive mRNA with DNA.

If repetitive DNA doesn’t code for proteins, where did it come from and why is it there? Repetitive DNA probably arises from errors in DNA replication. The most highly repetitive DNA is usually found in regions near the centromeres and may have a function in chromatid pairing during cell division. Highly repetitive DNA is composed primarily of very short "tandem repeats": numerous repeated units lined up head-to-tail, like the cars of a train. The repeated unit ranges from as short as two nucleotides to about 20 nucleotides.

Moderately repetitive DNA is composed of larger elements scattered widely throughout the genome. Two major groups are categorized by size: Short Interspersed Elements (SINEs) are several hundred nucleotides in length, while Long Interspersed Elements (LINEs) are several thousand nucleotides long. Both groups are derived from transposons, so-called "jumping genes," which have accumulated over evolutionary time by moving to new chromosome locations.

LINEs jump using reverse transcriptase (RT), which functions in the same way as retroviral RT but is not closely related to it. SINEs do not produce their own reverse transcriptase. They "borrow" reverse transcriptase from LINEs or other sources. Even though they use the same enzyme for insertion, LINEs and SINEs favor different insertion sites.

Humans and other primates have about 500,000 copies of Alu, a 300-bp SINE. Alu elements alone are believed to make up about 5% of the human genome, an amount equal to the coding sequence!
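The 5% figure can be checked with simple arithmetic. The sketch below uses the copy number and element length given above; the human genome size of roughly 3 billion base pairs is an assumption not stated in the narration, and the function name is hypothetical.

```python
ALU_COPIES = 500_000             # from the narration
ALU_LENGTH_BP = 300              # from the narration
HUMAN_GENOME_BP = 3_000_000_000  # ASSUMPTION: approximate haploid genome size


def genome_fraction(copies, unit_length_bp, genome_bp):
    """Fraction of a genome occupied by `copies` elements of a given length."""
    return copies * unit_length_bp / genome_bp


if __name__ == "__main__":
    total_bp = ALU_COPIES * ALU_LENGTH_BP
    share = genome_fraction(ALU_COPIES, ALU_LENGTH_BP, HUMAN_GENOME_BP)
    print(f"Alu DNA: {total_bp:,} bp, about {share:.0%} of the genome")
```

Under these assumptions, 500,000 copies of 300 bp come to 150 million base pairs, which is 5% of a 3-billion-bp genome.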