You could say that exploring the frontier of data storage is in their genes.
Researchers working to store information in DNA say they have found a way to hold an MP3 file containing 26 seconds of Dr. Martin Luther King Jr.’s “I Have a Dream” speech, the text of all 154 of Shakespeare’s sonnets, Watson and Crick’s seminal paper on the molecular structure of nucleic acids, and an image file, all in a sample of DNA no bigger than a tiny dust particle.
Their breakthrough suggests that highly stable DNA-based storage could become cost-effective within a decade for archives that might last tens of thousands of years.
More data needs better storage
As electronics become more powerful, the amount of data they generate is exploding. Currently, there is about three zettabytes’ worth of digital information in the world — 3,000 billion billion bytes — and archiving it all is a growing challenge.
One possible solution that scientists have been exploring is to store this data in DNA, the molecule nature already uses to encode the blueprints of life. DNA can also remain stable for millennia without difficult or costly storage conditions, unlike, say, magnetic tape, which can degrade within a decade.
"We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it," says researcher Nick Goldman, a molecular and evolutionary biologist and a mathematician at the European Bioinformatics Institute (EBI) in Hinxton, England. "It’s also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy."
Now scientists have developed a strategy to encode a record-breaking amount of data, including images, text and audio files, in long strands of synthetic DNA. They estimate the organic molecule has an information storage density of roughly 2.2 petabytes per gram. For comparison, the brain’s memory storage capacity is estimated to be about 2.5 petabytes. At least 100 million hours of high-definition video could be stored in about a cup of DNA.
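To give a rough sense of scale, the density figure above can be combined with the roughly three zettabytes of digital information mentioned earlier. This is only a back-of-the-envelope sketch, assuming decimal units (1 petabyte = 10^15 bytes, 1 zettabyte = 10^21 bytes) and taking the 2.2 petabytes-per-gram estimate at face value.

```python
# Back-of-the-envelope sketch using the figures quoted in this article.
# Assumption: decimal units, 1 PB = 10**15 bytes and 1 ZB = 10**21 bytes.
DENSITY_BYTES_PER_GRAM = 2.2e15  # about 2.2 petabytes per gram of DNA
WORLD_DATA_BYTES = 3e21          # about three zettabytes of digital information

# Grams of DNA needed to hold all of that data at the estimated density.
grams = WORLD_DATA_BYTES / DENSITY_BYTES_PER_GRAM
print(f"{grams / 1000:,.0f} kg")  # 1,364 kg
```

At that density, the world’s digital information would fit in well under two tonnes of DNA.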
DNA is made of strings of molecules known as nucleotides. It uses four kinds of nucleotides — adenine, thymine, cytosine and guanine, abbreviated A, T, C and G. Just as distinct patterns of ink can represent different letters of the alphabet, distinct sequences of nucleotides can be used to encode data.
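The simplest way to picture this is to let each nucleotide stand for two bits, so every byte becomes four letters. The sketch below assumes a hypothetical table (00→A, 01→C, 10→G, 11→T) and hypothetical helpers `encode` and `decode`; it illustrates the idea only. The EBI team’s published scheme was more elaborate and is not reproduced here.

```python
# A minimal sketch: map every 2 bits of data to one nucleotide.
# This A/C/G/T table is an illustrative assumption, not the EBI team's code.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Reverse encode(): read the bases back as bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Reading the data back only works if the reader knows which table was used, which is why the code itself has to be preserved alongside the DNA.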
The researchers encoded computer files totaling 739 kilobytes into 153,335 strings of synthetic DNA, each 117 nucleotides long. These files included a color photo of the EBI building and a copy of the 1953 paper in which James Watson and Francis Crick first described the double-helix structure of DNA.
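A quick check of these figures shows how much DNA the files occupied. The sketch assumes 1 kilobyte = 1,000 bytes, which the article does not specify. The result, about a third of a bit of file data per nucleotide, is well below the 2-bit maximum for a four-letter alphabet, since the strands also carry extra material such as repeated data and indexing information.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: "kilobyte" means 1,000 bytes (the article does not say).
FILE_BYTES = 739 * 1_000
NUM_STRINGS = 153_335
NT_PER_STRING = 117

# Total nucleotides synthesized across all strings.
total_nt = NUM_STRINGS * NT_PER_STRING
# Average bits of original file data carried per synthesized nucleotide.
bits_per_nt = FILE_BYTES * 8 / total_nt

print(f"{total_nt:,} nucleotides synthesized")  # 17,940,195 nucleotides synthesized
print(f"{bits_per_nt:.2f} bits of file data per nucleotide")  # 0.33 bits of file data per nucleotide
```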
"We don’t want anyone to think that we are manipulating living organisms," Goldman cautions. "We are using DNA, which is what genomes are made of, but we’re not using it in a form that could affect any living thing."
The researchers chemically modified the DNA to help preserve it for long spans of time, and shipped it from the United States to Germany via the United Kingdom without specialized packaging to show how durable it is. Afterward, they retrieved the stored data with gene-sequencing machines that read each DNA strand. They then decoded the resulting sequences back into digital information to reconstruct the original files.
"It’s amazing that the technologies for writing and reading DNA have developed so quickly in the past few years," Goldman says. "That’s what has made it possible for us to use DNA as a viable medium for information storage."
(EBI’s Nick Goldman looking at synthesized DNA. Credit: EMBL Photolab)
DNA data storage completely accurate, stable for millennia
Random errors in encoding occur rarely, at a rate of about one per 500 nucleotides. To tolerate such errors, the researchers created about 12 million copies of each DNA string. When recovering the data, the scientists compared many copies of each string and, by working out what the data looked like on average, corrected for the errors. In the end, the stored information was retrieved with 100 percent accuracy.
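The averaging described above can be illustrated with a per-position majority vote across copies: at each position, whichever base most copies agree on wins. This is a sketch of the idea, not the researchers’ decoding software. The 15 copies, the substitution-only error model (real reads can also gain or lose bases), and the helper names `noisy_copy` and `consensus` are assumptions made for this example; the one-in-500 error rate and 117-nucleotide length come from the article.

```python
import random
from collections import Counter


def noisy_copy(strand: str, error_rate: float, rng: random.Random) -> str:
    """Copy a strand, swapping each base for a different one with probability error_rate."""
    out = []
    for base in strand:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Simulate one 117-nucleotide string read back from 15 error-prone copies.
rng = random.Random(42)
original = "".join(rng.choice("ACGT") for _ in range(117))
reads = [noisy_copy(original, error_rate=1 / 500, rng=rng) for _ in range(15)]
print(consensus(reads) == original)
```

With errors this rare, a wrong base would have to appear in most copies at the same position to win the vote, which is vanishingly unlikely; more copies make it less likely still.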
Goldman and his colleagues note that scientists now routinely recover intact DNA fragments from fossils tens of thousands of years old, such as Neanderthal bones. For this reason, they say DNA may be an excellent way to archive large amounts of data for long spans of time with relatively little fuss, compared with other media such as paper, which crumbles away over centuries unless carefully preserved. DNA needs only a cold, dry and dark environment to keep the data encoded in it intact for millennia.
"As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA," Goldman says.
Earlier research had developed efficient ways to store information in DNA, but those methods either could not easily be scaled up or encoded only trivial amounts of data. The EBI researchers say their new approach not only encodes a record amount of information but also uses error-correction techniques that make it reliable and scalable enough for practical use.
The researchers estimate that encoding data in DNA would currently cost $12,400 per megabyte and decoding it $220 per megabyte. However, the price of synthesizing DNA is expected to drop a hundredfold or more in less than a decade, suggesting DNA could soon become a cost-effective way to archive data that is rarely accessed, such as old government and historical records.
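The cost figures above lend themselves to a small calculation. In this sketch the hypothetical helper `archive_cost` applies the projected price drop only to writing (synthesis), since that is the cost the article expects to fall; leaving the reading cost unchanged is an assumption, not a claim from the researchers.

```python
# Cost figures quoted in the article, in US dollars per megabyte.
WRITE_COST_PER_MB = 12_400  # encoding (synthesizing) data into DNA
READ_COST_PER_MB = 220      # decoding (sequencing) data back out


def archive_cost(megabytes: float, price_drop: float = 1.0) -> tuple[float, float]:
    """Return (write_cost, read_cost) in dollars for an archive of the given size.

    price_drop divides only the write (synthesis) cost; the read cost is
    left unchanged, which is an assumption made for illustration.
    """
    return megabytes * WRITE_COST_PER_MB / price_drop, megabytes * READ_COST_PER_MB


print(archive_cost(1))                  # (12400.0, 220)
print(archive_cost(1, price_drop=100))  # (124.0, 220)
```

A hundredfold drop would bring writing a megabyte from $12,400 to about $124, which is what makes DNA attractive for data written once and read rarely.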
"We are currently thinking of pursuing this in two directions," Goldman says. "One is to improve the coding-decoding algorithms, so we can store more information in less DNA and with better error correction. The other is to work on how to realize an actual DNA-based library. It’s as though we’ve just invented the book, and now need to work out how to make a reference library."
The scientists detailed their findings online Jan. 23 in the journal Nature.