Storing Digital Data in DNA

Posted on Thursday, August 16th, 02012 by Laura Welcher
Categories: Digital Dark Age, Long Term Science, Rosetta

[Figure: Schematic of DNA information storage]

In a paper published today in Science, George Church, Yuan Gao and Sriram Kosuri report that they have written a 5.27-megabit “book” in DNA, encoding far more digital data in DNA than had ever been achieved.

Writing messages in DNA was first demonstrated in 1988, and the largest amount of data previously written in DNA was 7,920 bits. The challenge in writing more information than this has been creating long, error-free sequences. The current project instead uses shorter sequences, each encoding a 96-bit data block along with a 19-bit address that specifies the block's location within the larger data set. Redundancy then reduces errors: each base encodes only a single bit (A and C both stand for “0”; G and T both stand for “1”), and each data block is present in several molecular copies.
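The block-and-address scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the 96-bit block, the 19-bit address and the A/C = 0, G/T = 1 mapping come from the description above, while putting the address before the data, leaving out any flanking sequences, the rule for choosing between the two synonymous bases (never repeat the previous base), and majority voting over copies are our own simplifications. All function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

ADDRESS_BITS = 19  # address width reported for the DNA book
DATA_BITS = 96     # data bits per block reported for the DNA book

# Two synonymous bases per bit value: A/C mean 0, G/T mean 1.
SYNONYMS = {"0": "AC", "1": "GT"}
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}


def encode_block(address: int, data: str) -> str:
    """Write a 19-bit address followed by 96 data bits as 115 bases.

    Assumptions (ours, not the paper's): the address comes first, no flanking
    sequences are added, and of the two synonymous bases we take the first
    unless it would repeat the previous base, so no base appears twice in a row.
    """
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address must fit in 19 bits")
    if len(data) != DATA_BITS or set(data) - {"0", "1"}:
        raise ValueError("data must be a string of 96 '0'/'1' characters")
    strand: list[str] = []
    for bit in format(address, f"0{ADDRESS_BITS}b") + data:
        first, second = SYNONYMS[bit]
        strand.append(second if strand and strand[-1] == first else first)
    return "".join(strand)


def strand_to_bits(strand: str) -> str:
    """Read each base back as one bit; both synonyms map to the same bit."""
    return "".join(BASE_TO_BIT[base] for base in strand)


def split_block(bits: str) -> tuple[int, str]:
    """Separate the address (as an integer) from the 96 data bits."""
    return int(bits[:ADDRESS_BITS], 2), bits[ADDRESS_BITS:]


def consensus_bits(copies: list[str]) -> str:
    """Majority vote per position over several sequenced copies of a block.

    Assumes equal-length copies (substitution errors only); a tie goes to
    whichever bit value appears first in that column.
    """
    columns = zip(*(strand_to_bits(copy) for copy in copies))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


def blocks_needed(total_bits: int) -> int:
    """Number of 96-bit blocks required to hold total_bits (rounded up)."""
    return -(-total_bits // DATA_BITS)


if __name__ == "__main__":
    data = "01" * 48
    strand = encode_block(5, data)
    print(len(strand), split_block(strand_to_bits(strand)) == (5, data))

    # One of three copies carries a single substitution error at base 30.
    wrong = "G" if BASE_TO_BIT[strand[30]] == "0" else "A"
    damaged = strand[:30] + wrong + strand[31:]
    print(split_block(consensus_bits([strand, damaged, strand])) == (5, data))

    print(blocks_needed(5_270_000), 2 ** ADDRESS_BITS * DATA_BITS)
```

One plausible benefit of having two bases per bit is freedom of choice: the same bits can be written as different sequences, so awkward stretches such as long runs of a single base can be avoided. The 19-bit address also sets a ceiling of 2^19 = 524,288 blocks of 96 bits, about 50.3 megabits, comfortably above the roughly 55,000 blocks a 5.27-megabit book needs.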

DNA has several advantages for archival data storage: information density, energy efficiency, and stability. With regard to stability, DNA remains readable “despite degradation in non-ideal conditions over millennia,” by which they mean as long as 400,000 years! (See Church and Regis's forthcoming book on the subject.)

If we wish to intentionally use this technology for active long-term information storage (imagine some crucial message we need to convey to the future), we should probably anticipate the possibility of a discontinuity in technological knowledge and access to tools that could read the information. This raises questions of discoverability, decodability, and readability.

Ubiquity aids discoverability: if the information is everywhere, it is easier to find, or even to stumble upon by accident. Still, clear signals or signposts could aid discovery (neon green cockroaches, anyone?). With regard to decodability, I'll simply mention that there are several layers of encoding to be unraveled here: spoken human language > written language in text form > digital / binary > DNA. And presumably readability requires tools at least as capable as what we have available today, unless you can make the expression of the information obvious in some biological way.
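To make those layers of decoding concrete, here is a toy Python round trip through the digital ones. It is a hypothetical sketch, not any real archive format: it assumes UTF-8 text, 8-bit bytes, and always writing “0” as A and “1” as G (a simplification of the two-bases-per-bit scheme). A future reader would have to rediscover each of those conventions, and the last step, from written text back to a spoken language someone understands, is one no code can supply.

```python
# Toy round trip through the digital layers: text -> bytes -> bits -> DNA.
# Assumptions (hypothetical, not a real archive format): UTF-8 text,
# 8-bit bytes, and "0" always written as A, "1" always written as G.

BIT_TO_BASE = {"0": "A", "1": "G"}
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}


def text_to_bits(text: str) -> str:
    """Written text -> binary: UTF-8 bytes, each written as 8 bits."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))


def bits_to_dna(bits: str) -> str:
    """Binary -> DNA: one base per bit."""
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def dna_to_bits(dna: str) -> str:
    """DNA -> binary: the reader must know which bases stand for which bit."""
    return "".join(BASE_TO_BIT[base] for base in dna)


def bits_to_text(bits: str) -> str:
    """Binary -> written text: the reader must know the byte size and the
    character encoding."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


if __name__ == "__main__":
    dna = bits_to_dna(text_to_bits("A"))
    print(dna)                             # AGAAAAAG
    print(bits_to_text(dna_to_bits(dna)))  # A
```

Swapping the base table or misjudging the byte width turns the same strand into noise, which is why each layer is its own decoding problem.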

Wonderfully exciting new stuff to conjure with from the perspective of technologies for the Long Now Library. We are also delighted to be working with Dr. George Church to provide Rosetta / PanLex data that may be written in a new “edition” of the DNA book, so check back for updates!

  • It doesn’t seem any more transparent than storage on traditional mediums.

  • counterfeit

    Can this be used to successfully counterfeit money? Since everything, including access to your results, costs $s, if you would kindly allow counterfeiting money, perhaps I could read what you wrote. lol

  • Josh

    So has anyone yet analyzed our “junk DNA” for such messages?

  • glaubrius

    Puts me in mind of Neal Stephenson's “library grapes” that contained all the DNA of the species in one genome

  • Excellent, I had just thought of suggesting this to you folks and find that you've also thought of it. Other advantages of DNA data storage are redundancy (i.e. a million bacteria coded with the same book and released into the wild preserve that book for the ages) and cheap replication. By releasing the organisms into the wild, you ensure their survival, and you just have to focus on preserving the knowledge people will need to read the information that's in the wild.
