Support Long-term Thinking

Decoding Long-Term Data Storage

by Charlotte Hajer on October 12th, 02012

If human societies are founded on the accumulation of knowledge through the ages, then the long-term transmission of information must be the cornerstone of a durable civilization. And as we accelerate ever more rapidly in our expansion of knowledge and technological capability, the development of durable storage methods becomes ever more important.

In the process of brainstorming such methods, two central questions emerge. The first of these concerns the type of storage medium you might use: what kind of material is likely to last long enough to convey a message to generations thousands of years into the future? Throughout much of history, people carved important messages into stone, bone, or other hard materials. So far, we don’t seem to have come up with anything better: most of us are familiar with the limited lifespan of CDs, vinyl, and computer hard drives. Faced with this lack of suitable options, several organizations and companies around the world have re-embraced the long-term durability of hard natural substances. The Long Now’s Rosetta Disk, for example, is made of nickel. Arnano, a French technology start-up, has developed a disk of sapphire on which to micro-etch information – civic records, perhaps, or important messages about the storage of nuclear waste. And most recently, Japanese electronics giant Hitachi announced a new data storage technology that uses quartz glass.

The second – and perhaps even more intriguing – question concerns the language of your message. What kind of ‘code’ will be most easily accessible to future generations, and what technologies will they have available to help them decode a message from the past?

The storage and transmission of data often requires multiple levels of encoding. When we think of ‘code’ we often think of computers – but in fact, we routinely go through two layers of encoding before we can even begin to digitize information. Spoken human language is itself a code, in which sounds are used to signify things or ideas. The use of a writing system adds a further layer: sequences of letters or pictographs signify the sounds that represent things or ideas. Yet another layer can then be applied by translating a writing system into binary numbers (and numeric systems are a kind of code, as well!) or perhaps even DNA.
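To make the last of these layers concrete, here is a minimal Python sketch of a written word being translated into binary digits and back again. It assumes UTF-8 as the mapping from letters to numbers; that choice, the function names, and the sample word are illustrative only and do not describe how any of the projects mentioned here store their data.

```python
def text_to_bits(text: str) -> str:
    """Encode written text as binary digits (assumed scheme: UTF-8, 8 bits per byte)."""
    data = text.encode("utf-8")  # letters -> numbers (bytes)
    return " ".join(f"{byte:08b}" for byte in data)  # numbers -> 1s and 0s


def bits_to_text(bits: str) -> str:
    """Reverse the process; this only works if the reader knows the scheme."""
    data = bytes(int(chunk, 2) for chunk in bits.split())
    return data.decode("utf-8")


message = "stone"
bits = text_to_bits(message)
print(bits)                # 01110011 01110100 01101111 01101110 01100101
print(bits_to_text(bits))  # stone
```

Nothing in the string of digits says that it should be read eight bits at a time, or which letter each number stands for. A reader who lacks that convention has to reconstruct it before the writing system, and then the language, can be tackled.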

These extra layers of encoding offer the advantage of information density: they can help you pack lots of information into a very small format. However, each layer also further complicates the decodability and readability of a message. Because the Rosetta Disk is itself intended to be a tool for decoding – a primer of human language meant to help future archaeologists unlock entire worlds of culture, just like the Rosetta Stone did in the 19th century – Long Now has chosen to store its data in the analog form of human alphabets, rather than add an extra layer of encoding in the form of a digital code of 1s and 0s.

Arnano, the makers of the sapphire disk, have made a similar choice. The added advantage is that this analog information is readable by the human eye (aided by a microscope or magnifying glass).

It’s safe to assume that the languages future generations will speak – and the technologies they’ll have available – will most likely be very different from what we use today. This brings up an important third question: how do you include ‘instructions’ for decoding and reading with your message? Following the example of its namesake predecessor, the content on Long Now’s Rosetta Disk is its own primer: if you know at least one of the 1,500 languages included on the disk, all other information can be decoded. Perhaps a similar kind of parallel multiplicity of codes is possible for other storage methods as well.

These questions of language and code are inevitably more difficult to answer than that of the storage medium. You can subject your chosen material to stress tests to make sure that it will stand up to acid, erosion, or any other kind of natural hazard. But there’s no similar test for language; it’s impossible to predict what codes will be interpretable by the people of the future, or what technology they’ll have available to decode a message. Nevertheless, these conundrums are no less important to grapple with, and any proposal for long-term storage worth its salt must offer some potential answers to these questions.