Blog Archive for the ‘Rosetta’ Category


Rosetta and PanLex Projects at Exploratorium Market Days 10/19/13

Posted on Thursday, October 17th, 02013 by Austin Brown
Categories: Events, PanLex, Rosetta


This Saturday, October 19th, Rosetta and PanLex Project staff will be at the Exploratorium’s final Market Days event of the year. The Exploratorium has been holding these free outdoor events in the spirit of “exchanging fresh ideas on local phenomena.” Saturday’s theme is Heirlooms, and Rosetta and PanLex will showcase our planet’s diverse linguistic stock.

Come to the Rosetta / PanLex Project booth where you can:

  • Learn about the thousands of languages spoken around the world, why many of them are endangered, and why this is important for everybody.
  • Learn how you can make and archive language recordings that document the languages used in your family, classroom and community.
  • Use the PanLex tattoo generator to make a temporary tattoo using words from thousands of languages around the world.
  • See a real Rosetta Disk – an archive of thousands of the world’s languages that you can read with a microscope and hold in the palm of your hand.

The event runs from 11:00am to 3:00pm at the Exploratorium’s new location at Pier 15.

PanLex hits a billion translations

Posted on Wednesday, October 2nd, 02013 by Jonathan Pool
Categories: Announcements, PanLex, Rosetta

The PanLex project of The Long Now Foundation, which is building a database of words and phrases in the world’s languages, has recently passed the one-billion-translation mark. That means there are now over a billion pairs of words or phrases, such as “clock” in English and “ঘড়ী” in Assamese, that PanLex records as attested translations of each other. The translations are derived from publications collected from around the world.

Beyond these billion attested translations, it is possible to infer others from longer paths of translations. For example, the number of pairs shoots up from 1 billion to 30 billion if we include translations at distance 2, namely translations of translations.  The longer the path, the greater the number, and the lower the reliability, of translations.
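To make "translations of translations" concrete, here is a minimal Python sketch. It treats attested translations as edges in a graph and collects the pairs that are two steps apart but not directly linked. The sample data, the language codes and the function names are invented for illustration; this is not PanLex's own code or schema.

```python
from itertools import combinations

# Hypothetical attested pairs; each expression is a (language, text) tuple.
attested = [
    (("eng", "clock"), ("asm", "ঘড়ী")),
    (("eng", "clock"), ("fra", "horloge")),
    (("fra", "horloge"), ("deu", "Uhr")),
]

def build_graph(pairs):
    """Map each expression to the set of its attested translations."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distance2_pairs(graph):
    """Return unordered pairs that share a translation but are not attested directly."""
    inferred = set()
    for neighbors in graph.values():
        for a, b in combinations(sorted(neighbors), 2):
            if b not in graph[a]:
                inferred.add(frozenset((a, b)))
    return inferred

graph = build_graph(attested)
for pair in distance2_pairs(graph):
    print(sorted(pair))
```

On this toy data the three attested pairs yield two inferred ones, which hints at why the count grows so quickly as paths get longer, and why each extra hop adds uncertainty.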

Because counting up these totals would overload the PanLex servers, we have estimated them using a random sample of 3,000 words and phrases. The figures below show that, as more words and phrases are added to the sample, the estimates of distance-1 and distance-2 translations become more stable.
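The post does not spell out the estimator, so the following Python sketch is only one plausible reading: take a random sample of expressions, average how many translations each one has, and scale up to the whole database. The synthetic counts, the seed and the function names are assumptions made for illustration.

```python
import random

def estimate_total(sample_counts, population_size):
    """Scale the sample's mean translation count up to the whole population.

    Dividing by 2 avoids counting each unordered pair once from each end.
    """
    mean = sum(sample_counts) / len(sample_counts)
    return mean * population_size / 2

def running_estimates(counts, sample_size, step, seed=0):
    """Draw one random sample and re-estimate the total as the sample grows."""
    rng = random.Random(seed)
    sample = rng.sample(counts, sample_size)
    return [
        (n, estimate_total(sample[:n], len(counts)))
        for n in range(step, sample_size + 1, step)
    ]

# Synthetic stand-in for "number of translations per expression".
rng = random.Random(42)
counts = [rng.randint(0, 50) for _ in range(100_000)]

for n, estimate in running_estimates(counts, sample_size=3000, step=500):
    print(n, round(estimate))
```

Plotting the estimate against the sample size gives the kind of curve the figures describe: noisy at first, then settling as the sample grows.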

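[Figure: estimated number of distance-1 translations as the sample grows]

[Figure: estimated number of distance-2 translations as the sample grows]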

 

The main goal of the PanLex database is to make it possible ultimately to translate any word or phrase in any language into any other language on Earth. With about 7,000 languages, and assuming an average of 100,000 words and phrases per language, there should eventually be about 2.5 trillion translation pairs available from PanLex. Project participants don’t hope to reach this total on their own. Instead, they plan to provide their data to researchers who will develop increasingly effective methods of automatically inferring unattested translations from networks of attested ones.
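The post does not show how the 2.5 trillion figure was reached. One reading that reproduces it, offered here as an assumption rather than as the project's own calculation, is to count every unordered pair of languages once for each of the 100,000 expressions:

```python
from math import comb

languages = 7_000
expressions_per_language = 100_000

# Assumption: each expression has exactly one counterpart in every other language.
language_pairs = comb(languages, 2)          # unordered pairs of languages
translation_pairs = language_pairs * expressions_per_language

print(f"{language_pairs:,} language pairs")           # 24,496,500
print(f"{translation_pairs:,} translation pairs")     # 2,449,650,000,000
```

That is roughly 2.45 trillion, which rounds to the figure quoted above. Real languages do not line up one-to-one, so this is an order-of-magnitude estimate at best.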

Forgotten Dictionaries of Indigenous Australian Languages Rediscovered

Posted on Friday, September 13th, 02013 by Austin Brown
Categories: Long Term Science, Rosetta

[Image: Indigenous map, 1940]

Of the 145 Indigenous languages spoken on the Australian continent, 110 are in danger of extinction. A linguistics professor at the University of Sydney, however, recently discovered a trove of documents that may help Australians better understand and preserve this diversity.

It started with just a pair of small notebooks from the 19th century. Michael Walsh stumbled across them in the New South Wales State Library and quickly realized they contained a handwritten dictionary of an Australian Indigenous language. The document was news to Walsh and led him to dig deeper into the library’s archives. After two years’ worth of research, he came away with new (old) data on over 100 Australian languages, many no longer spoken. The find is a huge boon to the understanding of Australia’s linguistic history and diversity.

These documents, collected in part to harvest knowledge amid attempts to exert control over Australia’s Indigenous population, will now help to preserve that culture.

(The Guardian)

A New Dimension (or Two?) for Long-Term Data Storage

Posted on Friday, July 26th, 02013 by Charlotte Hajer
Categories: Digital Dark Age, Rosetta


A group of scientists at the University of Southampton is pushing the frontier of long-term data storage technology to a new level. At a recent Conference on Lasers and Electro-Optics in San José, the researchers announced their success at recording data in quartz glass by using a femtosecond laser.

A femtosecond, or ultrafast, laser emits pulses of light that each last only femtoseconds; a femtosecond is one quadrillionth of a second (a quadrillion is a 1 with 15 zeros). When focused on a piece of quartz glass, these photon bullets shift the structuring of atoms in the silica, creating what are called nanostructures. The presence of nanostructures changes the way light travels through the quartz, which means they can be ‘read’ by an optical microscope.

Taking advantage of this fact, these Southampton researchers figured out how to use an ultrafast laser to deliberately place nanostructured dots within the quartz glass. A configuration of dots can thereby become five-dimensional code, conveying meaning through its spatial position within the quartz (dimensions one, two, and three), as well as the size and directional orientation of the dot (dimensions four and five). Using this ‘code’, the research team successfully recorded a 300 kb digital text file into a piece of quartz glass, in the form of a holographic ‘image’ of dots that can be read with an optical microscope fitted with a polarizing filter.
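As a rough illustration of how five dimensions can carry data, the Python sketch below stores three bits in each dot: one in its size and two in its orientation, with the dot's position giving the order of the bits. The number of size and orientation levels, the grid dimensions and all names are assumptions chosen for the example; the post does not describe the researchers' actual scheme.

```python
from dataclasses import dataclass

SIZE_BITS = 1         # assumed: 2 distinguishable dot sizes
ORIENTATION_BITS = 2  # assumed: 4 distinguishable orientations
BITS_PER_DOT = SIZE_BITS + ORIENTATION_BITS

@dataclass(frozen=True)
class Dot:
    x: int            # dimensions one to three: position in the glass
    y: int
    z: int
    size: int         # dimension four
    orientation: int  # dimension five

def encode_dots(bits, width=4, height=4):
    """Turn a string of '0'/'1' into dots laid out row by row, layer by layer."""
    padded = bits + "0" * (-len(bits) % BITS_PER_DOT)
    dots = []
    for index in range(len(padded) // BITS_PER_DOT):
        chunk = padded[index * BITS_PER_DOT:(index + 1) * BITS_PER_DOT]
        dots.append(Dot(
            x=index % width,
            y=(index // width) % height,
            z=index // (width * height),
            size=int(chunk[:SIZE_BITS], 2),
            orientation=int(chunk[SIZE_BITS:], 2),
        ))
    return dots

def decode_dots(dots):
    """Read the dots back in position order and rebuild the (padded) bit string."""
    ordered = sorted(dots, key=lambda d: (d.z, d.y, d.x))
    return "".join(
        f"{d.size:0{SIZE_BITS}b}{d.orientation:0{ORIENTATION_BITS}b}"
        for d in ordered
    )

dots = encode_dots("101100")
print(dots)
print(decode_dots(dots))
```

The more sizes and orientations a reader can tell apart, the more bits each dot holds, which is where the density of the approach comes from.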

Silica quartz is attractive as a base for very long-term storage because, like sapphire or nickel, it is strong and resistant to high temperatures up to 1000° Celsius. The Southampton research team claims that quartz glass could last for a million years:

“It is thrilling to think that we have created the first document which will likely survive the human race,” said Peter Kazansky, professor of physical optoelectronics at the Univ. of Southampton’s Optical Research Centre. “This technology can secure the last evidence of civilization: all we’ve learnt will not be forgotten.”

Beyond the strength of its material, the potential of this new technology lies in the nano-scale of its encoding: at that order of magnitude (or microtude, if you will), the researchers suggest, a single piece of quartz could hold more than 350 terabytes of data. If this technology can be translated into a real-world utility, the researchers claim this new form of data storage

“… could be highly useful for organizations with big archives. At the moment companies have to back up their archives every five to ten years because hard-drive memory has a relatively short lifespan,” says [principal investigator Jingyu Zhang]. “Museums who want to preserve information or places like the national archives where they have huge numbers of documents, would really benefit.”

Memory of Mankind

Posted on Wednesday, May 29th, 02013 by Austin Brown
Categories: Long Term Thinking, Rosetta


Among the photos on your walls, the art you’ve created, the things you’ve written or read – is there something you’d like to preserve for history? Something that you think deserves to be beheld by future generations, either for their edification or amusement?

An Austrian project is offering a means to accomplish this by burying clay tablets of your design in a salt mine. Memory of Mankind will etch words or photographs into clay. They produce two copies of each tablet: you get one, and the other is buried inside Hallstatt’s salt mine. If you’d like, your design can also be shared in the online version of the archive.

Almost half of the world’s languages are endangered

Posted on Wednesday, April 17th, 02013 by Austin Brown
Categories: Rosetta


On the blog of Long Now’s Rosetta Project, intern Karin Wiecha describes the recently published findings of a major linguistics research effort:

ELCat uses the metaphor of biodiversity to illustrate the gravity of the loss of an entire language family: If we compare the extinction of a language to the extinction of an animal species, the death of a language family would equal the loss of a whole branch of the animal kingdom, for example all felines. We know of a hundred language families that have gone extinct over the course of history – 24% of the world’s linguistic diversity. But the fact that 28 of them have gone extinct over the relatively short time span of the last 50 years is symptomatic of the accelerated rate of language loss we are experiencing in recent times.

The Endangered Languages Catalogue (ELCat) “aims to compile a comprehensive up-to-date catalogue on all languages considered to be in danger.” At the 3rd International Conference on Language Documentation & Conservation (ICLDC 3) earlier in 02013, ELCat’s director presented several years’ worth of research, including the above facts.

Beyond those that we’ve already lost, ELCat has found that 457 languages – almost 10% of humanity’s living languages – are spoken by just 10 or fewer people and are on the brink of extinction, while 3,176 – almost half – are endangered. In fact, ELCat puts the current rate of linguistic extinction at about one language every three months.

Learn more, including what people are doing about it, on the Rosetta Blog.

Encapsulated Universes

Posted on Thursday, February 28th, 02013 by Charlotte Hajer
Categories: Rosetta

In a recent conversation with Edge, Stanford psychologist and former SALT speaker Lera Boroditsky explores intriguing – and still controversial – questions about the relationship between the language we speak and the way we think about the world.

Weaving her thoughts together with examples from a variety of different languages, Boroditsky shows us that languages differ in the kind of contextual information they prioritize. Hebrew assigns everything in the world a gender, whereas Finnish does not. Russian verbs specify when an event took place, while Indonesian verbs are timeless. And where English sentences can be vague about the causality of an event, Japanese tends to be much more explicit about who did what. In other words, language shapes the things we notice about our environment.

“Think about it this way. We have 7,000 languages. Each of these languages encompasses a world-view, encompasses the ideas and predispositions and cognitive tools developed by thousands of years of people in that culture. Each one of those languages offers a whole encapsulated universe. So we have 7,000 parallel universes, some of them are quite similar to one another, and others are a lot more different.”

This does not mean that language dictates what we do and do not experience about our world – speakers of Finnish are still able to recognize the difference between men and women, and Indonesians know whether something happened in the past or the present. But it does mean that language is more than simply a way to convey meaning. In prioritizing certain pieces of information over others, language adds a certain color to our universe. In other words, meaning emerges from the fabric of language itself:

“Those interconnections between words are not simply the webbing on top of an otherwise pure logical knowledge system. Rather, in fact, meaning exists in the way that we use words; the patterns of word use create the system of meaning. There’s no getting away from language in getting to complex meanings.”

Exploring a new language, then, is truly a way to explore new worlds – and to celebrate the “flexibility and the ingenuity of the human mind.”

“The fact that we’re able to take so many different perspectives and create such an incredibly diverse set of ways of looking at the world, that is something first to be celebrated, but also something to learn from: flexibility and diversity are at the very heart of what makes us human and what makes us so smart. I think the more we understand how people are able to take all these different perspectives, and able to change the way they think, the better we’ll understand the nature of being human.”

Happy International Mother Language Day 02013!

Posted on Thursday, February 21st, 02013 by Austin Brown
Categories: Rosetta

On The Rosetta Project Blog, intern Karin Wiecha writes:

Today mother tongues will be celebrated world wide. This date was chosen by UNESCO in recognition of the Bengali language movement, where on February 21, 01952, students protested for their language to become an official national language. Several protesters taking part in the demonstration were killed by police. The celebration of International Mother Language Day reminds us of the importance of linguistic diversity and the human right to use one’s mother tongue, no matter how few speakers it might have, to be preserved and passed on to future generations.

The theme of this year’s International Mother Language Day is Books for Mother Tongue Education. This theme highlights the importance of mother tongue education for the survival of linguistic diversity. For a large number of languages there are no books or teaching materials. But with a majority language being the language of instruction at school, children of minority language speech communities have little chance to become literate in their mother tongue. Also many young speakers are prone to switch to a globally more dominant language when they realize that the use of their mother tongue does not allow them to take part in all walks of modern life. Mother tongue education is an important step towards preserving the world’s language diversity for the future.

Today and in the coming days people all over the globe are celebrating this diversity in a variety of events. Do you want to help raise awareness of the importance of linguistic diversity? You could help The Long Now Foundation’s PanLex Project translate “mother tongue” in as many languages as possible. You could also print the official International Mother Language Day 02013 poster and hang it at school or work. For more ideas on how to get involved, visit the UNESCO’s website.

Decoding Long-Term Data Storage

Posted on Friday, October 12th, 02012 by Charlotte Hajer
Categories: Digital Dark Age, Rosetta, Technology

If human societies are founded on the accumulation of knowledge through the ages, then the long-term transmission of information must be the cornerstone of a durable civilization. And as we accelerate ever more rapidly in our expansion of knowledge and technological capability, the development of durable storage methods becomes ever more important.

In the process of brainstorming such methods, two central questions emerge. The first of these concerns the type of storage medium you might use: what kind of material is likely to last long enough to convey a message to generations thousands of years into the future? Throughout much of history, people carved important messages into stone, bone, or other hard materials. So far, we don’t seem to have come up with anything better: most of us are familiar with the limited lifespan of CDs, vinyl, and computer hard drives. Faced with this lack of suitable options, several organizations and companies around the world have re-embraced the long-term durability of hard natural substances. The Long Now’s Rosetta Disk, for example, is made of nickel. Arnano, a French technology start-up, has developed a disk of sapphire on which to micro-etch information – civic records, perhaps, or important messages about the storage of nuclear waste. And most recently, Japanese electronics giant Hitachi announced a new data storage technology that uses quartz glass.

The second – and perhaps even more intriguing – question concerns the language of your message. What kind of ‘code’ will be most easily accessible to future generations, and what technologies will they have available to help them decode a message from the past?

The storage and transmission of data often requires multiple levels of encoding. When we think of ‘code’ we often think of computers – but in fact, we routinely go through two layers of encoding before we can even begin to digitize information. Spoken human language is itself a code, in which sounds are used to signify things or ideas. The use of a writing system adds a further layer: sequences of letters or pictographs signify the sounds that represent things or ideas. Yet another layer can then be applied by translating a writing system into binary numbers (and numeric systems are a kind of code, as well!) or perhaps even DNA.
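The digital layer described above can be sketched in a few lines of Python: a written word becomes a sequence of bytes, and the bytes become a string of 1’s and 0’s. UTF-8 is used here only because it is a common choice; the function names are made up for the example.

```python
def text_to_bits(text):
    """Written language -> bytes (UTF-8) -> binary digits."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

def bits_to_text(bits):
    """Reverse each layer in turn: binary digits -> bytes -> written language."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

word = "clock"
bits = text_to_bits(word)
print(bits)
print(bits_to_text(bits))
```

A reader who finds only the bit string has to rediscover both conventions, the byte grouping and the character encoding, before reaching the written word, and must then still know the language.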

These extra layers of encoding offer the advantage of information density: they can help you pack lots of information into a very small format. However, each layer also further complicates the decodability and readability of a message. Because the Rosetta Disk is itself intended to be a tool for decoding – a primer of human language meant to help future archaeologists unlock entire worlds of culture, just like the Rosetta Stone did in the 19th century – Long Now has chosen to store its data in the analog form of human alphabets, rather than add an extra layer of encoding in the form of a digital code of 1’s and 0’s.

Arnano, the makers of the sapphire disk, have made a similar choice. The added advantage is that this analog information is readable by the human eye (aided by a microscope or magnifying glass).

It’s safe to assume that the languages future generations will speak – and the technologies they’ll have available – will most likely be very different from what we use today. This brings up an important third question: how do you include ‘instructions’ for decoding and reading with your message? Following the example of its namesake predecessor, the content on Long Now’s Rosetta Disk is its own primer: if you know at least one of the 1,500 languages included on the disk, all other information can be decoded. Perhaps a similar kind of parallel multiplicity of codes is possible for other storage methods as well.

These questions of language and code are inevitably more difficult to answer than that of the storage medium. You can subject your chosen material to stress tests to make sure that it will stand up to acid, erosion, or any other kind of potential natural disaster. But there’s no similar test for language; it’s impossible to predict what codes will be interpretable by the people of the future, or what technology they’ll have available to decode a message. Nevertheless, these conundrums are no less important to grapple with, and any proposal for long-term storage worth its salt must offer some potential answers to these questions.

Storing Digital Data in DNA

Posted on Thursday, August 16th, 02012 by Laura Welcher
Categories: Digital Dark Age, Long Term Science, Rosetta

[Image: Schematic of DNA information storage]

In Science today, George Church, Yuan Gao and Sriram Kosuri report that they have written a 5.27-megabit “book” in DNA – encoding far more digital data in DNA than has ever been achieved before.

Writing messages in DNA was first demonstrated in 1988, and the largest amount of data previously written in DNA was 7,920 bits. The challenge in writing more information than this has been creating long, perfect sequences. The current project uses shorter sequences, each encoding a 96-bit data block along with a 19-bit address that specifies the location of the data block within the larger data set. Redundancy then reduces errors: each base encodes only a single bit (A and C are both “0”, G and T are both “1”), and each data block has several molecular copies.
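The scheme described above can be sketched in Python as follows. The block and address sizes and the base-to-bit mapping come from the description; everything else is an assumption made for illustration, including the placement of the address before the data, the zero padding of the last block, the random choice between the two bases that stand for each bit, and the function names. Real fragments would also need sequences for synthesis and reading that are not modeled here.

```python
import random

ADDRESS_BITS = 19
DATA_BITS = 96
ZERO_BASES = "AC"  # A and C both stand for 0
ONE_BASES = "GT"   # G and T both stand for 1

def bits_to_dna(bits, rng):
    """One base per bit; which of the two candidate bases is used is picked at random (assumed)."""
    return "".join(rng.choice(ONE_BASES if b == "1" else ZERO_BASES) for b in bits)

def dna_to_bits(sequence):
    return "".join("1" if base in ONE_BASES else "0" for base in sequence)

def encode_fragments(bits, seed=0):
    """Split a bit string into addressed 96-bit blocks, each written as one short DNA fragment."""
    rng = random.Random(seed)
    padded = bits + "0" * (-len(bits) % DATA_BITS)
    fragments = []
    for address, start in enumerate(range(0, len(padded), DATA_BITS)):
        block = padded[start:start + DATA_BITS]
        fragments.append(bits_to_dna(f"{address:0{ADDRESS_BITS}b}" + block, rng))
    return fragments

def decode_fragments(fragments):
    """Fragments can arrive in any order; the address says where each block belongs."""
    blocks = {}
    for fragment in fragments:
        bits = dna_to_bits(fragment)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    return "".join(blocks[address] for address in sorted(blocks))

message = "10" * 100
fragments = encode_fragments(message)
random.Random(1).shuffle(fragments)
print(decode_fragments(fragments)[:len(message)] == message)
```

Because every fragment carries its own address, no long perfect strand is required, and duplicate copies of a fragment simply overwrite one another when decoded.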

DNA has several advantages for archival data storage – information density, energy efficiency, and stability. With regard to stability DNA offers readability “despite degradation in non-ideal conditions over millennia” – by which they mean 400,000 years! (See Church and Regis, in their forthcoming book on the subject.)

If we wish to intentionally use this technology for active long-term information storage (imagine some crucial message we need to convey to the future), we should probably anticipate the possibility of a discontinuity in technological knowledge and access to tools that could read the information. This raises questions of discoverability, decodability, and readability.

Ubiquity aids discoverability – if the information is everywhere, it is easier to find, or even to stumble upon by accident. Still, clear signals / signposts could aid discovery (neon green cockroaches anyone?). With regard to decodability, I’ll simply mention that there are several layers of encoding to be unraveled here: spoken human language > written language in text form > digital / binary > DNA. And presumably readability requires tools on the order of at least what we have available today, unless you can make the expression of the information obvious in some biological way.

Wonderfully exciting new stuff to conjure with from the perspective of technologies for the Long Now Library. We are also delighted to be working with Dr. George Church to provide Rosetta / PanLex data that may be written in a new “edition” of the DNA book, so check back for updates!