Blog Archive for the ‘Rosetta’ Category



Almost half of the world’s languages are endangered

Published on Wednesday, April 17th, 02013 by Austin Brown

On the blog of Long Now’s Rosetta Project, intern Karin Wiecha describes the recently published findings of a major linguistics research effort:

ELCat uses the metaphor of biodiversity to illustrate the gravity of the loss of an entire language family: If we compare the extinction of a language to the extinction of an animal species, the death of a language family would equal the loss of a whole branch of the animal kingdom, for example all felines.[4] We know of a hundred language families that have gone extinct over the course of history – 24% of the world’s linguistic diversity. But the fact that 28 of them have gone extinct over the relatively short time span of the last 50 years is symptomatic of the accelerated rate of language loss we are experiencing in recent times.

The Endangered Languages Catalogue (ELCat) “aims to compile a comprehensive up-to-date catalogue on all languages considered to be in danger.” At the 3rd International Conference on Language Documentation & Conservation (ICLDC 3) earlier in 02013, ELCat’s director presented several years’ worth of research, including the above facts.

Beyond those that we’ve already lost, ELCat has found that 457 languages – almost 10% of humanity’s living languages – are spoken by just 10 or fewer people and are on the brink of extinction, while 3,176 – almost half – are endangered. In fact, ELCat puts the current rate of linguistic extinction at about one language every three months.

Learn more, including what people are doing about it, on the Rosetta Blog.

Encapsulated Universes

Published on Thursday, February 28th, 02013 by Charlotte

In a recent conversation with Edge, Stanford psychologist and former SALT speaker Lera Boroditsky explores intriguing – and still controversial – questions about the relationship between the language we speak and the way we think about the world.

Weaving her thoughts together with examples from a variety of different languages, Boroditsky shows us that languages differ in the kind of contextual information they prioritize. Hebrew assigns everything in the world a gender, whereas Finnish does not. Russian verbs specify when an event took place, while Indonesian verbs are timeless. And where English sentences can be vague about the causality of an event, Japanese tends to be much more explicit about who did what. In other words, language shapes the things we notice about our environment.

“Think about it this way. We have 7,000 languages. Each of these languages encompasses a world-view, encompasses the ideas and predispositions and cognitive tools developed by thousands of years of people in that culture. Each one of those languages offers a whole encapsulated universe. So we have 7,000 parallel universes, some of them are quite similar to one another, and others are a lot more different.”

This does not mean that language dictates what we do and do not experience about our world – speakers of Finnish are still able to recognize the difference between men and women, and Indonesians know whether something happened in the past or the present. But it does mean that language is more than simply a way to convey meaning. In prioritizing certain pieces of information over others, language adds a certain color to our universe. In other words, meaning emerges from the fabric of language itself:

“Those interconnections between words are not simply the webbing on top of an otherwise pure logical knowledge system. Rather, in fact, meaning exists in the way that we use words; the patterns of word use create the system of meaning. There’s no getting away from language in getting to complex meanings.”

Exploring a new language, then, is truly a way to explore new worlds – and to celebrate the “flexibility and the ingenuity of the human mind.”

“The fact that we’re able to take so many different perspectives and create such an incredibly diverse set of ways of looking at the world, that is something first to be celebrated, but also something to learn from: flexibility and diversity are at the very heart of what makes us human and what makes us so smart. I think the more we understand how people are able to take all these different perspectives, and able to change the way they think, the better we’ll understand the nature of being human.”

Happy International Mother Language Day 02013!

Published on Thursday, February 21st, 02013 by Austin Brown

On The Rosetta Project Blog, intern Karin Wiecha writes:

Today mother tongues will be celebrated world wide. This date was chosen by UNESCO in recognition of the Bengali language movement, where on February 21, 01952, students protested for their language to become an official national language. Several protesters taking part in the demonstration were killed by police. The celebration of International Mother Language Day reminds us of the importance of linguistic diversity and the human right to use one’s mother tongue, no matter how few speakers it might have, to be preserved and passed on to future generations.

The theme of this year’s International Mother Language Day is Books for Mother Tongue Education. This theme highlights the importance of mother tongue education for the survival of linguistic diversity. For a large number of languages there are no books or teaching materials. But with a majority language being the language of instruction at school, children of minority language speech communities have little chance to become literate in their mother tongue. Also many young speakers are prone to switch to a globally more dominant language when they realize that the use of their mother tongue does not allow them to take part in all walks of modern life. Mother tongue education is an important step towards preserving the world’s language diversity for the future.

Today and in the coming days people all over the globe are celebrating this diversity in a variety of events. Do you want to help raise awareness of the importance of linguistic diversity? You could help The Long Now Foundation’s PanLex Project translate “mother tongue” into as many languages as possible. You could also print the official International Mother Language Day 02013 poster and hang it at school or work. For more ideas on how to get involved, visit UNESCO’s website.

Decoding Long-Term Data Storage

Published on Friday, October 12th, 02012 by Charlotte

If human societies are founded on the accumulation of knowledge through the ages, then the long-term transmission of information must be the cornerstone of a durable civilization. And as we accelerate ever more rapidly in our expansion of knowledge and technological capability, the development of durable storage methods becomes ever more important.

In the process of brainstorming such methods, two central questions emerge. The first of these concerns the type of storage media you might use: what kind of material is likely to last long enough to convey a message to generations thousands of years into the future? Throughout much of history, people carved important messages into stone, bone, or other hard materials. So far, we don’t seem to have come up with anything better: most of us are familiar with the limited lifespan of CDs, vinyl, and computer hard drives. Faced with this lack of suitable options, several organizations and companies around the world have re-embraced the long-term durability of hard natural substances. The Long Now’s Rosetta disk, for example, is made of nickel. Arnano, a French technology start-up, has developed a disk of sapphire on which to micro-etch information – civic records, perhaps, or important messages about the storage of nuclear waste. And most recently, Japanese electronics giant Hitachi announced a new data storage technology that uses quartz glass.

The second – and perhaps even more intriguing – question concerns the language of your message. What kind of ‘code’ will be most easily accessible to future generations, and what technologies will they have available to help them decode a message from the past?

The storage and transmission of data often require multiple levels of encoding. When we think of ‘code’ we often think of computers – but in fact, we routinely go through two layers of encoding before we can even begin to digitize information. Spoken human language is itself a code, in which sounds are used to signify things or ideas. The use of a writing system adds a further layer: sequences of letters or pictographs signify the sounds that represent things or ideas. Yet another layer can then be applied by translating a writing system into binary numbers (and numeric systems are a kind of code, as well!) or perhaps even DNA.

These extra layers of encoding offer the advantage of information density: they can help you pack lots of information into a very small format. However, each layer also further complicates the decodability and readability of a message. Because the Rosetta Disk is itself intended to be a tool for decoding – a primer of human language meant to help future archaeologists unlock entire worlds of culture, just as the Rosetta Stone did in the 19th century – Long Now has chosen to store its data in the analog form of human alphabets, rather than add an extra layer of encoding in a digital code of 1’s and 0’s.
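The cost of each added layer can be made concrete with a minimal sketch. It assumes UTF-8 as the character encoding, which the post does not specify, and the function names are hypothetical. The same bit string yields the intended word only when the reader knows which convention the writer used; guess a different one and the message is silently garbled.

```python
def encode_layers(text, charset="utf-8"):
    """Written text -> bytes (character encoding) -> string of bits."""
    raw = text.encode(charset)
    return "".join(f"{byte:08b}" for byte in raw)


def decode_layers(bits, charset="utf-8"):
    """String of bits -> bytes -> written text, assuming a known charset."""
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(charset)


# UTF-8 is an assumption made for this illustration.
bits = encode_layers("café")

# A reader who knows the convention recovers the word.
right = decode_layers(bits, "utf-8")

# A reader who guesses a different convention gets plausible-looking noise.
wrong = decode_layers(bits, "latin-1")
```

Here `right` is the original word, while `wrong` contains two stray characters in place of the accented letter. An image of the written word etched on a disk avoids this particular failure, at the price of lower density.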

Arnano, the makers of the sapphire disk, have made a similar choice. The added advantage is that this analog information is readable by the human eye (aided by a microscope or magnifying glass).

It’s safe to assume that the languages future generations will speak – and the technologies they’ll have available – will most likely be very different from what we use today. This brings up an important third question: how do you include ‘instructions’ for decoding and reading with your message? Following the example of its namesake predecessor, the content on Long Now’s Rosetta Disk is its own primer: if you know at least one of the 1,500 languages included on the disk, all other information can be decoded. Perhaps a similar kind of parallel multiplicity of codes is possible for other storage methods as well.

These questions of language and code are inevitably more difficult to answer than that of the storage medium. You can subject your chosen material to stress tests to make sure that it will stand up to acid, erosion, or any other kind of potential natural disaster. But there’s no similar test for language; it’s impossible to predict what codes will be interpretable by the people of the future, or what technology they’ll have available to decode a message. Nevertheless, these conundrums are no less important to grapple with, and any proposal for long-term storage worth its salt must offer some potential answers to these questions.

Storing Digital Data in DNA

Published on Thursday, August 16th, 02012 by Laura Welcher

[Figure: schematic of DNA information storage]

In Science today, scientists George Church, Yuan Gao and Sriram Kosuri report that they have written a 5.27-megabit “book” in DNA – encoding far more digital data in DNA than has ever been achieved before.

Writing messages in DNA was first demonstrated in 1988, and the largest amount of data previously written in DNA was 7,920 bits. The challenge in writing more information than this has been creating long, perfect sequences. The current project uses shorter sequences, each encoding a 96-bit data block along with a 19-bit address that specifies the location of the data block within the larger data set. Redundancy then reduces errors: each base encodes only a single bit (A and C are both “0”, G and T are both “1”), and each data block has several molecular copies.
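The addressing scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the 19-bit address, the 96-bit block, and the bit-to-base mapping come from the post, while the UTF-8 text encoding, the zero padding of the last block, the random choice between the two bases available for each bit, and all function names are assumptions made for the example.

```python
import random

ADDRESS_BITS = 19  # address length given in the post
DATA_BITS = 96     # data block length given in the post

# One bit per base: A or C stands for 0, G or T stands for 1.
BASES_FOR_BIT = {"0": "AC", "1": "GT"}
BIT_FOR_BASE = {"A": "0", "C": "0", "G": "1", "T": "1"}


def text_to_bits(text):
    # Assumption: the text is serialized as UTF-8 bytes.
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_text(bits):
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode("utf-8")


def encode(text, seed=0):
    """Split text into addressed fragments; return (fragments, bit count)."""
    rng = random.Random(seed)
    bits = text_to_bits(text)
    fragments = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        # Assumption: the final block is padded with zeros.
        block = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        address = f"{index:0{ADDRESS_BITS}b}"
        # Assumption: pick either valid base at random for each bit.
        fragment = "".join(rng.choice(BASES_FOR_BIT[b]) for b in address + block)
        fragments.append(fragment)
    return fragments, len(bits)


def decode(fragments, bit_length):
    """Reassemble fragments in address order, whatever order they arrive in."""
    blocks = {}
    for fragment in fragments:
        bits = "".join(BIT_FOR_BASE[base] for base in fragment)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    ordered = "".join(blocks[i] for i in sorted(blocks))
    return bits_to_text(ordered[:bit_length])


if __name__ == "__main__":
    fragments, n_bits = encode("A message for the future")
    random.shuffle(fragments)  # fragments in a pool of DNA have no inherent order
    print(decode(fragments, n_bits))
```

Because every fragment carries its own address, the fragments can be read in any order and duplicate copies of a block simply overwrite one another, which is roughly why short addressed sequences sidestep the need for one long perfect strand.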

DNA has several advantages for archival data storage – information density, energy efficiency, and stability. With regard to stability, DNA offers readability “despite degradation in non-ideal conditions over millennia” – by which they mean 400,000 years! (See Church and Regis, in their forthcoming book on the subject.)

If we wish to intentionally use this technology for active long-term information storage (imagine some crucial message we need to convey to the future), we should probably anticipate the possibility of a discontinuity in technological knowledge and access to tools that could read the information. This raises questions of discoverability, decodability, and readability.

Ubiquity aids discoverability – if the information is everywhere it is easier to find, or even stumble upon by accident. Still, clear signals / signposts could aid discovery (neon green cockroaches anyone?). With regard to decodability, I’ll simply mention there are several layers of encoding to be unraveled here: spoken human language > written language in text form > digital / binary > DNA. And presumably readability requires tools on the order of at least what we have available today, unless you can make the expression of the information obvious in some biological way.

Wonderfully exciting new stuff to conjure with from the perspective of technologies for the Long Now Library. We are also delighted to be working with Dr. George Church to provide Rosetta / PanLex data that may be written in a new “edition” of the DNA book, so check back for updates!

The Apollo Goodwill Disc

Published on Thursday, August 9th, 02012 by Alex Mensing

On July 20, 01969, humans landed on the surface of the moon for the first time. But since only two of us got to go, NASA sent a message “FROM PLANET EARTH” in the rest of humanity’s stead. The message wasn’t a letter written in ink on paper, though. It was a thin silicon disc, with messages from various world leaders etched into its surface at a microscopic scale. On the recent anniversary of the Apollo 11 landing, Steve Jurvetson posted photographs of some Apollo 11 artifacts, including the Goodwill Disc. Jurvetson writes on his Flickr page:

The story of the rushed creation of the disc is fascinating, as are the messages embedded in this interplanetary time capsule.

The concept started in June, 1969, and it was a politically charged project, in the midst of the Cold War and the Vietnam War. On June 27, NASA telephoned the state department, and got the unprecedented permission to contact the foreign chiefs of state to deposit a message on the moon. This was 19 days before launch. They were asked to compose and send typed and scribed letters to the U.S. (they came by telegram and mail).

But NASA did not know how they would store the messages so that they could last thousands of years in the harsh temperatures, solar radiation, and cosmic rays on the lunar surface. So they approached the supplier of some of the most advanced technology on Apollo – the nascent semiconductor industry.

Sprague manufactured 53,000 components on the Apollo 11 spacecraft and many more for the ground support equipment. The engineers chose silicon for the storage medium because of the density of storage and the stability of silicon over temperature in a vacuum.

You can read the text of the goodwill messages on Wikipedia, as well as in the original 01969 NASA description, which also explains a bit about the fabrication method.

Forty years later, The Long Now Foundation’s Rosetta Disk uses remarkably similar technology to provide a durable record of the world’s human languages:

For the extreme longevity version of the Rosetta database, we have selected a new high density analog storage device as an alternative to the quick obsolescence and fast material decay rate of typical digital storage systems. This technology, developed by Los Alamos Laboratories and Norsam Technologies, can be thought of as a kind of next generation microfiche. However, as an analog storage system, it is far superior. A 2.8 inch diameter nickel disk can be etched at densities of 200,000 page images per disk, and the result is immune to water damage, able to withstand high temperatures, and unaffected by electromagnetic radiation. This makes it an ideal backup for a long-term text image archive. Also, since the encoding is a physical image (no 1’s or 0’s), there is no platform or format dependency, guaranteeing readability despite changes in digital operating systems, applications, and compression algorithms.

 

(via BoingBoing)

Rosetta, A Documentary by Scott Oller

Published on Tuesday, July 24th, 02012 by Austin Brown

The Rosetta Project was created to begin the work of filling Long Now’s 10,000 Year Library, and in 02011 student filmmaker Scott Oller offered to help tell the story of the project’s aspirations and achievements. This short documentary, Oller’s senior thesis, was shot over the course of several weeks in the spring of 02012 and explores the contents of the Rosetta Project’s collection of linguistic data, the Internet Archive’s role in hosting and making accessible that data, and the aesthetics and functionality of the Rosetta Disk itself.

Bringing the World’s ~ 7,000 Languages Online

Published on Tuesday, July 3rd, 02012 by Austin Brown

On July 9, Rosetta Project director Laura Welcher will be giving a talk in the Long Now museum on “Bringing the World’s ~ 7,000 Languages Online.” This talk is part of an ongoing series offered by SF Globalization, a San Francisco meetup group interested in software localization and internationalization.

“There are nearly 7,000 languages spoken in the world today, but the vast majority of them are contracting dramatically in use, rapidly approaching obsolescence and extinction. While computers, mobile devices and the Internet could offer an entirely new domain of language use – infusing these languages with modern vitality and vigor – there are few languages that can be used with ease in this domain today. In this talk, Dr. Laura Welcher will present the work of The Rosetta Project that she directs at The Long Now Foundation, their efforts to build resources and capacity for all human languages, and what it takes to bring these languages online.”

Find all the details and RSVP to attend on the Meetup Page.

Endangered Languages Project launches

Published on Wednesday, June 20th, 02012 by Laura Welcher

The Rosetta Project and PanLex Project at The Long Now Foundation are excited to announce that we are participating in a new initiative called the Endangered Languages Project, which is backed by the Alliance for Linguistic Diversity.

As a member organization of the Alliance, we will be providing support for the Project, which aims to:

- accelerate, strengthen, and catalyze efforts around endangered language documentation,
- support communities engaged in protecting or revitalizing their languages, and
- raise awareness about ways to address threats to endangered languages.

Through the Endangered Languages Project, endangered language communities and scholars are able to contribute their own materials by uploading language documentation via Google tools such as Google Docs and YouTube. Alliance members will help maintain the project as an open space so that any user can find, share, and discuss the most comprehensive and up-to-date information and primary data on endangered languages.

As part of our contribution, the PanLex project has offered to make accessible its compilation of a half-billion pairwise translations among 17 million lexemes in 6,000 languages. Our hope is that this data can be made available through the Endangered Languages Project to promote collaboration with researchers and enable more than a trillion additional inferred lexical translations.
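One way attested pairs can yield inferred ones is by pivoting: if two lexemes in different languages both translate the same third lexeme, they are candidate translations of each other. The sketch below illustrates that idea only. It is an assumption for the sake of example, not a description of how PanLex actually computes inferred translations, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def infer_translations(pairs):
    """Propose new cross-language pairs that share a pivot lexeme.

    Each lexeme is a (language_code, word) tuple; `pairs` is an iterable
    of attested (lexeme, lexeme) translations.
    """
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    attested = {frozenset(pair) for pair in pairs}
    inferred = set()
    for linked in neighbors.values():
        for a in linked:
            for b in linked:
                # Keep each unordered pair once, skip same-language pairs,
                # and skip anything that is already attested.
                if a < b and a[0] != b[0] and frozenset((a, b)) not in attested:
                    inferred.add((a, b))
    return inferred


# Hypothetical sample data: two attested pairs sharing an English pivot.
sample = [
    (("eng", "water"), ("spa", "agua")),
    (("eng", "water"), ("deu", "Wasser")),
]
candidates = infer_translations(sample)
```

With this sample, `candidates` holds a single German and Spanish pair that was never attested directly. Pivoting through an ambiguous word can propose wrong pairs, so results like these are better treated as candidates to be checked than as established translations.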

Rosetta Project featured on Radio National Australia’s “Lingua Franca”

Published on Friday, March 30th, 02012 by Laura Welcher

The Rosetta Project was just featured in the radio show “Lingua Franca,” presented by Maria Zijlstra and broadcast on ABC Radio National Australia. The full program is available here as a podcast on the Lingua Franca website.


Some Rights Reserved (CC)

The Long Now Foundation - Fostering Long-term Responsibility - est. 01996.