Blog Archive for the ‘Rosetta’ Category



Building an Audio Collection for All the World’s Languages

Published on Wednesday, July 21st, 02010 by Laine Stranahan

The Rosetta Project is pleased to announce the Parallel Speech Corpus Project, a year-long volunteer-based effort to collect parallel recordings in languages representing at least 95% of the world’s speakers. The resulting corpus will include audio recordings in hundreds of languages of the same set of texts, each accompanied by a transcription. This will provide a platform for creating new educational and preservation-oriented tools as well as technologies that may one day allow artificial systems to comprehend, translate, and generate these languages.

Huge text and speech corpora of varying degrees of structure already exist for many of the most widely spoken languages in the world—English is probably the most extensively documented, followed by other majority languages like Russian, Spanish, and Portuguese. Given some degree of access to these corpora (though many are not publicly accessible), research, education and preservation efforts in the ten languages which represent 50% of the world’s speakers (Mandarin, Spanish, English, Hindi, Urdu, Arabic, Bengali, Portuguese, Russian and Japanese) can be relatively well-resourced.

But what about the other half of the world? The next 290 most widely spoken languages account for another 45% of the population, and the remaining 6,500 or so are spoken by only 5%. This latter group represents the “long tail” of human languages:

[Figure: the long tail of human languages]
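The tier figures in the paragraph above can be checked with a little arithmetic. The short Python sketch below uses only the approximate shares given in this post (the tier labels and the per-language average are our own illustration, not Rosetta data) to show why documenting the top 300 languages reaches roughly 95% of the world’s speakers, and how thinly the remaining 5% is spread.

```python
# Approximate tiers from the post: (label, number of languages, % of world speakers).
# The counts are rough ("6,500 or so"), so treat every result as approximate.
TIERS = [
    ("top 10", 10, 50.0),
    ("next 290", 290, 45.0),
    ("long tail", 6500, 5.0),
]


def cumulative_coverage(tiers):
    """Return (languages so far, % of speakers so far) after each tier."""
    langs_so_far, pct_so_far = 0, 0.0
    result = []
    for _label, n_langs, pct in tiers:
        langs_so_far += n_langs
        pct_so_far += pct
        result.append((langs_so_far, pct_so_far))
    return result


def average_share_per_language(tiers):
    """Average % of world speakers per language within each tier."""
    return {label: pct / n_langs for label, n_langs, pct in tiers}


if __name__ == "__main__":
    for (label, _, _), (langs, pct) in zip(TIERS, cumulative_coverage(TIERS)):
        print(f"through {label}: {langs} languages cover about {pct:.0f}% of speakers")
    for label, share in average_share_per_language(TIERS).items():
        print(f"{label}: about {share:.4f}% of speakers per language")
```

Running it shows cumulative coverage of about 50%, 95% and 100% after the first 10, 300 and roughly 6,800 languages. On these rough numbers, an average top-10 language has about 6,500 times as many speakers as an average long-tail language, which is why a model built on the top 300 has to be extended deliberately to reach the rest.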

Equal documentation of all the world’s languages is an enormous challenge, especially in light of the tremendous quantity and diversity represented by the long tail. The Parallel Speech Corpus Project will take a first step toward universal documentation of all human languages, with the goal of documenting the top 300 languages and providing a model that can then be extended to the long tail. Eventually, researchers, educators and engineers alike should have access to every living human language, creating new opportunities for expanding knowledge and technology and helping to preserve our threatened diversity.

This project is made possible through the support and sponsorship of speech technology expert James Baker and will be developed in partnership with his ALLOW initiative. We will be putting out a call for volunteers soon. In the meantime, please contact rosetta@longnow.org with questions or suggestions.

Long Now at Wikimania 02010 in Gdansk, Poland

Published on Tuesday, July 6th, 02010 by Danielle Engelman

Dr. Laura Welcher and Dr. Kurt Bollacker of Long Now will be speaking at this year’s Wikimania conference in Gdansk, Poland, over the weekend of July 9–11, 02010, on the creation of a new Language Commons Wiki.

Wikimania is a conference for users of the wiki projects operated by the Wikimedia Foundation. Topics of presentations and discussions include Wikimedia Foundation projects, other wikis, open source software, and free content.

Attendance is €15 per day or €40 for all three days, and you can register here.

If you have questions, you can contact Wikimania directly through this page.

Rosetta Spotlight: Ormuri – a piece of Middle Eastern identity

Published on Tuesday, May 11th, 02010 by Sarina Spector

Ormuri Description in the Rosetta Collection

“Language is identity,” Darfur refugee Daowd I. Salih told the New York Times about a week ago. He was being interviewed for an article called “Listening to (and Saving) the World’s Languages.” As mentioned in this Rosetta Project blog post, the article discusses the amazing variety of spoken languages in New York City, and what residents are doing (or not doing) to preserve their native languages.

One of the languages the article touches on is Ormuri, a language of multiple dialects spoken in small regions of Afghanistan and Pakistan. According to the Ethnologue, Ormuri has only about 1,050 speakers. The New York Times article reveals a plan to canvass New York City for speakers of Ormuri in order to learn more about the language and the cultural information it holds.

Languages with small speaker populations are quickly dying out, and the data they contain (whether it be linguistic, historical, or cultural) is important enough to merit a concerted effort to save them. Ormuri is a perfect example, especially in the political and economic environment of our time (read: the complex tangle that is our current Middle Eastern relations). The Rosetta Project’s database in the Internet Archive contains a detailed description of Ormuri, including a history of its speakers: where they came from, who their ancestors are, and how their language has co-evolved with those around it to become what it is today.

In my mind there is nothing that illustrates a culture’s unity so much as its language. It allows people to build social relationships, conduct business transactions, and express to fellow humans everything they hold dear. What’s more, as any good anthropologist knows, learning the language of a culture is one of the most important steps an outsider can take to gain the trust and respect of its people.

What does this have to do with an obscure Afghan language, or with Darfur refugees? Only this: if we intend to successfully navigate the conflicts of the modern global world, it is absolutely necessary to understand and relate to the people with whom we intend to work. The Middle East in particular, Afghanistan being an illustrative example, is culturally very foreign to the West; its people have lived for centuries in small, autonomous groups that hold to varied, often contradictory beliefs. The fact that so many of these groups have their own language, like Ormuri, is telling of their relative isolation, and gives clues to how they live their lives.

Rosetta’s description of Ormuri tells the story of its peoples’ interactions through Ormuri’s morphology. By studying the languages Ormuri had contact with and how these influenced its words, we can begin to create a web of social and economic interaction that would show the connections and dissociations between groups in the area. For example, Ormuri has many morphological similarities to Pashto, a common language in the region of Waziristan where Ormuri is spoken. Ormuri pronouns are strikingly similar to their Pashto equivalents, and many scattered words share similarities, like “wife,” “glitter,” and “to sit down.” Pashto has also phonetically influenced Ormuri, replacing some traditional Ormuri allophones with similar Pashto ones.

Ormuri has also sustained contact with Persian, which is evident in many morphological changes that mimic the latter: loss of gendered nouns, simplification of plural nouns, and reduction of irregular past participles.  Analyzing this data led the author, Georg Morgenstierne, to doubt the previous belief that Ormuri speakers descend from Kurds, and provided evidence for further theoretical investigations.

The very existence of this kind of knowledge is what Rosetta is all about; by preserving minority languages and stressing their importance, we hope to contribute vital insights into the lives of their speakers, insights that can be put to good use in surprising places. After all, you never know who you’ll meet on the New York City subway.

[A note of introduction: this is my first post as an intern with the Rosetta Project. I will be working with Rosetta for three months, building the collection in the Internet Archive and continuing to spotlight Rosetta material on this blog.]

The Global Lives Project

Published on Tuesday, March 2nd, 02010 by Laura Welcher

Last Friday evening, Long Now joined the Global Lives Project in celebrating their world premiere opening at San Francisco’s Yerba Buena Center for the Arts. Through a huge volunteer effort, Global Lives has produced ten films, each 24 hours long, that visually capture the everyday life of ten people around the planet. And on Friday we could view them all, at the same time, in the same room. Ten huge screens hung from the ceiling of the Yerba Buena Forum, and around a thousand people throughout the evening ambled around and under them, listening as voices emerged: Kai Lu, from Anren, China, speaking to his wife in a village dialect of Sichuan Yi; young Edith Kaphuka from Ngwale Village, Malawi, code-switching with her friends on the playground between Chichewa and Chiyao; James Bullock of San Francisco chatting up the tourists on his cable car in West Coast American English. Some screens showed people working, others playing, some eating, others sleeping: a glimpse into one human day on planet Earth.

Global Lives Opening - Big Screen Installation in the YBCA Forum

A second ongoing installation in the YBCA Room for Big Ideas provides a more intimate viewing space, with ten partitioned rooms and LCD viewing screens. Each room is furnished with seating for one or two, and with walls and floors embellished with fabrics, colors and textures evocative of the region of the film. Kiosks and wall graphics give a bit of background about the project and the ten participants. And while the installation as a whole gives the sense of a finished, polished project, three computers set up prominently in the room tell a different, and quite wonderful, story.

Global Lives Project - Installation in YBCA Room for Big Ideas

This is not a finished project; in fact, it is very much a work in progress. One of the greatest ongoing efforts is one that anyone can help with: the subtitling of each film in as many languages as possible (through the crowdsourced subtitling site dotSUB). The first pass was getting all ten films subtitled in English for the opening night, and that effort is still only about 80% done. It is an enormous effort. Jason Price, one of the producers of the Malawi shoot, tells the story of being nearly at his wits’ end trying to find anyone to help translate Edith Kaphuka’s Chichewa into English, until someone suggested he set up a Facebook Group, and then 2,500 mostly expatriate Chichewa speakers arrived ready to help. (There are, of course, many speakers of Chichewa in Malawi, but the need to access streaming video to do the translations made recruiting them nearly impossible.)

Through the steadfast effort of about 25 of these people, the full twenty-four hours of video has now not only been transcribed and translated, but put through about five stages of checking, rechecking and review to ensure its accuracy. And it is now the largest corpus of transcribed spoken Chichewa on the web. (What might this ‘seed’ corpus enable down the road? Chichewa online dictionaries? Spell checkers? Natural language processing? Search? This group of translators may, without realizing it, be forging the way for a real Chichewa language online presence.)

For Global Lives, this set of ten videos is just the beginning of a much larger library of human life experience.  Not grand experiences, not Hollywood, not Bollywood — in the words of David Harris, the project’s director (responding to the umpteenth activist proposal, this one by yours truly) “we want boring!”  Because what we see as the everyday, the mundane, the routine is in fact a picture of our own humanity – and for that each Global Lives shoot is worth a thousand Hollywood productions.

The Global Lives installation in the Room for Big Ideas will be open through June 20, 02010 at San Francisco’s Yerba Buena Center for the Arts.  The Long Now Foundation sponsored the world premiere installation in the YBCA Forum through a grant from the William and Flora Hewlett Foundation.

3 Long Now Events in 8 Days

Published on Tuesday, February 23rd, 02010 by Alexander Rose - Twitter: @zander

Long Now has three events coming up over the next 8 days and we wanted to be sure you all had the right info for reserving tickets and making it out to all three.

  • Alan Weisman on “World Without Us, World With Us.” Wednesday February 24 (Thanks for coming; this event went great.)

Avoiding a Digital Dark Age

Published on Friday, February 19th, 02010 by Austin Brown

Long Now Digital Research Director Kurt Bollacker was recently published in New Scientist discussing the challenges in maintaining data for the long haul:

It seems unavoidable that most of the data in our future will be digital, so it behooves us to understand how to manage and preserve digital data so we can avoid what some have called the “digital dark age.” This is the idea—or fear!—that if we cannot learn to explicitly save our digital data, we will lose that data and, with it, the record that future generations might use to remember and understand us.

It’s a fairly long and comprehensive piece with lots of good advice and a good description of how the Rosetta Disk tries to address some of these problems.

Read the full article at New Scientist.

No More New Old Knowledge

Published on Thursday, February 18th, 02010 by Austin Brown

King’s College London president Rick Trainor announced recently that the university would be closing the chair of paleography, the only one in the UK. Held by Professor David Ganz, the chair of paleography is the position that oversees a discipline many consider a vital component of historical research. Paleography is the study of ancient manuscripts, and paleographers have pieced together and deciphered many of the texts that provide the basis for our knowledge of history.

Budget cuts are the precipitating factor, or rather “strategic disinvestment” as the official announcement goes, but they’re being met with some resistance.

“Palaeography is not simply an arcane auxiliary science,” says Professor Jeffrey Hamburger, chair of medieval studies at Harvard University. “It is as basic to the training and practice of historians as mastery of Dos or Unix might be to a computer scientist.”

-from the Guardian

Rosetta and Long Now on Life After People

Published on Thursday, February 4th, 02010 by Bryan Campen - Twitter: @cyrusbryan

Rosetta Project Director Laura Welcher recently took part in a segment on The History Channel’s Life After People series.

In an episode titled “Crypt of Civilization,” Laura discusses the Rosetta Disk and The 10,000 Year Clock.   

The central question of the series is “How long would it last?” The series explores how long various materials, systems and structures built by humans would endure without maintenance, as well as how natural systems might flourish or decline without human intervention.

“Crypt of Civilization” focuses on time capsules, vaults and other attempts to create long-lasting caches of materials or data.  Laura explores some of the unique challenges in designing artifacts like the Disk and Clock to last thousands of years while the show’s producers vividly illustrate them.

You can watch the series on its website (though the “Crypt of Civilization” episode isn’t available yet).

Global Lives Project Opening Celebration

Published on Thursday, February 4th, 02010 by Austin Brown

Dedicated to bringing together video documentation of the daily lives of disparate global citizens, the Global Lives Project celebrates the opening of its first installation on February 26th at the Yerba Buena Center for the Arts in San Francisco.  This opening is sponsored in part by the Long Now Foundation through a grant from the William and Flora Hewlett Foundation.

The Global Lives Project’s World Premiere installation will be on view at the Yerba Buena Center for the Arts from February 26 – June 20, 2010! The exhibit is part of an artist residency that will evolve over four months. We will be showing, for the first time ever, our series of ten 24-hour videos of daily life from around the planet.

Join Global Lives, Long Now and the YBCA for the opening night celebration on February 26th from 7:30pm to 11:30pm.  There will be a cash bar and music from San Franciscans Kid Kameleon, Chief Boima, and Tinker.  Global Lives producers and directors will be there to discuss the project.

The event is free, but you’ll want to RSVP so you can be sure to get in!

Mumble in the Jungle

Published on Friday, December 11th, 02009 by Austin Brown

This week, the New York Times ran an article about a recent scientific discovery in the predator alert calls of Campbell’s monkeys. Strikingly, the monkeys seem able to build complex calls out of multiple elements, a “morphological” (word-building) process previously thought to take place only in human language.

Human languages do this all the time: for example, the word ‘walked’ is built of two morphemes, one carrying the main verbal action ‘walk’ and the other marking past tense ‘-ed’. In the case of the Campbell’s monkey, morphemes are often combined to indicate different types of threats. Previous observations of monkeys have shown that they sometimes use different types of calls for different types of predators, but what’s unique about these calls is that some of them can be combined with other calls to change their meaning. So, instead of just having a “leopard!” call and an “eagle!” call as has been observed in vervet monkeys, Campbell’s monkeys have a “leopard!” call that can be combined with a suffix that changes its meaning to indicate a less specific threat:

Crucially, “krak” calls were exclusively given after detecting a leopard, suggesting that it functioned as a leopard alarm call, whereas the “krak-oo” was given to almost any disturbance, suggesting it functioned as a general alert call. Similarly, “hok” calls were almost exclusively associated with the presence of a crowned eagle (either a real eagle attack or in response to another monkey’s eagle alarm calls), while “hok-oo” calls were given to a range of disturbances within the canopy, including the presence of an eagle or a neighbouring group (whose presence could sometimes be inferred by the vocal behaviour of the females).

- Ouattara, Lemasson & Zuberbühler
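To make the stem-plus-suffix idea concrete, here is a toy Python model of the four calls described in the quoted passage. It is our own illustration, not the researchers’ analysis: the names `parse_call` and `interpret` are hypothetical, the English glosses are paraphrases of the quote, and real call sequences are far messier than a string lookup.

```python
# Toy model (illustrative only): a call is a stem, optionally followed by "-oo".
# Glosses paraphrase the quoted study; they are not the researchers' coding scheme.
STEM_MEANINGS = {
    "krak": "leopard",
    "hok": "crowned eagle",
}
# With "-oo" attached, each stem takes a broader, less specific reading.
OO_MEANINGS = {
    "krak": "general alert (almost any disturbance)",
    "hok": "disturbance within the canopy",
}


def parse_call(call: str) -> tuple[str, str]:
    """Split a call like 'krak-oo' into (stem, suffix); suffix is '' if absent."""
    stem, _sep, suffix = call.partition("-")
    return stem, suffix


def interpret(call: str) -> str:
    """Compose a meaning from the stem and an optional '-oo' suffix."""
    stem, suffix = parse_call(call)
    if stem not in STEM_MEANINGS:
        raise ValueError(f"unknown stem: {stem!r}")
    if suffix == "":
        return STEM_MEANINGS[stem]
    if suffix == "oo":
        return OO_MEANINGS[stem]
    raise ValueError(f"unknown suffix: {suffix!r}")


if __name__ == "__main__":
    for call in ("krak", "krak-oo", "hok", "hok-oo"):
        print(f"{call:8} -> {interpret(call)}")
```

In this sketch the “-oo” suffix plays the role that ‘-ed’ plays in ‘walked’: one element attaches to different stems and modifies each in the same direction, though here the modification broadens the threat category rather than marking tense.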

Just as artificial intelligence researchers have been busy over the last several decades celebrating each previously unique human capacity achieved by computers, biologists have been finding behaviors once thought to mark the uniqueness of humans in other animals. Neurobiologist and primatologist Robert Sapolsky recently gave a lecture at Stanford about the uniqueness of humans, which provides a great overview of what we share and don’t share with other animals (as is currently understood).

Similarly, primatologist Frans de Waal has made a career of describing the political, cultural, emotional and moral lives of primates.  His work has illustrated the evolutionary breadth and depth of many human characteristics previously thought to be recent behavioral innovations without precedent and unique to our species.

As artificial intelligence research looks forward to recreating human capabilities, it focuses our efforts to understand those capabilities. Similarly, in identifying in other animals capacities like syntax once thought to be unique to humans, we are afforded a clearer look back on the deep history and development of those capacities. Looked at this way, it actually did take millions of years to produce the works of Shakespeare.


The Long Now Blog

Ideas about Long-term Thinking.


Some Rights Reserved (CC)

The Long Now Foundation
Fostering Long-term Responsibility
est. 01996.