October 2nd, 02013 by Jonathan Pool
The PanLex project of The Long Now Foundation, which is building a database of words and phrases in the world’s languages, has recently passed the one-billion-translation mark. That means there are now over a billion pairs of words or phrases, such as “clock” in English and “ঘড়ী” in Assamese, that PanLex records as attested translations of each other. The translations are derived from publications collected from around the world.
Beyond these billion attested translations, it is possible to infer others from longer paths of translations. For example, the number of pairs shoots up from 1 billion to 30 billion if we include translations at distance 2, namely translations of translations. The longer the path, the more translations can be inferred, but the less reliable they are.
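To make "translations of translations" concrete, here is a minimal sketch that treats attested translations as edges in a graph and collects everything within two hops of an expression. The three toy pairs, the language-prefixed labels, and the function names are all assumptions made for illustration; this is not PanLex's actual data model or code.

```python
from collections import defaultdict

# Hypothetical toy data: each tuple is one attested (distance-1) translation.
attested = [
    ("eng:clock", "asm:ঘড়ী"),
    ("eng:clock", "fra:horloge"),
    ("fra:horloge", "spa:reloj"),
]

def build_graph(pairs):
    """Build an undirected graph: expression -> set of attested translations."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def within_distance_2(graph, expr):
    """Return expressions reachable from expr in one or two hops."""
    direct = set(graph.get(expr, set()))
    reachable = set(direct)
    for middle in direct:
        reachable |= graph.get(middle, set())
    reachable.discard(expr)  # an expression is not its own translation
    return reachable

def count_pairs(graph, neighbours):
    """Count unordered pairs; each pair is seen once from each of its ends."""
    return sum(len(neighbours(graph, expr)) for expr in list(graph)) // 2

graph = build_graph(attested)
distance_1_pairs = count_pairs(graph, lambda g, e: g.get(e, set()))
distance_2_pairs = count_pairs(graph, within_distance_2)
print(distance_1_pairs, distance_2_pairs)
```

In this toy graph the Spanish word is linked to English "clock" only through French, so that pair is counted at distance 2 but not at distance 1. The inference is only as good as the intermediate word: if the word in the middle is ambiguous, the two ends may not mean the same thing.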
Because counting up these totals would overload the PanLex servers, we have estimated them using a random sample of 3,000 words and phrases. The figures below show that, as more words and phrases are added to the sample, the estimates of the numbers of distance 1 and distance 2 translations become more stable.
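The sampling idea can be sketched as follows, under stated assumptions: draw expressions at random, average their translation counts, scale that average up to the whole population, and halve it because each pair is seen from both ends. The synthetic `degrees` data, the population size of 1,000, and the function name are made up for illustration; the post does not describe the estimator PanLex actually used.

```python
import random

def running_estimates(degrees, sample_size, seed=0):
    """Estimate the total number of translation pairs from a random sample.

    degrees maps each expression to its number of translations.
    Returns a list of (n, estimate) tuples, one per sampled expression,
    so the stabilisation of the estimate can be inspected or plotted.
    """
    rng = random.Random(seed)
    population = sorted(degrees)
    sample = rng.sample(population, sample_size)  # without replacement
    total = 0
    estimates = []
    for n, expr in enumerate(sample, start=1):
        total += degrees[expr]
        # Sample mean scaled to the population, halved because every
        # pair is counted once from each of its two expressions.
        estimates.append((n, total * len(population) / (2 * n)))
    return estimates

# Synthetic stand-in data (an assumption, not PanLex figures).
degrees = {f"expr{i}": (i % 7) + 1 for i in range(1000)}
exact = sum(degrees.values()) / 2

for n, estimate in running_estimates(degrees, 300)[49::50]:
    print(n, round(estimate), "exact:", exact)
```

Early estimates swing widely because a few expressions dominate the average; as the sample grows, each new expression moves the estimate less. The same function works for distance 2 if `degrees` holds counts of expressions within two hops instead.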