Presentism in Google Books

Posted on Tuesday, January 4th, 02011 by Austin Brown
Categories: Digital Dark Age, Rosetta, Technology

Google’s new Ngram Viewer is a graphical interface for looking at the frequency of words over time in the several million books scanned into their database.  As a publicly mineable data set it is huge and ripe for exploration, spanning 500 years of published books in several languages.  And while being able to call up how often a word was used in a particular year may seem like a simple, ‘just so’ kind of information, the lives of words can often illuminate historical and cultural trends in surprising ways.
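
For readers who want to poke at the numbers themselves, here is a minimal sketch of pulling a term’s yearly frequency series. It assumes the Ngram Viewer’s unofficial JSON endpoint and the query parameters visible in the viewer’s own URLs (content, year_start, year_end, corpus, smoothing); Google documents no official API, so the endpoint, the corpus identifier, and the response shape are all assumptions that may change.

```python
import requests

def ngram_series(term, year_start=1800, year_end=2000, corpus="en-2019"):
    """Fetch yearly relative frequencies for `term` from the Ngram Viewer.

    Uses the unofficial JSON endpoint behind the graphing UI; not a
    supported API, so treat every parameter here as a best guess.
    """
    resp = requests.get(
        "https://books.google.com/ngrams/json",
        params={
            "content": term,
            "year_start": year_start,
            "year_end": year_end,
            "corpus": corpus,
            "smoothing": 0,  # raw yearly values, no moving average
        },
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    # Assumed shape: one entry per query term, with a `timeseries` list
    # holding one frequency value per year in the requested range.
    return dict(zip(range(year_start, year_end + 1), data[0]["timeseries"]))

freq = ngram_series("1951")
print(freq[1951])  # frequency of the token "1951" in books published in 1951
```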

A paper published by researchers who helped develop the project (and summarized by Discover) rounded up a few interesting findings.  One delectably recursive tidbit they mentioned was that a search for years (e.g. 1865, 1990) can show how much historical attention is focused on particular eras and the extent to which those years remain part of present-day discussion.

They found a general trend that each individual year follows: a spike just before the year itself, followed by a long, downward-trending tail as it recedes into history.  They also noticed, however, a trend within that pattern: ever higher peaks with ever shorter tails.

When the team looked at the frequency of individual years, they found a consistent pattern. In their own words: “’1951’ was rarely discussed until the years immediately preceding 1951. Its frequency soared in 1951, remained high for three years, and then underwent a rapid decay, dropping by half over the next fifteen years.” But the shape of these graphs is changing. The peak gets higher with every year and we are forgetting our past with greater speed. The half-life of ‘1880’ was 32 years, but that of ‘1973’ was a mere 10 years.
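
A rough sketch of the half-life arithmetic behind those figures: if the post-peak tail really decayed exponentially, f(t) = f0·e^(−λt), then a log-linear fit to the tail recovers λ, and the half-life is ln 2 / λ. The series below is a toy constructed to halve every fifteen years, like the ‘1951’ curve described above; it is illustrative, not the paper’s actual data or method.

```python
import numpy as np

def half_life(years, freqs, peak_year):
    """Fit log(freq) ~ a - lam * (year - peak) on the tail after the peak."""
    years = np.asarray(years, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    tail = years > peak_year
    t = years[tail] - peak_year
    slope, _ = np.polyfit(t, np.log(freqs[tail]), 1)  # slope = -lam
    return np.log(2) / -slope

# Toy series: a spike at 1951 that halves roughly every 15 years thereafter.
yrs = np.arange(1951, 2001)
fr = 1e-4 * 0.5 ** ((yrs - 1951) / 15.0)
print(round(half_life(yrs, fr, 1951), 1))  # ~15.0
```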

So, at a cultural level, we can see a developing ‘presentism’ in which the year we’re currently inhabiting takes on great significance, but is more quickly forgotten once it’s passed.

  • Kelly Wm. Hoskins

    This is beyond interesting. I believe this pattern is similar to the memory of the individual person. As we age we accumulate more and more information. The more information a person has, the greater the relative significance of recent events and the faster those events recede into insignificance. However, the individual's memory is strongly weighted by emotional factors; strong emotional events will cause associated information to recede much, much more slowly. Post-traumatic stress disorder is the inability of memories to recede; the person is stuck reliving them over and over.
    I expect that the collective memory will work the same way. I predict that years containing strong emotional events will not follow the general pattern. In fact I expect them to blip every ten years (like a heartbeat on an EKG).
    Seriously, this is fascinating.

  • Kelly Wm. Hoskins

    BTW, the reason for the sharp peak for whatever year you're looking at is that the year is contained in the colophon, and Google is reading that. It would be interesting to see the number of books published for each year… just to look at the shape of it over time.
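
A quick sketch of that “books per year” idea, for anyone who wants to try it: the public ngram dataset ships with a total-counts file alongside the n-gram tables. The filename below and the year,match_count,page_count,volume_count record layout are assumptions about that file’s format (records may also be tab-separated on a single line, which the parser tolerates).

```python
def volumes_per_year(path="googlebooks-eng-all-totalcounts-20120701.txt"):
    """Parse an ngram total-counts file into {year: volume_count}.

    Assumed record layout: year,match_count,page_count,volume_count,
    with records separated by tabs or newlines.
    """
    counts = {}
    with open(path) as f:
        for record in f.read().replace("\t", "\n").splitlines():
            fields = record.strip().split(",")
            if len(fields) == 4:
                counts[int(fields[0])] = int(fields[3])
    return counts
```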

  • Lucia

    It's almost like Chaos Theory where a word is attracted to its place in history, blooms, and then dissipates. Using a date like 1951 makes the attraction very clear. I'd like to see an example of a term not so anchored in chronology that shows the same phenomenon.
    Enjoyed reading and thinking about this.

  • mym

    Hence all those stupid surveys of The Greatest Actor Ever, or Best Dr Who, etc etc that always return results skewed towards the last few years.

  • You're reading the graph backwards. Mentions of 1930 in 1960 books are LESS frequent than mentions of 1960 in 1990 books. Mentions of 1930 in 1950 books are LESS frequent than mentions of 1950 in 1970 books, which are less frequent than mentions of 1970 in 1990 books. And so on. The tails get longer; people are mentioning the past (and dates in general) more and more.

    That can be easily read off the graph, but I looked up the paper. The analysis does not mention how they measure “half-life.” Probably they are fitting an exponential to the segment of data following the date. The problem is they are fitting 20 years of data for 1990 books and 80 years of data for 1930 books, and the curve has a fatter tail than an exponential. So, 1930 works out to a longer half-life just because they are fitting a longer stretch. The authors should analyze with a sliding window of fixed length, and graphically they should show the traces time-shifted to the same onset year (and perhaps on a log scale, if they are serious about half-life as a measure).
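
A sketch of the fixed-window fit this commenter proposes (my reading of the suggestion, not the paper’s actual method): fit every year’s tail over the same number of post-peak years, so early and late years are compared on equal footing. The demo then shows the bias being described: for a fatter-than-exponential (here, power-law) tail, an exponential fit over a longer window reports a longer apparent half-life even though the underlying decay is identical.

```python
import numpy as np

def half_life_fixed_window(years, freqs, peak_year, window=20):
    """Log-linear exponential fit over exactly `window` years after the peak."""
    years = np.asarray(years, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    mask = (years > peak_year) & (years <= peak_year + window)
    t = years[mask] - peak_year
    slope, _ = np.polyfit(t, np.log(freqs[mask]), 1)
    return np.log(2) / -slope

# One synthetic curve with a power-law (fatter-than-exponential) tail:
yrs = np.arange(1930, 2011)
fr = 1e-4 * (1.0 + (yrs - 1930)) ** -1.5
print(round(half_life_fixed_window(yrs, fr, 1930, window=20), 1))
print(round(half_life_fixed_window(yrs, fr, 1930, window=80), 1))  # larger
```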

