This Bank Holiday weekend’s Question To Which The Answer Is No And Which Successfully Winds Up Alix Mortimer (#QTWTAINAWSWUAM – it’ll never catch on, though it clearly should) is this one from the Beeb’s Rory Cellan-Jones:
Can computers replace historians?
Here is the biggest claim so far – crunching through the big data of history can help us spot patterns and work out where the world is heading next.
That is what Kalev Leetaru, a data scientist at Washington’s Georgetown University, believes may be possible. Using a tool called Google Big Query, designed for interrogating vast collections of data, he has been sifting through a database of events stretching back to 1979.
This is GDELT, which has collected media reports of events from innumerable sources in more than 100 languages for 35 years. “What we did here,” Leetaru explains, “was use this tool to shove in a quarter of a billion records and use this massive piece of software to just in a few minutes sift out the patterns in this data.”
What he says he found was complex patterns of events repeating themselves over the years. He has looked at recent events in Egypt, in Ukraine, in Lebanon and tried to draw common patterns.
The answer of course is No, and in fact nobody is seriously suggesting otherwise, not even the data scientist in the story.
Leetaru says historians should see this kind of computational tool as just another technique amongst many rather than a threat to their professional expertise. In any case, they may look at the patchy record of big data in areas like election forecasting and flu trends and decide their days sifting through dusty archives are not numbered after all.
For all the tail-end humility, it is worth rehearsing the reasons why this idea is being oversold, and they go beyond the fact that Google Big Query sounds like something you’d use to report graffiti in your neighbourhood or check the local bye-laws on squirrel-feeding.
The tritest first – is the purpose of “doing history” solely to work out where the world is going next? For policy-focussed think tanks maybe, for historians probably not. This is the bread and butter of undergraduate historiography seminars, and it’s not difficult to come up with reasons other than that to “do history”. Because it tells you something about the human condition which has nothing to do with mere events, because it widens your understanding of your own culture and biases and those of other people, because it challenges your preconceptions about tradition and heritage, or enhances them, or perhaps just because you have that certain bloody cast of mind that delights in intellectual problems which cannot be reduced easily to numerical values and positively require human intervention to make sense of them, and because you believe that the training and sharpening of such minds is of value to the future of the race. All these things.
Second objection, it indexes and detects patterns in media reports. Not in the Raw Stuff of Time Itself. A more perfectly designed tool to assess changes in patterns of media reportage over the last 35 years would be hard to conceive, but whether it can be said to be crunching up actual history in its neat teeth is something else. There’s a whole extra layer of analysis to slot in here about the nature of historical data and how we create it. There is no such thing as “just data”, a fact which most of the internet found itself having to explain to Chris Anderson in 2008 when he wrote a piece in Wired called “The End of Theory.” This is philosophy of science 101 (it’s archaeology 101 too). “Data” in anything other than pure numerical terms is conceived of through human intervention, through choices about what to foreground and what to omit, through the murky veil of language itself. Somebody has to put this stuff into this difference engine, and however you do it is going to shape your outputs. GIGO &c.
There is at least a certain audacity in making the source of all your inputs The Meedja, an audacity that we can only hope the data scientist in question is aware of (although US print media is famously more staid than British print media – to fully grasp how eye-popping this exercise looks from a British perspective, USian readers should imagine the inputs were TV news segments). On the other hand, at least media coverage of current events can be said to be broadly afflicted with the same problems and biases down the last 35 years or so. At least it’s consistently wrong, right? I mean, obviously after you control for differences between individual journalists and their many biases and bugbears, between editorial approaches at different media outlets, the whole meta-history of the media scene and of reportage and its changing norms and standards, particularly over the period which sees the arrival of the internet, and, er… Well, there are some issues with your input source, in other words, and being able to identify problems with your sources is not the same as being able to control for those problems, as any historian can tell you.
The third objection is the killer from an archaeologist’s point of view and it is a corollary of the second – the data involved currently goes back (like the Head of the People’s Republic) to 1979, which while it is naturally a great deal in fag-and-wisdom years is a blink of an eye in human history itself – even recorded human history, which makes the pattern-detection thing a bit redundant. What are you going to do when your newspaper reports run out? What are you going to do when basic assumptions about, I dunno, states, war, international law, human political relations, are so morphed by the passage of time as to be unrecognisable? What are you going to do, in short, when modernity runs out? What are you going to do when writing itself runs out? What are you going to do – and any data scientist should perforce be interested in answering this question – about the big, the seriously fucking big, patterns in human history when your data inputs are so patchy and variable?
Archaeologists struggle with this all the time, and it’s one of the reasons why prehistory is the best kind. You cannot seek trite, proximate causes, because you simply don’t know that this set of people invaded that, or this set of people started speaking that language because of some particular set of political pressures, or this set of people moved there because of a series of famines. We do not have the data. All you can do is try and detect much more abstract patterns in the distribution of material and try and make it say something about human endeavour (or, as prehistorians of my acquaintance colloquially put it, make it up). Historians of recorded periods are less naturally stretched in theoretical terms, but even they are dealing with much, much patchier and more variable data inputs than newspaper reportage.
For all that, this does sound like an insanely useful tool for certain purposes, and I wonder if Leetaru simply needs to scale back his terms a bit. It doesn’t seem logical in any sense that computers could replace historians. What they could well replace, it seems to me, is think tanks.