There has been a flurry of articles online about how close analysis of the masses of data generated by social media and other digital technologies may provide a means of predicting the future, for example:
[There] is an emerging industry aimed at using the tweetstreams of millions of people to help predict the future in some way: disease outbreaks, financial markets, elections and even revolutions. According to new research released today by Topsy Labs — which runs one of the only real-time search engines that has access to Twitter historical data — watching those streams can provide a window into breaking news events. But can it predict what will happen?
The theory behind all of this Twitter-mining is that the network has become such a large-scale, real-time information delivery system (handling more than a quarter of a billion messages every day, according to CEO Dick Costolo at the recent Web 2.0 conference) that it should be possible to analyze those tweets and find patterns that produce some kind of collective intelligence about a topic.
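(As an aside, the “pattern finding” being described seems, in practice, to be mostly anomaly detection on message volume. Here is a minimal sketch of that idea in Python; the counts, window and threshold are invented for illustration and are not anything Topsy has actually described.)

from statistics import mean, stdev

def find_spikes(hourly_counts, window=24, threshold=3.0):
    # Flag hours whose mention count sits more than `threshold`
    # standard deviations above the trailing `window`-hour average.
    spikes = []
    for i in range(window, len(hourly_counts)):
        baseline = hourly_counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and hourly_counts[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes

# Toy data: steady chatter about a topic, then a sudden burst.
counts = [50, 48, 52, 49, 51, 47, 53, 50, 49, 52, 48, 51,
          50, 49, 52, 47, 51, 50, 48, 52, 49, 51, 50, 48, 400]
print(find_spikes(counts))  # [24]: the burst hour stands out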
Wired also has an article about the same idea, this time covering another company seeking to do this sort of data-crunching and prediction:
The investment arms of the CIA and Google are both backing a company that monitors the web in real time — and says it uses that information to predict the future. The company is called Recorded Future, and it scours tens of thousands of websites, blogs and Twitter accounts to find the relationships between people, organizations, actions and incidents — both present and still-to-come. In a white paper, the company says its temporal analytics engine “goes beyond search” by “looking at the ‘invisible links’ between documents that talk about the same, or related, entities and events.”
The idea is to figure out for each incident who was involved, where it happened and when it might go down. Recorded Future then plots that chatter, showing online “momentum” for any given event.
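(Again, a guess at what “momentum” might mean in practice: an exponentially weighted average of mention counts, so that recent chatter dominates. Recorded Future has not published its method; the function and decay factor below are purely illustrative.)

def momentum(daily_counts, decay=0.7):
    # Exponentially weighted mention count: recent days count for more.
    m = 0.0
    for count in daily_counts:
        m = decay * m + (1 - decay) * count
    return m

print(momentum([10, 12, 11, 90, 150]))  # rising chatter -> high momentum
print(momentum([150, 90, 11, 12, 10]))  # fading chatter -> low momentum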
Which all sounds very science-fiction to me, a cross between Minority Report and Asimov’s Foundation novels. One issue with both of these stories is the slight hyperbole in the reporting: rather than predicting the future, this is really a very close reading of existing data trends, spotting them before anyone else does. It is not about the future; it is about the now.
The other issue I see is what former US Secretary of Defense Donald Rumsfeld famously called “unknown unknowns”. What I mean by this is that any algorithm applied to the data will only be sensitive to what its creators deem important, and what is deemed important is what we know, from past events, to have been important. So to a large degree the warnings the system gives always have one foot in the past. That is not to say it cannot generate meaningful results; I am sure it can. But it cannot predict the future, nor accurately account for trends that have little or no historical precedent.
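(To make that limitation concrete, a toy sketch: a monitor is only as broad as the vocabulary its creators gave it, so an unprecedented event with no matching term never registers. The keywords and messages here are invented.)

WATCHED_TERMS = {"earthquake", "outbreak", "protest"}  # drawn from events we already know matter

def relevant(message):
    # A message registers only if it mentions a term someone thought to watch.
    return any(term in message.lower() for term in WATCHED_TERMS)

messages = [
    "Massive protest forming downtown",         # caught: a known category
    "Everyone's phone just stopped working??",  # missed: no watched term, however big the event
]
for msg in messages:
    print(relevant(msg), "-", msg)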
Hat-tip to Michel for the link. (Also posted on my blog.)