Big Data Hubris?
A recent article in the journal Science presents an important warning to those who see Big Data as the solution to all problems. The article (“The Parable of Google Flu: Traps in Big Data Analysis” by Lazer, Kennedy, King, and Vespignani) describes serious flaws in Google Flu Trends, which has been one of the canonical examples of Big Data at work. The article was also featured in a recent blog post at the New York Times.
The authors make a point—which I agree with strongly—about the danger of believing that simply having access to a lot of data makes the use of rigorous methods unimportant:
“Big data hubris” is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.
I think few would argue with the above statement. Yet there is another point of view that is just as off base, a perspective held by those whom I’ll call “big data deniers.” Ignoring the enormous volumes of data we generate and record can be just as dangerous as misusing them. The authors of this article have it right: so-called big data methods are a supplement, and a potentially valuable one, to the rigorous and traditional methods of collection and analysis developed over generations of scientific practice.