Big Data Janitors
With a move to a new house and a new class in development, I’m a bit behind on posting to this blog. However, I wanted to make sure I eventually shared a link to this article from the New York Times, published over the summer. It paints a good picture of what the art of Data Science looks like in 2014. For many data scientists, the majority of time on a given project is spent obtaining, cleaning, organizing, and otherwise wrangling data into a format they can begin to use. Trial and error, time-consuming processes, and error-prone manual routines are commonplace before you ever get to work on the “fun stuff.”
Luckily, a lot of smart people are looking at ways to make the process faster, more accurate, and less painful. Progress will be slow, but hopefully a steady stream of innovation will allow these highly skilled “data janitors” to focus on the high-value analysis work we need to push forward in fields as diverse as energy and healthcare.