Big Data Janitors


New York Times: For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights

With a move to a new house and a new class being developed, I’m a bit behind in posting to this blog.  However, I wanted to make sure I eventually posted a link to this article from over the summer in the New York Times.  It provides a nice picture of what the art of Data Science looks like in 2014.  For many data scientists, the majority of time on a given project is spent obtaining, cleaning, organizing, and otherwise wrangling data into a format they can begin to use.  Trial and error, time-consuming processes, and error-prone manual routines are commonplace before they ever get to work on the “fun stuff.”

Luckily, a lot of smart people are looking at ways to make the process faster, more accurate, and less painful.  Progress will be slow, but hopefully a steady stream of innovation will allow these highly skilled “data janitors” to focus on the high-value analysis work we need to push forward in fields as diverse as energy and healthcare.

