Some ten years ago, I started writing about what we then called "big data" for ZDNet; in fact, I was the first person at ZDNet focused on it exclusively. Coming from a consulting background in enterprise business intelligence and application development, I thought it would be fun to cover this burgeoning new area of the analytics game that I had been part of since the late 1990s. The editors wanted to name this blog simply "Big Data." My thought was that that term wouldn't age so well...that whatever was "big" then would seem "regular" in ten years' time. I suggested a slight variation: "Big on Data" (because I was, and I am). And that's how the blog and its name came about.
Also read: Big Data: Defining its definition[1]
I was a bit amused that so many people saw big data as shiny and new. It wasn't...instead, it was a logical progression of the enterprise BI technology that had existed for roughly the 20 years prior. There were some important differences, though. Instead of being based on expensive commercial software, the tech of the day -- Apache Hadoop[2] -- was open source. Instead of leveraging proprietary data warehouse hardware appliances built on (limited) enterprise storage, Hadoop ran on commodity servers and their inexpensive direct-attached storage (DAS) disk drives. And rather than struggling at terabyte scale, Hadoop bragged it could work at petabyte scale -- handling data volumes three orders of magnitude bigger.
Also read: MapReduce and MPP: Two sides of the Big Data coin?[3]
Lots of warts
There were downsides, too. And lots of them. Hadoop didn't work with SQL; instead, it required engineers to write imperative MapReduce code -- in Java -- to get their work done. It