This week we saw something nobody would have expected a year ago: a Hadoop-free Strata conference. According to Datanami's Managing Editor Alex Woodie, who was on site:
“It was an auspicious absence, to be sure. Making a big yellow elephant essentially vanish in the space of half a year is not an easy feat. But the fact remains that what used to be the rallying point for an entire industry has essentially been reduced to an afterthought. Cloudera, which puts on the show with O’Reilly Media, scarcely even mentioned Hadoop.”
I talked to Jeff Carr, SlamData Co-Founder and CEO, who watched first-hand as Hadoop started gaining popularity 10 years ago: “It was the classic example of when you have nothing, anything is better.”
“The idea of large scale, highly scalable, cost-effective storage was unattainable — so it was easy to see why so many companies went down that road,” said Carr.
The One-Way Parking Lot
If you ask your nearest big data engineer or data scientist, they’ll all say they saw it coming. Is this the typical 20/20 hindsight everyone has, or was there a clear and obvious problem?
Jeff Carr: “It didn’t take long to realize that Hadoop was really good at storing data; it just wasn’t good at doing much else (see Are You Drowning In the Data Lake?). So what’s data without the insight?”
You’ve heard about MapReduce queries on Hadoop that took weeks to run, haven’t you? That was reality.
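Part of the pain was the programming model itself. Here’s a toy, in-process Python sketch of the MapReduce pattern (not actual Hadoop code, and the sample log lines are invented): even a trivial word count forces you to think in map, shuffle, and reduce phases rather than simply asking a question of the data.

```python
# Toy illustration of the MapReduce pattern -- plain Python, not Hadoop.
from collections import defaultdict

def map_phase(records):
    # "Map" step: emit a (word, 1) pair for every word.
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group values by key -- the framework does this in real Hadoop.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # "Reduce" step: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

logs = ["hadoop stores data", "hadoop is slow", "data needs insight"]
counts = reduce_phase(shuffle(map_phase(logs)))
```

Now imagine writing and tuning that kind of job, in Java, across a cluster, every time an analyst has a question. That’s why “weeks to run” wasn’t a punchline.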
The New Age of Data
For me, one of the most compelling reasons to work in “data” (not saying ‘big data’ anymore) is this: time-to-insight is closing in on ‘now’. People who ran businesses used to wait weeks or months for paper reports to land on their desks — the result of often questionable and undocumented data prep methodologies. That’s not far from multi-week MapReduce jobs. So, looking back, this is kinda the timeline:
1. Amass tons of data
2. Find a place to put it
3. Realize that tons of parked data does you no good if business analysts can’t easily get to it
4. Weather the onslaught of Hadoop/Spark solutions
5. Watch time-to-insight close in on “now”

#5 is now.
We’re post-Hadoop. We may even be post-Spark. Not really, but we’re definitely not cornered into using Spark. Spark still requires developers.
Pass Hadoop, Collect $200
Hadoop is no longer a requirement. Gartner’s 2017 Hype Cycle for Data Management lays it out sans sugar:
“Hadoop distributions are deemed to be obsolete… because the complexity and questionable usefulness of the entire Hadoop stack is causing many organizations to reconsider its role in their information infrastructure.”
So now that we have the “official” declaration from Gartner, let’s talk options.
For storage, there’s a whole lot of NoSQL databases that can store data and scale without much effort: MongoDB, Couchbase, Cassandra…
Even the so-called “NewSQL” solutions like CockroachDB offer compelling alternatives. I mean, there are literally tons of options.
But to hit the problem head-on — Hadoop’s double-whammy of “Ugh” — you need to wrap your head around a data hub. It’s what I call the big data utility bucket. It’s easy, it’s fast, it’s hardly even technical.
A data hub is a bucket of data. Throw anything you want in there. Connect feeds, even. Then connect SlamData. Without doing any prep work, you’ll get to see everything in your hub — just waiting for you to explore and pivot-table your way to just the right view — and then slide into home plate with a killer, interactive report that you can share anywhere. Most folks call the first part of this awesomeness a data hub.
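To make the “bucket” idea concrete, here’s a minimal Python sketch: heterogeneous records land in one place with no upfront schema, and you filter and pivot them directly. The record shapes and field names are invented for illustration; a real hub would sit on a document store like MongoDB, with a tool like SlamData doing the exploring.

```python
# A minimal sketch of a "data hub": one schemaless bucket of records.
hub = []

# Throw anything in -- the shapes don't have to match.
hub.append({"type": "order", "region": "west", "amount": 120})
hub.append({"type": "order", "region": "east", "amount": 80})
hub.append({"type": "clickstream", "page": "/pricing", "ms": 340})

# No prep work: pivot the orders by region straight out of the bucket,
# ignoring records of other shapes.
by_region = {}
for rec in hub:
    if rec.get("type") == "order":
        by_region[rec["region"]] = by_region.get(rec["region"], 0) + rec["amount"]
```

The point isn’t the ten lines of Python — it’s that nothing up front forced the order records and the clickstream records into one schema before anyone could ask a question.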
But beware. If you’re just throwing data into a bucket and then unleashing a bunch of Python developers on it, then you’ve just exchanged one problem for another.