Why Modern BI Is Failing
Almost daily I hear from companies asking how we can help them connect a relational BI tool (Tableau, Power BI, etc.) to their NoSQL database, whether that's MongoDB, Couchbase, MarkLogic, or another. It's understandable: these tools have been around a long time, and many people are comfortable using them. The problem is that they were not built to handle modern NoSQL data models like JSON or XML, and their support for them is extremely poor. Nevertheless, people are determined to make them work. The results in most cases are poor, and after spending valuable time and money trying to force the square peg into the round hole, they ultimately look for other ways to solve the problem.
If You Have A Hammer, Everything Looks Like A Nail
The relational data model was first defined in the early 1970s, and it has been the dominant model for databases and analytics for more than 40 years. In recent years, richer data models have taken hold to support modern applications in IoT, social media, and SaaS. Developers have openly embraced this change, since it makes their jobs easier and more efficient. Yet when analytics on these modern applications comes up, the immediate response is to make ALL the data relational, regardless of its current form. It's as if we have decided there can be only one data model for analytics for the rest of eternity.
First, ask yourself why you are using a NoSQL database in the first place. For many, the answer is schema flexibility: not having to design a completely fixed schema upfront. That makes application development much easier and more agile. If you value this flexibility for building your app, you should value it for analytics too. Unfortunately, every traditional BI tool has an absolute requirement: a fixed schema. From that point forward, most of your analytics effort will not be analytics at all, but work to make flexible-schema JSON data fit a fixed-schema model your tool can understand. If you don't need the flexible schema, use a relational database like Postgres or MySQL and your legacy BI tool will work fine. Otherwise, you need to seriously rethink your analytics approach.
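To make the mismatch concrete, here is a minimal sketch (the collection and field names are hypothetical) of what a fixed-schema tool demands of flexible-schema documents: every row must share one column set, so the union of all keys becomes the schema and every missing field becomes a NULL.

```python
# Three documents from a hypothetical "users" collection.
# Each has a different shape -- exactly the flexibility NoSQL allows.
docs = [
    {"name": "Ana", "tags": ["admin", "beta"]},
    {"name": "Bo", "address": {"city": "Austin", "zip": "78701"}},
    {"name": "Cy", "logins": 42},
]

# A fixed-schema tool needs one column set for all rows, so we take the
# union of every key seen and pad the gaps with None (i.e., NULL).
columns = sorted({key for doc in docs for key in doc})
rows = [tuple(doc.get(col) for col in columns) for doc in docs]

print(columns)  # ['address', 'logins', 'name', 'tags']
for row in rows:
    print(row)
```

Even after this padding, the `address` object and the `tags` array are still not scalar values a relational column can hold, so each nested field forces yet another transformation, and every new field shape in a future document means another schema change.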
How Did We Get Here?
Many non-developers don't fully understand the implications of modern NoSQL datastores, and if you're a business analyst or marketer who had no hand in picking the datastore, why would you? Most people confronted with the need to gain insight from data simply default to a tool they know. I was recently explaining to an analyst why traditional BI tools are not a great fit for NoSQL, and his response was, "I'm comfortable with Tableau." That is the crux of the problem. Modern databases have exploded over the last 10 years, they are popping up everywhere, and gaining insight from data models like JSON requires tools not built solely for relational data. Because existing BI tools were never designed for non-relational data, the focus has been on changing the data to fit the tools instead of building better tools designed for JSON and other non-relational data.
Need evidence? The number one indicator that someone will find a solution like SlamData is whether they have already failed with an existing BI tool. We have seen companies spend weeks or even months looking for a "bigger" hammer for that square peg, only to surrender and realize that powerful analytics on NoSQL data is a very different beast, not just a few degrees from what they already know.
Solving the problem of modern data analytics is as much an exercise in human nature as a technical problem. Users grow attached to a particular analytics tool and become determined to use it on every data problem, regardless of how unlikely it is to work.
Don’t Fix Your Data, Fix Your BI Tool
An entire industry has thrived in the last decade on the simple idea that all data should fit the existing tools, regardless of where it started. A large percentage of "data prep" is really just making non-relational data like JSON fit into tables so traditional BI tools can work. A small share is genuine data cleaning or combining disparate sources, but the majority is not. Studies have suggested that 80% of "data science" is really data prep, most of which is forcing data into a fixed schema of perfectly flat, homogeneous tables. That is what your BI tool expects, regardless of the data's original form. Conventional thinking is to just "make" the data fit the tool. Unfortunately, this approach creates problems beyond the effort needed to reshape the data: by changing it, you actually lose data fidelity. You are "dumbing down" the data.
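A small sketch of that "dumbing down" (the document shape here is hypothetical): flattening one nested order document into the rows a relational BI tool expects forces the item array to explode into multiple rows, with the order-level fields duplicated on each one.

```python
# One order document with a nested customer object and an items array.
order = {
    "id": 7,
    "customer": {"name": "Ana", "tier": "gold"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B2", "qty": 1},
    ],
}

# Flattening for a table: one row per line item, order-level fields
# copied onto every row because a flat table has nowhere else to put them.
flat_rows = [
    {
        "order_id": order["id"],
        "customer_name": order["customer"]["name"],
        "customer_tier": order["customer"]["tier"],
        "item_sku": item["sku"],
        "item_qty": item["qty"],
    }
    for item in order["items"]
]

for row in flat_rows:
    print(row)
```

The fact that order 7 is a single order with two items is now only implicit; it has to be re-derived by grouping on `order_id`, and the one-to-many structure the document stated directly has been traded for duplication. That is the fidelity loss.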
The correct approach is to build better BI tools that take full advantage of complex data models like JSON as they were designed. Easy to say, hard to do.
I get asked all the time why nobody else has tried this approach. Simple: it's really hard to build this kind of solution, so in most cases it's just easier to make users change the data.