In my previous post, I spelled out many of the reasons why modern databases are poorly served by most of today’s BI tools. In a nutshell, it’s a problem of not having the right tool for the job. The rise of applications like the IoT is driving demand for NoSQL-database support, but the analytics tools used to extract value from that data weren’t built to accommodate non-relational data models like JSON. And as anyone who has worked with both structured and unstructured data will tell you, getting non-relational data into a format compatible with an analytics tool built for strictly relational data is a difficult, repetitive and frustrating task.
Lots of players in the database and analytics ecosystems are scrambling to develop manageable workarounds to force NoSQL data into a format that’s compatible with legacy, relational database analytic solutions, but with decidedly mixed results*. The challenges involved in moving from JSON to a fixed data schema are difficult to overcome and include loss of data fidelity, poor analytic results and slower performance.
An entire industry has thrived in the last decade based on the simple idea that all data should fit the existing tools, regardless of where it started.
Why? Because by their very nature, NoSQL databases do not have a fixed schema. Their value add is their ability to handle data in disparate formats, without interfering with developers’ ability to deliver applications quickly and easily. So, if part of the data’s value is its lack of structure, why waste time and dilute the data’s value trying to force it into a format it was never meant to fit?
So a word of advice: the next time you’re talking to a BI vendor about data support, be sure to ask them the following five questions. The vendor’s responses should give you a good indication of how “data agnostic” their tool truly is, and what problems you’re likely to have to overcome if you want to use it with your NoSQL data.
Ask these questions. Twice if you have to.
Does your BI tool require me to declare a fixed data schema in order to work?
If you are required to declare a fixed schema, be prepared to spend the bulk of your time mapping fields to pull the right data, then remapping them every time the schema changes (which could be very often). Also, many queries simply won’t work, since the data loses fidelity when it moves from a flexible schema to a fixed one: nested or optional fields that have no column in the fixed schema are no longer there to query.
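To make the fidelity point concrete, here is a minimal Python sketch of what happens when schemaless documents are projected onto a fixed schema. The sample readings, the field names and the two-column `FIXED_SCHEMA` are all made up for illustration; they are not taken from any particular BI tool or database.

```python
# Hypothetical IoT readings; every field name here is an illustrative assumption.
readings = [
    {"device": "t-1", "temp": 21.5},
    {"device": "t-2", "temp": 19.0, "location": {"lat": 40.0, "lon": -105.3}},
    {"device": "t-3", "events": [{"code": "E1"}, {"code": "E7"}]},
]

# A fixed schema declared up front, as a relational-only tool would require.
FIXED_SCHEMA = ("device", "temp")


def to_fixed_row(doc, schema=FIXED_SCHEMA):
    """Project a document onto the fixed schema; missing fields become None."""
    return tuple(doc.get(col) for col in schema)


def dropped_fields(doc, schema=FIXED_SCHEMA):
    """Return the top-level fields that the fixed schema has no column for."""
    return sorted(set(doc) - set(schema))


rows = [to_fixed_row(d) for d in readings]
lost = {d["device"]: dropped_fields(d) for d in readings}

for row in rows:
    print(row)
print(lost)
```

In this sketch the third device ends up with an empty `temp` column, and the `location` and `events` fields never reach the table at all, so a question like “which devices reported event E7?” can no longer be answered. Adding columns for those fields is exactly the remapping work described above, and it has to be repeated whenever a new field shows up.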
Do I need to extract and duplicate the data to a new location to perform analytics using your tool?
If you need to extract and duplicate your data, expect a big performance hit, and expect to address new compliance issues since the data now resides in a new location (including the cloud). And let’s not get started on the added NRE and maintenance expenses associated with maintaining a separate data lake just for analytics (regardless of whether it’s done on premises or in the cloud).
Does your BI tool require me to set up additional data infrastructure like Redshift or BigQuery?
If you need to set up new infrastructure, ask yourself if you have the in-house resources to support it. If not, better figure in the additional costs and manpower associated with adding these capabilities to your team. And be prepared to fight with two vendors’ support teams should there be any conflicts between the analytics solution and the data infrastructure in your application. It’s not uncommon for one vendor to fault the other and say that a fix is the other’s responsibility, leaving you at the mercy of the development community to figure out a fix.
Does the tool require me to use a third-party data prep tool, mapping software or driver to work with my data?
If you need to use third-party tools, that’s yet another cost and resource requirement the IT team must accommodate.
If I chose a flexible schema database for my application, why does my analytics tool not support this same approach?
Ultimately, this is the most important question to ask your BI vendor, as it cuts to the heart of the matter: if my data is unstructured, why do I have to waste time and expense trying to use a BI tool that doesn’t support unstructured data? Speaking on behalf of the team that developed the industry’s first BI solution with native support for unstructured data, I’m happy to tell you that the answer to that question is “With SlamData, you don’t have to!”
* I invite you to read my co-founder John’s detailed story about one company’s attempt to bridge the SQL-NoSQL gap to get a better understanding of why the workaround approach doesn’t make sense.