The ETL (extract, transform, load) industry has been around for decades. Its primary purpose is to move data from source locations to data warehouses so analytics and data science teams can access the data in one place and perform analysis across a range of critical data sources.
More recently, the rise of low cost cloud object storage like AWS S3, Azure Blob storage and others has morphed this process into ELT (Extract, load, transform). This process pushes the data transformations further down the pipeline, which somewhat streamlines the problems and also lowers overall costs.
ETL/ELT tools have flourished in the last decade as the volume and variety of data sources enterprises need to handle has exploded, but there’s one obvious gap in the solution space — complex JSON data.
Coincidentally, complex JSON data is also one of the most popular and rapidly growing kinds of data we see in the market. Virtually ALL ETL/ELT vendors claim to support JSON, but the reality is far different. Unlike traditional relational or tabular data, JSON is not a one-size-fits-all data model — it can range from very simple to unbelievably complex depending on the whims of the developer building the applications.
When an ETL/ELT vendor says they support JSON, what they mean is that they support VERY simple flat JSON. As soon as the data gets too complex, they falter, reverting back to their old familiar approach: start writing code. Some vendors don’t even try to avoid code — they actually build a coding engine in their platform to handle complex JSON.
The harsh reality is that complex JSON is the last ETL/ELT problem to be solved without expecting users to write code which, by definition, means that the only access to this data is through a data integration engineer. Business analysts, most data scientists, and other non-engineers are locked out.
Ask around. We’ve spoken to many vendors in the ETL/ELT space, and all of them said that JSON is no problem. But as soon as we handed them a couple of slightly complex JSON data sets (multiple levels of nesting, variable types), they quickly retreated to the familiar refrain, “For this we would need to write Python (or Scala or some other code).” Tools like AWS Glue attempt to solve the problem by auto-generating Python code, but that just means your engineers have to de-bug a bunch of non-functional code. Don’t believe us? Give it a try.
JSON data is some of the most common data created today. Virtually all SaaS applications, Mobile applications, IoT, and many others use JSON as their default data model. JSON data is unlike traditional relational data in many ways, including non-fixed variable schema, variable data types, and the ability to have “nested” data structures. This presents a major hurdle when companies need to access this data for analytic purposes.
Analytic tools expect the data to be in a fixed tabular form — essentially a spreadsheet — of rows and columns. For this to be the case, the data needs to be transformed from the JSON model to the tabular model.
As previously noted, this transformation can take place as ETL or the updated version, ELT. Either version can be very complex. Traditional ETL/ELT solutions like Informatica, Talend, Alooma, or FiveTran cannot handle complex JSON. They all claim they can handle JSON data, but what they really mean is they can do some very simple things, beyond which they require engineers to write complicated Python or Scala code to solve the rest.
Most companies don’t even bother trying to use ETL/ELT software and simply use expensive data integration engineers write custom code to transform the data. This approach is slow, complicated, and completely inaccessible to anyone who isn’t an engineer.
Enter SlamData REFORM. Our solution lets ANY user visually prepare virtual analytic ready tables directly on the JSON, regardless of complexity. There’s no coding and no waiting on data integration engineers. Users can curate custom data sets in minutes and then iterate over them at any time as their data needs change. These tables can be streamed into most popular data warehouses, including Redshift, Snowflake and Google Big Query, or to any other location you choose.
Almost any company, large or small, can have JSON data generated by SaaS and mobile apps or from web API’s, but companies most likely dealing with these kinds of issues include:
Currently, SlamData REFORM is provided as a docker image, and is also available in the AWS marketplace. Since it is a docker, it can run in any client infrastructure, be it on-prem, public or private cloud. Since SlamData is not a SaaS solution, we don’t hold any user data — we also don’t carry compliance or regulatory burdens like GDPR. Users can install the solution within their existing infrastructure and use it as they need securely.
REFORM supports JSON data stored in AWS S3, Azure Blob Storage, Wasabi, and MongoDB — all very popular places to store JSON data. However, we can add a new connector to virtually ANY new data source easily within a few weeks using our advanced Lightweight Connector Technology (LWC).
SlamData REFORM is the world’s first solution designed from the ground up to transform any JSON data using a simple and powerful data browser approach. Our advanced mathematical foundation allows us to understand and shape semi-structured data in ways existing tools cannot. Our solution fills the gap in existing ETL/ELT and Data prep tools the market has been demanding.