The Virtual Data Warehouse

A guide to modern data access and analytics

In the Beginning: A Wholesome Need

The corporate data warehouse has been evolving since its beginnings in the late 1970s. The idea was simple: combine all your various corporate data silos (accounting, logistics, inventory, sales, etc.) into a single instance that is optimized for reporting and analytics. This gave companies the opportunity to get a 360-degree view of their business.

data warehouse diagram

Data Warehouse 1.0: The (Crushing) Reality

Data Warehouse 1.0 meant physically transforming and moving data through complex ETL (Extract, Transform, Load) processes. These processes copied data from multiple source systems into the target data warehouse and shaped it into a schema that allowed efficient querying and reporting from BI tools. Schema methodologies such as Star Schema and Snowflake Schema were created in an attempt to maintain flexibility. Writing the ETL scripts and defining the schemas was often a long and complex process. Large-scale data warehouse projects could easily take more than a year to complete and cost many millions of dollars. Once complete, these projects often lacked the flexibility to address new analytic requirements, changing data, ad hoc queries, or even new business models.
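To make the ETL pattern described above concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for the warehouse. The source rows, table names, and function names are all hypothetical and chosen for illustration; a real pipeline would be far larger.

```python
import sqlite3

# Hypothetical rows as they might arrive from a source system (illustrative only).
SALES_ROWS = [
    {"order_id": 1, "customer": " Acme Corp ", "amount_usd": "120.50"},
    {"order_id": 2, "customer": "Globex", "amount_usd": "75.00"},
    {"order_id": 3, "customer": "acme corp", "amount_usd": "30.25"},
]


def extract():
    """Extract: read raw rows from the source system (here, a constant)."""
    return list(SALES_ROWS)


def transform(rows):
    """Transform: clean names and convert types BEFORE anything is loaded."""
    return [
        (row["order_id"], row["customer"].strip().title(), float(row["amount_usd"]))
        for row in rows
    ]


def load(conn, rows):
    """Load: write cleaned rows into a tiny star schema (one dimension, one fact)."""
    conn.execute(
        "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES dim_customer(customer_id), amount_usd REAL)"
    )
    for order_id, name, amount in rows:
        # Insert the customer once; repeated names are ignored via the UNIQUE constraint.
        conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (name,))
        (customer_id,) = conn.execute(
            "SELECT customer_id FROM dim_customer WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?)", (order_id, customer_id, amount)
        )


def revenue_by_customer(conn):
    """The kind of report the fixed schema was designed to answer."""
    return conn.execute(
        "SELECT d.name, SUM(f.amount_usd) FROM fact_sales f "
        "JOIN dim_customer d USING (customer_id) GROUP BY d.name ORDER BY d.name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
report = revenue_by_customer(conn)
```

Note that the schema is fixed before any data arrives. A question the schema did not anticipate, for example revenue by region, would mean changing the transform step, the tables, and the load step together.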

Minimum Annual HR Cost for Maintaining A Basic Data Warehouse = $500,000
(http://share.slamdata.com/cost-of-data-warehouse)

The Data Dumpster, Consultants and Data Integration Engineers (or Data Warehouse 2.0)


Beginning around 8-10 years ago, we saw the emergence of Data Warehouse 2.0. This approach leveraged the rise of cost-effective cloud storage, especially object storage like Amazon S3 and Azure Storage.



Object Growth Heading Into the Trillions

(http://share.slamdata.com/s3-growth)

These storage options ingest large amounts of data easily and efficiently, and they do not require a common fixed schema, so data from multiple disparate sources can easily be added. Once the data was stored in these common buckets, new experts, so-called “data integration engineers”, emerged to build fixed-schema, “curated” data models so that end users could query and analyze the data with popular BI tools.

The 2.0 approach drove the cost curve down on one side (technology) but drove it up on the other (a new class of engineers). The model is really just a tweak on the old one: ETL becomes ELT (Extract, Load, Transform), and the transformation still has to happen. In the end, Data Warehouse 2.0 is a step forward, but not the ultimate answer.
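The difference between ETL and ELT is mostly a matter of ordering, which a short sketch can show. The example below, again using Python's standard-library sqlite3 module as a stand-in for the storage layer, loads raw rows untouched and only then transforms them into a curated table. The table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as received (untrimmed names, amounts as text).
RAW_ROWS = [
    (1, " Acme Corp ", "120.50"),
    (2, "Globex", "75.00"),
    (3, "acme corp", "30.25"),
]


def load_raw(conn, rows):
    """Extract + Load: land the data as-is, with no cleaning and no modeling."""
    conn.execute(
        "CREATE TABLE raw_sales (order_id INTEGER, customer TEXT, amount_usd TEXT)"
    )
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)


def transform_in_store(conn):
    """Transform: build the curated, fixed-schema table AFTER loading."""
    conn.execute(
        """
        CREATE TABLE curated_sales AS
        SELECT order_id,
               UPPER(TRIM(customer))    AS customer,
               CAST(amount_usd AS REAL) AS amount_usd
        FROM raw_sales
        """
    )


conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_ROWS)
transform_in_store(conn)
curated = conn.execute(
    "SELECT customer, SUM(amount_usd) FROM curated_sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
```

Loading becomes trivial, but someone still has to write and maintain the transform step, and the curated table is a second physical copy of the data.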

Data Warehouse 3.0: It’s Gone Virtual!

A Virtual Data Warehouse is next. In this model we continue to take advantage of inexpensive and flexible cloud storage (S3, Azure Storage, etc.), but we no longer need to adhere to the ELT model of carefully building out curated data models in the hope that they meet the needs of analysts. Virtual Data Warehousing lets anyone curate on-demand “data marts” directly from cloud storage or any other data source and easily connect them to their favorite analytics tool. Complex semi-structured data, relational tables, you name it: whatever they have is easily accessed and made ready for use. In the virtual model you don’t need ETL or ELT, because data never moves from the initial cloud storage. Instead, users create virtual “views” of the data that are specifically suited for analytics. In essence, you have a virtual data mart that can be shared, published, and modified (given correct permissions) by anyone to be exactly what they need. Nothing more.
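The idea of a virtual “view” can be illustrated with a small, purely hypothetical Python sketch. It is not SlamData's implementation or API; the class name VirtualView, the helper get_path, and the sample records are assumptions made for illustration. The view stores only a mapping from output columns to paths in the raw data, and rows are computed on demand each time the view is read.

```python
import json

# Stand-in for objects sitting in cloud storage: raw JSON documents that are
# never copied, rewritten, or reshaped (hypothetical sample data).
RAW_OBJECTS = [
    '{"order": {"id": 1, "total": 120.5}, "customer": {"name": "Acme Corp", "region": "West"}}',
    '{"order": {"id": 2, "total": 75.0}, "customer": {"name": "Globex", "region": "East"}}',
    '{"order": {"id": 3, "total": 30.25}, "customer": {"name": "Acme Corp", "region": "West"}}',
]


def get_path(record, dotted_path):
    """Follow a dotted path such as 'customer.name' into a nested record."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


class VirtualView:
    """A view definition: output column -> path in the raw data. Holds no rows."""

    def __init__(self, source, columns):
        self.source = source
        self.columns = dict(columns)

    def rows(self):
        """Compute flat rows lazily, reading the raw data where it already lives."""
        for raw in self.source:
            record = json.loads(raw)
            yield {col: get_path(record, path) for col, path in self.columns.items()}

    def with_column(self, name, path):
        """Derive a modified view; only the definition changes, not the data."""
        return VirtualView(self.source, {**self.columns, name: path})


# One analyst curates a two-column view over the raw data.
sales_view = VirtualView(
    RAW_OBJECTS, {"customer": "customer.name", "total": "order.total"}
)
# Another analyst needs the region too, so they extend the definition on the fly.
regional_view = sales_view.with_column("region", "customer.region")
```

Calling list(regional_view.rows()) yields flat rows that a BI tool could consume, while RAW_OBJECTS stays exactly as it was. Adding the region column required no reload and no new physical table.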

The SlamData Virtual Data Warehouse architecture

Real Self-Service

What’s the advantage of this approach? There are many, but first and foremost it’s about empowering end users and analysts so they don’t need to rely on data integration engineers or rigid ETL/ELT models. No more relying on curated, predefined data models.

It’s Agile

Second, it’s about agility. With this model, companies can easily combine lots of data in cloud stores like S3 and make it immediately available to end users and analysts, with no complex ETL/ELT process, no extensive data relocation, and no tedious data prep.

It’s Much Cheaper

Finally, it’s about lowering costs. Leveraging inexpensive cloud storage, cutting out expensive middlemen, and skipping the work of creating curated data models all lower costs. And because virtual data warehouses move less data, data and transport costs are lower too.

What Does A Company Look Like That Uses A Virtual Data Warehouse?

More and more companies are adopting large-scale data consolidation strategies. Whether it is an in-house Hadoop data lake or a cloud storage service (S3, Azure Storage, etc.), companies understand it’s a great first step in consolidating data silos for later consumption. Once a company decides on an approach (in-house vs. cloud), it then needs to deal with issues like data governance and security for this new “uber” store of data. Once this is complete, the most important question of all remains: “How do we make all this data readily available to our teams?”

Each previous iteration of the data warehouse (DW) ultimately relied on “curated” data models that were custom built and made assumptions about which data analysts might need and what questions they might ask. As with any predefined model, it breaks when a user needs something new, such as an added data element or a different schema for a particular analytical problem. DW 1.0 and 2.0 simply can’t be flexible or adaptive enough to meet every need. Enter the “Virtual Data Warehouse”. Now end users can access the raw data, regardless of structure or complexity, then intelligently “curate” just the data they need, how they need it, on the fly. A virtual data mart in minutes. And the same user who created it can change it easily and quickly. End users no longer need to rely on predefined views or curated data models; they can create them on demand and then use them with their favorite BI or data science tool immediately.

Summary

So why should you care? Previous versions of data warehousing were defined by cost and complexity. They also made end user access to raw data impossible. By contrast, the Virtual Data Warehouse (VDW) is about empowering the end user, increasing agility, and driving the cost curve down. We live in a data-driven world, so access to data should be agile, efficient, and flexible. This is what VDW is all about.



© 2018 SlamData, Inc.