The Definitive Guide To JOINs On MongoDB
The Obvious: NoSQL has been around a while. Same with MongoDB. If you’re not using it now, you’re probably going to bump into it soon.
The Reality: Developers pushed a lot of businesses down the path of using MongoDB because the benefits were innumerable. And there’s no going back.
The Ghastly Truth: The “business” side of the house kinda got the short end of the stick. In plain English: they got screwed. In “Relational Land” they (analysts, BI folks) enjoyed many a robust toolset for analytics. They could analyze loads of data in a single click, all without intervention from “IT”. In “Non-relational Land”? No such luck. Basically, “switching over to MongoDB” meant building BI from scratch.
The Denouement: This guide lays out the facts and the options for querying MongoDB. It’s bleak up front but gets way better as you think outside the box. Stick with it because it’s awesome.
The ABCs of JOINs
We’re not gonna bore you, but let’s lay out the basics. JOINs are at the center of data analysis. They’re the workhorse of analytics. And there are only a few types of them.
They are: Cross, Inner and Outer. Outer comes in a few forms: Left, Right and Full.
When made available to a pro analyst, JOINs are like… it’s like combining a scalpel and a microscope into one tool. JOINs simply make it easy to get the right subset of data in front of your eyes.
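To make those join types concrete, here is a minimal sketch in plain Python. The `customers` and `orders` data, the `cust_id` key, and the function names are all made up for illustration; nothing here is tied to MongoDB or any other database.

```python
from itertools import product

# Hypothetical sample data: names and fields are illustrative only.
customers = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
]


def cross_join(left, right):
    # Every left row paired with every right row.
    return list(product(left, right))


def inner_join(left, right, key):
    # Only the pairs whose key values match.
    return [(l, r) for l, r in product(left, right) if l[key] == r[key]]


def left_join(left, right, key):
    # Every left row; unmatched left rows are paired with None.
    rows = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            rows.extend((l, r) for r in matches)
        else:
            rows.append((l, None))
    return rows


def right_join(left, right, key):
    # A right join is a left join with the two sides swapped.
    return [(l, r) for r, l in left_join(right, left, key)]


def full_join(left, right, key):
    # Left join plus the right rows that matched nothing on the left.
    rows = left_join(left, right, key)
    rows += [(None, r) for r in right
             if not any(l[key] == r[key] for l in left)]
    return rows
```

With this sample data, the inner join keeps only Ada's order, the left join also keeps Grace (with no order), the right join also keeps the order that has no customer, and the full join keeps all three.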
Out of the Box: JOINs On MongoDB
You might be excited to know that you can do all JOIN types on MongoDB. YES!
But practically speaking only LEFT OUTER JOINs work. NO!
Option A: Oh Yes! Wait! What the…
MongoDB built pretty good infrastructure for doing native queries — it’s called the Aggregation Framework. But due to certain limitations of that framework you can only do a LEFT OUTER JOIN.
Is that a half-finished job? That’s the nice way of putting it. But it’s more a limitation of the underlying technology than unfinished work. The need is certainly there.
Regardless, you likely need more than LEFT OUTER JOIN. If by some odd fluke that’s all you need, then you’re good to go. But don’t kick up your heels too fast: you’ll still need to learn, know, and wield the Mongo Query Language. You’ll need to dedicate a resource to learning it and keeping up with it. That’s a lot of overhead. All for LEFT OUTER JOIN…
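For the curious, here is roughly what that LEFT OUTER JOIN looks like in the Aggregation Framework, where it is expressed with the `$lookup` stage. This is a minimal sketch, assuming hypothetical `customers` and `orders` collections that share a `cust_id` field. The `emulate_lookup` helper is not part of any MongoDB driver; it is a pure-Python stand-in for the simple equality form of `$lookup`, so you can see the behavior without a running server.

```python
# Hypothetical collections and field names, for illustration only.
customers = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
]

# Aggregation pipeline with a single $lookup stage: for each customer,
# attach the matching orders as an array field named "orders".
pipeline = [
    {
        "$lookup": {
            "from": "orders",           # collection to join against
            "localField": "cust_id",    # field on the customers side
            "foreignField": "cust_id",  # field on the orders side
            "as": "orders",             # name of the output array field
        }
    }
]


def emulate_lookup(left_docs, right_docs, stage):
    """Illustrative stand-in for one equality-form $lookup stage."""
    spec = stage["$lookup"]
    out = []
    for doc in left_docs:
        matches = [
            r for r in right_docs
            if r.get(spec["foreignField"]) == doc.get(spec["localField"])
        ]
        # Every left document survives; unmatched ones get an empty array.
        out.append({**doc, spec["as"]: matches})
    return out


joined = emulate_lookup(customers, orders, pipeline[0])
```

Against a real deployment, and assuming a PyMongo database handle named `db`, the same pipeline would be passed to `db.customers.aggregate(pipeline)`. Note what never shows up in the result: the order with no customer. That is the LEFT OUTER limitation in action.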
Option B: MapReduce Hell
No reason to sugarcoat anything here, right? We’re all adults. If you need or want the full gamut of queries (and you do), then you’re going to have to go down this path. Bring water, extra food, and a flashlight. Or, more literally, take out the checkbook. It’s gonna hurt. Here’s the brutal truth:
1. Brush up on your MapReduce. It’s fundamental to working with queries on MongoDB. No one really likes MapReduce for a lot of reasons. But it’s all you got so crack the books.
3. Dedicate someone to learning Mongo Query Language.
4. Start keeping tabs on your schema changes. Why? Well, if you do pull off some good query writing with the technologies listed above, you’re going to get just one minute of fame and fortune, because once your schema changes everything will fall apart. Your queries will break. Your reports will break. That equals more development time, more upkeep, more money. That’s a hamster wheel.
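To give a feel for what a join built on MapReduce involves, here is a minimal sketch that emulates the map and reduce phases in plain Python rather than calling MongoDB's mapReduce command. The data, the field names, and every function name are hypothetical. The idea: each collection gets its own map function that emits values keyed by the join field, and a single reduce function merges everything that landed under the same key.

```python
from collections import defaultdict

# Hypothetical sample data, for illustration only.
customers = [
    {"cust_id": 1, "name": "Ada"},
    {"cust_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 3},
]


def map_customer(doc):
    # Emit the customer's half of the joined record, keyed by cust_id.
    yield doc["cust_id"], {"name": doc["name"], "order_ids": []}


def map_order(doc):
    # Emit the order's half of the joined record, keyed by cust_id.
    yield doc["cust_id"], {"name": None, "order_ids": [doc["order_id"]]}


def reduce_values(key, values):
    # Merge all partial records emitted under the same key.
    merged = {"name": None, "order_ids": []}
    for value in values:
        merged["name"] = merged["name"] or value["name"]
        merged["order_ids"].extend(value["order_ids"])
    return merged


def map_reduce_join(left_docs, right_docs):
    # Shuffle step: group every emitted value by its key.
    groups = defaultdict(list)
    for doc in left_docs:
        for key, value in map_customer(doc):
            groups[key].append(value)
    for doc in right_docs:
        for key, value in map_order(doc):
            groups[key].append(value)
    return {key: reduce_values(key, values) for key, values in groups.items()}


joined = map_reduce_join(customers, orders)
```

The result has the shape of a full outer join: a customer without orders and an order without a customer both survive. That is a lot of hand-written plumbing for one join, and all of it is wired to specific field names, which is why a schema change hurts so much.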
The sum total here is that you invested in great technology, and if you stick with the status quo for analytics you’re going to pay twice. But it doesn’t have to be that way!
JOINs On MongoDB When You’re Running SlamData
SlamData’s mission is to tame the polyglot world (databases gone wild!). How? By offering the world a lingua franca. SlamData is a platform that allows you to simply get to work on data wherever it is without doing anything special. That sounds casual but it was actually the output of a large development team working for a few years.
Here’s how it works with MongoDB:
Have Your Cake and Eat It Too!
- Fire up SlamData
- Connect it to MongoDB
- Write a query using SQL
- Run the query*
- Enjoy (share) the results
*The special sauce is this: SlamData translates the query into a highly optimized native query and then sends it down to MongoDB via the Aggregation Framework or MapReduce. It’s smart enough to figure out the best route on the fly. It’s 100% automated.
Did You Notice Anything Different?
Oh yeah you did.
- Did you touch Mongo Query Language?
- Did you think about MapReduce?
- Did you worry yourself at all about schema this or schema that?
No, you didn’t!
When you use SlamData as the analytics layer on MongoDB what you get is radical simplicity and bulletproof queries that work all the time.
The bottom line is this: get out of the plumbing business. Or the janitor business. Or both. Stop maintaining one-off, homegrown apps, stop prepping data, stop cleaning up data, and most importantly stop waiting for insight.
Bottlenecks or Robust Analytics On Live Data?
The significance of this… blockage… (that’s really the right word) is worth a visual metaphor. Let’s have some fun… It’s like the difference between a tricycle and a jet.
The tricycle actually can get you wherever you want to go, right? Say, from Denver to Boulder, CO. Go ahead and use it! It’s cheap. It’s practical. You’ll get there in a few days or weeks. Maybe longer. You’ll be a wreck by the time you arrive but you’ll arrive.
But, wait, there’s a jet fighter on the tarmac right next to the trike. It’s fueled up and ready to go. You’ll get to Boulder in 2 minutes!
Butter knife vs. Swiss Army Knife? Dust pan vs. Hoover? It’s hard to fully capture it visually so let me break it down one last time: based on a few premises, the final arithmetic is simple.
1. You (your business) have finite resources.
2. Analysis of data is critical.
3. You’re using MongoDB.
4. Prep work is a cost — a big cost (see For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights).
The Final Calculus
- If you could give up the overhead of learning a bunch of specialty technologies…
- If you could instead continue to rely on SQL, the ‘tried and true’ workhorse of analytics…
- If you could never worry about schema changes again…
- If you could get developers back to working on development projects instead of pitching in on ad hoc biz-related queries or maintaining a custom application…
- And if you could get insight today instead of next week…
Would you do it?
Actually, the best question is: “Why wouldn’t you?”
It’s Actually Going To Get Even Better
When SlamData releases its next version later this year, you’ll see radical performance improvements. SlamData engineers have been hard at work building the infrastructure to make MongoDB analytics (NoSQL analytics!) as fast and as easy as what the world was used to when it was just RDBMS.
Not only speed, though!
In addition, you’re going to get the ability to do JOINs across collections in a number of different NoSQL DBs. If you’re reading between the lines then you know we’ve figured out how to transcend the limitations of the native query APIs for the different NoSQL data sources. In other words, awesome JOIN power regardless of whether the native APIs support them.
That’s a game-changer.
This is an exciting time for NoSQL analytics. Actually, for analytics in general.
In fact, SlamData is starting to dismantle the line between SQL and NoSQL… because having one tool for all of your data — wherever it is — will change the way you work. It will change the way you think about data. You’ll no longer care about what kind of db you’re using — you’ll always go for the best db that fits the data — because you know you’ll get your analytics just the way you like ’em.