With the new aggregation pipeline stage $lookup we are now able to perform 'left outer joins'.

At first glance, I want to immediately replace one of our denormalised collections with two separate collections and use the $lookup to join them upon querying. This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.
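To make the idea concrete, here is a minimal sketch. It assumes hypothetical `orders` and `customers` collections, with field names invented for illustration. The dict shows the shape of a `$lookup` stage. The `lookup` helper only emulates the left-outer-join semantics in memory for scalar fields, and is not how MongoDB executes the stage.

```python
# Shape of a $lookup stage (MongoDB 3.2+); collection/field names are hypothetical.
lookup_stage = {
    "$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customer_id",  # field in the input (orders) documents
        "foreignField": "_id",        # field in the "from" documents
        "as": "customer",             # name of the output array field
    }
}


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Emulate $lookup's left outer join for scalar fields, in memory."""
    results = []
    for doc in local_docs:
        key = doc.get(local_field)
        matches = [f for f in foreign_docs if f.get(foreign_field) == key]
        # Every input document is kept; unmatched ones get an empty array.
        results.append({**doc, as_field: matches})
    return results


customers = [{"_id": 1, "name": "Ada"}]
orders = [
    {"_id": 100, "customer_id": 1, "total": 25},
    {"_id": 101, "customer_id": 2, "total": 40},  # no matching customer
]

spec = lookup_stage["$lookup"]
joined = lookup(orders, customers, spec["localField"], spec["foreignField"], spec["as"])
```

In this sketch the second order is still returned, just with an empty `customer` array, which is what makes the join "left outer".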

But surely this is too good to be true? This is a NoSQL, document database after all!

MongoDB's CTO also highlights his concerns:

We’re still concerned that $lookup can be misused to treat MongoDB like a relational database. But instead of limiting its availability, we’re going to help developers know when its use is appropriate, and when it’s an anti-pattern. In the coming months, we will go beyond the existing documentation to provide clear, strong guidance in this area.

What are the limitations of $lookup? Can I use it for real-time, operational querying of our data, or should it be reserved for reporting and offline situations?

Dave New

2 Answers

I share your same enthusiasm for $lookup.

I think there are trade-offs. One of the major concerns with SQL databases (and one of the reasons for the genesis of NoSQL) is that at large scale, joins can take a lot of time (well, relatively speaking).

It definitely helps in giving you a declarative model for your data, but if you start to model your entire NoSQL database as though it's a database of rows and tables (just using refs, for example), then you are effectively treating it as a SQL database (to a degree). Even MongoDB mentioned it (as you quoted in your question):

We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.

You mentioned:

This will solve the problem of having, when necessary, to update a huge number of documents. Now we can update just one document.

I'm not sure what your collections look like exactly, but that definitely sounds like it could be a good use for $lookup.

Can I use them in real-time, operational querying

I would say, again, it depends on your use-case. You'll have to compare:

  • Desired semantics of your queries (declarative vs imperative)
  • Whether modeling your data as more relational (and thus using $lookup) in certain circumstances is worth the potential trade-off in computational time (that's assuming that querying across collections is even something to be concerned about, computationally speaking)

etc...

I'm sure in the coming months we'll see perf tests of the "left outer joins" and perhaps MongoDB will start writing some posts about when $lookup is an antipattern.

Hope this answer helps add to the discussion.

Josh Beam

First of all, MongoDB is a document-based database and always will be. The $lookup aggregation pipeline stage, new in version 3.2, didn't turn MongoDB into a relational database (RDBMS), as MongoDB's CTO mentioned:

We’re still concerned that $lookup can be misused to treat MongoDB like a relational database.

The first limitation of $lookup as mentioned in the documentation is that it:

Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.

This means that, in this version, the collection you join to (the "from" collection) can't be sharded.

Also, in version 3.2 the $lookup operator doesn't work directly with an array, as mentioned in a related post, so you will need a preliminary $unwind stage to flatten the localField if it is an array.
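The `$unwind`-then-`$lookup` pattern can be sketched as follows. This is an in-memory emulation under stated assumptions: the `posts` and `tags` collections and the `tag_ids` and `tag` fields are hypothetical, and the `unwind` and `lookup` helpers only approximate the basic behaviour of the two stages.

```python
# Pipeline shape: unwind the array first, then join each element.
pipeline = [
    {"$unwind": "$tag_ids"},
    {"$lookup": {"from": "tags", "localField": "tag_ids",
                 "foreignField": "_id", "as": "tag"}},
]


def unwind(docs, field):
    """Emulate a basic $unwind: one output document per array element."""
    out = []
    for doc in docs:
        for item in doc.get(field, []):
            out.append({**doc, field: item})
    return out


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Emulate $lookup's left outer join for scalar fields."""
    return [
        {**doc, as_field: [f for f in foreign_docs
                           if f.get(foreign_field) == doc.get(local_field)]}
        for doc in local_docs
    ]


posts = [{"_id": 1, "title": "Joins", "tag_ids": [10, 20]}]
tags = [{"_id": 10, "label": "mongodb"}, {"_id": 20, "label": "schema"}]

unwound = unwind(posts, "tag_ids")          # two documents, one per tag id
joined = lookup(unwound, tags, "tag_ids", "_id", "tag")
```

Each unwound document now carries a scalar `tag_ids` value, so the equality match in the join has something to compare against.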

Now you said:

This will solve the problem of having, when necessary, to update a huge number of documents.

This is a good idea if your data are updated more often than they are read, especially if you have large hierarchical data sets, as mentioned in 6 Rules of Thumb for MongoDB Schema Design: Part 3:

Denormalizing one or more fields makes sense if those fields are read much more often than they are updated.
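The read/update trade-off can be illustrated with a small in-memory sketch. The documents, field names, and helper functions below are hypothetical and only count how many documents a rename would touch under each design; they say nothing about actual MongoDB performance.

```python
# Denormalised: the author's name is copied into every post.
denormalised_posts = [
    {"_id": i, "author_id": 7, "author_name": "Ada"} for i in range(1000)
]

# Referenced: posts hold only the id; the name lives in one author document.
authors = [{"_id": 7, "name": "Ada"}]
referenced_posts = [{"_id": i, "author_id": 7} for i in range(1000)]


def rename_denormalised(posts, author_id, new_name):
    """Rewrite every embedded copy of the name; return documents touched."""
    touched = 0
    for post in posts:
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            touched += 1
    return touched


def rename_referenced(author_docs, author_id, new_name):
    """Update the single source of truth; return documents touched."""
    touched = 0
    for author in author_docs:
        if author["_id"] == author_id:
            author["name"] = new_name
            touched += 1
    return touched


denormalised_writes = rename_denormalised(denormalised_posts, 7, "Grace")
referenced_writes = rename_referenced(authors, 7, "Grace")
```

The referenced design touches one document per rename, but every read that needs the name then has to pay for a join (or a second query), which is the cost the rule of thumb weighs against.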

I believe that with careful schema design you probably will not need the $lookup operator.

styvane
  • sharding is possible in 5.1 with $lookup :) also I wanna point out there are many cases where you will have operations which require both bulk inserts and regular updates to the documents in which case embedding becomes a meme. – nrmad Aug 09 '22 at 11:48
  • @nrmad, you can edit the answer to add this info. I'm no longer following what is happening in the MongoDB world. – styvane Aug 13 '22 at 05:59