32

$lookup is new in MongoDB 3.2. It performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing.

To use $lookup, the from collection cannot be sharded.

On the other hand, sharding is a useful horizontal scaling approach.

What's the best practice for using them together?

Neil Lunn
Map X
  • The $lookup stage has to run on the database's primary shard. As the "from" collection is also not sharded, it's on the same server and the join can be executed locally. You should avoid joins across different machines. That's NoSQL ;-) I can imagine that there is no good solution. Do you want to iterate over the result in an application or do you want to store it? In the first case, maybe you have to run find operations in a loop for every document. In the second case, you can use MapReduce: http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/ – jschildgen Jan 06 '16 at 13:46

2 Answers

42

As the docs you quote indicate, you can't use $lookup when the "from" collection is sharded. So the best-practice workaround is to perform the lookup yourself in a separate query.

  1. Perform your aggregate query.
  2. Pull the "localField" values from your query results into an array, possibly using Array#map.
  3. Perform a find query against the "from" collection, using a query like {foreignField: {$in: localFieldArray}}
  4. Merge your results into whatever format you need.
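The four steps above can be sketched roughly as follows. This is only an illustration, not an official pattern: the helper names collectKeys and mergeLookup, and the "orders"/"customers" collections with their fields, are hypothetical. The two queries themselves (steps 1 and 3) are shown as comments so the helpers can be tried without a database.

```javascript
// Step 2: pull the distinct, non-null localField values out of the results.
// Values are de-duplicated by their string form so that ObjectId instances
// with the same hex value count as one key (assumption: string forms of
// different keys do not collide, e.g. 1 vs "1").
function collectKeys(docs, localField) {
  const seen = new Map();
  for (const doc of docs) {
    const value = doc[localField];
    if (value != null && !seen.has(String(value))) {
      seen.set(String(value), value);
    }
  }
  return [...seen.values()];
}

// Step 4: attach the matching "from" documents to each local document as an
// array under `as`. Documents with no match (or no localField value) get an
// empty array in this sketch.
function mergeLookup(localDocs, foreignDocs, localField, foreignField, as) {
  const byKey = new Map();
  for (const foreign of foreignDocs) {
    const key = String(foreign[foreignField]);
    if (!byKey.has(key)) byKey.set(key, []);
    byKey.get(key).push(foreign);
  }
  return localDocs.map((doc) => {
    const value = doc[localField];
    const matches = value == null ? [] : byKey.get(String(value)) || [];
    return { ...doc, [as]: matches };
  });
}

// Usage with the Node.js driver, inside an async function.
// `db`, `pipeline`, and the collection/field names are assumed:
//
//   const orders = await db.collection("orders").aggregate(pipeline).toArray(); // step 1
//   const ids = collectKeys(orders, "customerId");                              // step 2
//   const customers = await db.collection("customers")
//     .find({ _id: { $in: ids } }).toArray();                                   // step 3
//   const joined = mergeLookup(orders, customers, "customerId", "_id", "customer"); // step 4
```

Indexing the "from" documents in a Map keeps the merge to a single pass over each result set instead of a nested loop.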

Don't let the $lookup limitation stop you from sharding collections that need it for scalability; just perform the lookup yourself.

JohnnyHK
  • So, `$lookup` is quite limited at present? – Map X Jan 17 '16 at 04:03
  • I wouldn't go that far; it's still a powerful tool as the ability to pull in referenced data server-side from unsharded collections still has plenty of utility. But yeah, not being able to pull in data from a sharded collection is a key limitation. – JohnnyHK Jan 17 '16 at 14:55
  • Dude, can you give an example? I'm trying to make an inner query from mongoose but I can't, since mongo takes it as a string. Check my question at: https://stackoverflow.com/questions/62521211/how-to-pass-inner-query-in-mongodb-from-javascript – mspapant Jun 22 '20 at 18:57
  • @mspapant you could try using mongo.ObjectId() before passing it to mongoose aggregate pipeline. mongo is imported from mongoose.mongo – Abhishek Sharma Aug 16 '20 at 16:41
  • $lookup should be implemented for sharded collections somehow, just like how we do it manually – Dee Feb 25 '21 at 14:47
  • Good solution for a small collection. However, if the "from" collection is really big, the localFieldArray produced by the aggregation pipeline and passed to $in will be huge, and performance will be poor. – azulay7 Jun 08 '21 at 07:28
  • This solution does not work when we want to return the cursor to the client, but not the data. I think we should use CQRS in such cases. – xfg Jul 06 '22 at 18:55
  • Starting in MongoDB 5.1, $lookup works across sharded collections. – nrmad Aug 04 '22 at 14:48
0

As the MongoDB documentation says: "In the $lookup stage, the from collection cannot be sharded. However, the collection on which you run the aggregate() method can be sharded." https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/

db.shardedCollection.aggregate([
   { $lookup: { from: "unshardedCollection", ... } }
])

This is the best practice for using them together.

Cuong Bui