148

What's the syntax for doing a $lookup on a field that is an array of ObjectIds rather than just a single ObjectId?

Example Order Document:

{
  _id: ObjectId("..."),
  products: [
    ObjectId("..<Car ObjectId>.."),
    ObjectId("..<Bike ObjectId>..")
  ]
}

Not Working Query:

db.orders.aggregate([
    {
       $lookup:
         {
           from: "products",
           localField: "products",
           foreignField: "_id",
           as: "productObjects"
         }
    }
])

Desired Result

{
  _id: ObjectId("..."),
  products: [
    ObjectId("..<Car ObjectId>.."),
    ObjectId("..<Bike ObjectId>..")
  ],
  productObjects: [
    {<Car Object>},
    {<Bike Object>}
  ],
}
Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
Jason Lin
  • 1,957
  • 3
  • 16
  • 18

8 Answers8

185

2017 update

$lookup can now directly use an array as the local field. $unwind is no longer needed.

Old answer

The $lookup aggregation pipeline stage will not work directly with an array. The main intent of the design is for a "left join" as a "one to many" type of join ( or really a "lookup" ) on the possible related data. But the value is intended to be singular and not an array.

Therefore you must "de-normalise" the content first prior to performing the $lookup operation in order for this to work. And that means using $unwind:

db.orders.aggregate([
    // Unwind the source
    { "$unwind": "$products" },
    // Do the lookup matching
    { "$lookup": {
       "from": "products",
       "localField": "products",
       "foreignField": "_id",
       "as": "productObjects"
    }},
    // Unwind the result arrays ( likely one or none )
    { "$unwind": "$productObjects" },
    // Group back to arrays
    { "$group": {
        "_id": "$_id",
        "products": { "$push": "$products" },
        "productObjects": { "$push": "$productObjects" }
    }}
])

After $lookup matches each array member the result is an array itself, so you $unwind again and $group to $push new arrays for the final result.

Note that any "left join" matches that are not found will create an empty array for the "productObjects" on the given product and thus negate the document for the "product" element when the second $unwind is called.

Though a direct application to an array would be nice, it's just how this currently works by matching a singular value to a possible many.

As $lookup is basically very new, it currently works as would be familiar to those who are familiar with mongoose as a "poor mans version" of the .populate() method offered there. The difference being that $lookup offers "server side" processing of the "join" as opposed to on the client and that some of the "maturity" in $lookup is currently lacking from what .populate() offers ( such as interpolating the lookup directly on an array ).

This is actually an assigned issue for improvement SERVER-22881, so with some luck this would hit the next release or one soon after.

As a design principle, your current structure is neither good or bad, but just subject to overheads when creating any "join". As such, the basic standing principle of MongoDB in inception applies, where if you "can" live with the data "pre-joined" in the one collection, then it is best to do so.

The one other thing that can be said of $lookup as a general principle, is that the intent of the "join" here is to work the other way around than shown here. So rather than keeping the "related ids" of the other documents within the "parent" document, the general principle that works best is where the "related documents" contain a reference to the "parent".

So $lookup can be said to "work best" with a "relation design" that is the reverse of how something like mongoose .populate() performs it's client side joins. By idendifying the "one" within each "many" instead, then you just pull in the related items without needing to $unwind the array first.

Benji
  • 310
  • 3
  • 12
Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
  • Thank you it works! Is this an indicator that my data isn't structured / normalized properly? – Jason Lin Jan 23 '16 at 20:33
  • 1
    @JasonLin Not as straigtforward as "good/bad", so there is a bit more explaination added to the answer. It depends on what suits you. – Blakes Seven Jan 23 '16 at 20:48
  • 2
    the current implementation is somewhat unintentional. it makes sense to look up all values in an array of local field, it does not make sense to use the array literally so SERVER-22881 will track fixing that. – Asya Kamsky Mar 16 '16 at 22:05
  • @AsyaKamsky That makes sense. I've generally been treating enquires re `$lookup` and Document validation as being features in their infancy and likely to improve. So direct expansion on an array would be welcomed, as would a "query" to filter results. Both of those would be a lot more aligned with the mongoose `.populate()` process that many are used to. Adding the issue link directly into the answer content. – Blakes Seven Mar 16 '16 at 22:29
  • Thanks a ton mate. Though, I still feel doing joins is pretty hard when in comes to MongoDB in comparison to relational DB solutions. But, yet again MongoDB is not meant for such operations. – zubair1024 Jun 12 '17 at 19:07
  • 2
    Note that as per the answer below this one, this has now been implemented and `$lookup` now works directly on an array. – Adam Reis Feb 09 '19 at 03:43
  • How can it be implemented without unwind??? – user3494434 Nov 11 '22 at 07:21
55

Starting with MongoDB v3.4 (released in 2016), the $lookup aggregation pipeline stage can also work directly with an array. There is no need for $unwind any more.

This was tracked in SERVER-22881.

Benji
  • 310
  • 3
  • 12
joseaio
  • 767
  • 7
  • 5
34

You can also use the pipeline stage to perform checks on a sub-docunment array

Here's the example using python (sorry I'm snake people).

db.products.aggregate([
  { '$lookup': {
      'from': 'products',
      'let': { 'pid': '$products' },
      'pipeline': [
        { '$match': { '$expr': { '$in': ['$_id', '$$pid'] } } }
        // Add additional stages here 
      ],
      'as':'productObjects'
  }
])

The catch here is to match all objects in the ObjectId array (foreign _id that is in local field/prop products).

You can also clean up or project the foreign records with additional stages, as indicated by the comment above.

grigson
  • 3,458
  • 29
  • 20
extraymond
  • 626
  • 9
  • 12
  • Documentation on let and pipeline arguments of lookup: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#join-conditions-and-subqueries-on-a-joined-collection – Stefan Nov 04 '21 at 10:03
  • Why it doesn't work without $expr? – madarinho Oct 19 '22 at 17:51
6

use $unwind you will get the first object instead of array of objects

query:

db.getCollection('vehicles').aggregate([
  {
    $match: {
      status: "AVAILABLE",
      vehicleTypeId: {
        $in: Array.from(newSet(d.vehicleTypeIds))
      }
    }
  },
  {
    $lookup: {
      from: "servicelocations",
      localField: "locationId",
      foreignField: "serviceLocationId",
      as: "locations"
    }
  },
  {
    $unwind: "$locations"
  }
]);

result:

{
    "_id" : ObjectId("59c3983a647101ec58ddcf90"),
    "vehicleId" : "45680",
    "regionId" : 1.0,
    "vehicleTypeId" : "10TONBOX",
    "locationId" : "100",
    "description" : "Isuzu/2003-10 Ton/Box",
    "deviceId" : "",
    "earliestStart" : 36000.0,
    "latestArrival" : 54000.0,
    "status" : "AVAILABLE",
    "accountId" : 1.0,
    "locations" : {
        "_id" : ObjectId("59c3afeab7799c90ebb3291f"),
        "serviceLocationId" : "100",
        "regionId" : 1.0,
        "zoneId" : "DXBZONE1",
        "description" : "Masafi Park Al Quoz",
        "locationPriority" : 1.0,
        "accountTypeId" : 0.0,
        "locationType" : "DEPOT",
        "location" : {
            "makani" : "",
            "lat" : 25.123091,
            "lng" : 55.21082
        },
        "deliveryDays" : "MTWRFSU",
        "timeWindow" : {
            "timeWindowTypeId" : "1"
        },
        "address1" : "",
        "address2" : "",
        "phone" : "",
        "city" : "",
        "county" : "",
        "state" : "",
        "country" : "",
        "zipcode" : "",
        "imageUrl" : "",
        "contact" : {
            "name" : "",
            "email" : ""
        },
        "status" : "",
        "createdBy" : "",
        "updatedBy" : "",
        "updateDate" : "",
        "accountId" : 1.0,
        "serviceTimeTypeId" : "1"
    }
}


{
    "_id" : ObjectId("59c3983a647101ec58ddcf91"),
    "vehicleId" : "81765",
    "regionId" : 1.0,
    "vehicleTypeId" : "10TONBOX",
    "locationId" : "100",
    "description" : "Hino/2004-10 Ton/Box",
    "deviceId" : "",
    "earliestStart" : 36000.0,
    "latestArrival" : 54000.0,
    "status" : "AVAILABLE",
    "accountId" : 1.0,
    "locations" : {
        "_id" : ObjectId("59c3afeab7799c90ebb3291f"),
        "serviceLocationId" : "100",
        "regionId" : 1.0,
        "zoneId" : "DXBZONE1",
        "description" : "Masafi Park Al Quoz",
        "locationPriority" : 1.0,
        "accountTypeId" : 0.0,
        "locationType" : "DEPOT",
        "location" : {
            "makani" : "",
            "lat" : 25.123091,
            "lng" : 55.21082
        },
        "deliveryDays" : "MTWRFSU",
        "timeWindow" : {
            "timeWindowTypeId" : "1"
        },
        "address1" : "",
        "address2" : "",
        "phone" : "",
        "city" : "",
        "county" : "",
        "state" : "",
        "country" : "",
        "zipcode" : "",
        "imageUrl" : "",
        "contact" : {
            "name" : "",
            "email" : ""
        },
        "status" : "",
        "createdBy" : "",
        "updatedBy" : "",
        "updateDate" : "",
        "accountId" : 1.0,
        "serviceTimeTypeId" : "1"
    }
}
KARTHIKEYAN.A
  • 18,210
  • 6
  • 124
  • 133
5

I have to disagree, we can make $lookup work with IDs array if we preface it with $match stage.

// replace IDs array with lookup results
db.products.aggregate([
    { $match: { products : { $exists: true } } },
    {
        $lookup: {
            from: "products",
            localField: "products",
            foreignField: "_id",
            as: "productObjects"
        }
    }
])

It becomes more complicated if we want to pass the lookup result to a pipeline. But then again there's a way to do so (already suggested by @user12164):

// replace IDs array with lookup results passed to pipeline
db.products.aggregate([
    { $match: { products : { $exists: true } } },
    {
        $lookup: {
            from: "products",
             let: { products: "$products"},
             pipeline: [
                 { $match: { $expr: {$in: ["$_id", "$$products"] } } },
                 { $project: {_id: 0} } // suppress _id
             ],
            as: "productObjects"
        }
    }
])
Liebster Kamerad
  • 6,179
  • 2
  • 19
  • 18
1

Aggregating with $lookup and subsequent $group is pretty cumbersome, so if (and that's a medium if) you're using node & Mongoose or a supporting library with some hints in the schema, you could use a .populate() to fetch those documents:

var mongoose = require("mongoose"),
    Schema = mongoose.Schema;

var productSchema = Schema({ ... });

var orderSchema = Schema({
  _id     : Number,
  products: [ { type: Schema.Types.ObjectId, ref: "Product" } ]
});

var Product = mongoose.model("Product", productSchema);
var Order   = mongoose.model("Order", orderSchema);

...

Order
    .find(...)
    .populate("products")
    ...
Arc
  • 11,143
  • 4
  • 52
  • 75
0

if you store the ids of the products in an array of strings, you can use this

db.users.aggregate([
  { $match: { _id: ObjectId(userId) } },
  {
    $addFields: {
      productIds: { $map: { input: "$productids", in: { $toObjectId: "$$this" } } }
    }
  },
  {
    $lookup: {
      from: "products",
      localField: "productIds",
      foreignField: "_id",
      as: "products"
    }
  },
  { $project: { "products.name": 1, _id: 0 } }
])
bensbenj
  • 391
  • 3
  • 7
-1

Lookup is basically left join operation in relational database terms. So left document(table- in relation database terms) is the document where we are working currently and right document is the document from where we have to extract information.

Lookup->

   {
  $lookup:
     {
       from: <right document>,
       localField: <  field in left document which holds the info for right document >,
       foreignField: <field in the right document which is referenced in left document  >,
       as: <alias for array list where the result is to be stored.>
     }
}

So the right answer to above question is ->

{
"$lookup":
   {
        "from": "products",
        "localField": "products",
        "foreignField": "_id",
        "as": "productObject",
    }
}

But it wasn't working for me , I debugged and found the below stated issue: So, In my Case what I did wrong while making insert in left document was :

Wrong insert ->

product document -->

{
  "_id": ObjectId(something),
  "products":[ "some_id_1", "some_id_2"]
}

Right insert would be ->

Product Document

 {
  "_id": ObjectId(something),
  "products" : [ ObjectId("some_id_1"), ObjectId("some_id_2")]
 }

For me , I was storing the object id as string , not as in ObjectId object in the foreign key in left document. Pleasure ensure the right format while making an insert .

And in the end we all should learn from some one's mistake.

  • This does not answer the question for performing `$lookup` with ObjectId – ray May 22 '22 at 11:31
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 23 '22 at 11:09