1

Say there is one table which contains documents like this:

{ _id: "aaaa", name: "One" }
{ _id: "bbbb", name: "Two" }

And a foreign table which contains documents like this:

{ _id: "cccc", state: "pending",  source_id: "aaaa" }
{ _id: "dddd", state: "finished", source_id: "aaaa" }
{ _id: "eeee", state: "finished", source_id: "aaaa" }
{ _id: "ffff", state: "pending",  source_id: "aaaa" }

Now, I want to get a single array like this:

[
    { _id: "aaaa", name: "One", pending: 2 },
    { _id: "bbbb", name: "Two", pending: 0 }
]

The resulting documents are from the first table. Aggregation fills in the "pending" field with the number of matching (source_id = _id) foreign documents that have a status of "pending".

How would one do this? What would be the correct query syntax?

One solution I've tried: using $lookup, $match, and $sort by themselves to retrieve a result like this:

{
  _id: "aaaa",
  name: "One",
  foreign: [
    { _id: "cccc", state: "pending",  source_id: "aaaa" },
    { _id: "ffff", state: "pending",  source_id: "aaaa" }
  ]
},
{ 
  _id: "bbbb",
  name: "Two"
  foreign: []
}

I'm able to just get the length of the child array in my code, but when the foreign table has too many documents, I end up getting "aggregation exceeds maximum document size." because the result is more than 16MB.


Making Progress

The previous has helped. However I'm trying to figure out how to filter pending states, as the original question states:

db.apps.aggregate([
  {
    "$lookup": {
      "from": "requests",
      "localField": "_id",
      "foreignField": "app_id",
      "as": "foreign"
    }
  },
  {
    "$unwind": "$foreign"
  },
  {
    $match: {
      "foreign.state": 0
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "name": {
        "$first": "$name"
      },
      "pending": {
        "$sum": 1
      }
    }
  }
])

I get some results. However, documents in the local/first table that have no associated foreign documents (no foreign documents with state = 0) are not returned at all. I want rows that have 0 pending foreign documents to still be included in the results.


The Solution

db.apps.aggregate([{
    "$lookup": {
      "from": "requests",
      "as": "pending",
      "let": {
        "id": "$_id"
      },
      "pipeline": [{
          "$match": {
            "$expr": {
              "$eq": ["$$id", "$app_id"]
            }
          }
        },
        {
          "$match": {
            "$expr": {
              "$eq": [0, "$state"]
            }
          }
        },
        {
          "$count": "count"
        }
      ]
    }
  },
  {
    "$addFields": {
      "pending": {
        "$arrayElemAt": ["$pending.count", 0]
      }
    }
  }
])
afollestad
  • 2,929
  • 5
  • 30
  • 44

1 Answers1

1

You need to $unwind and essentially reduce

db.localcollection.aggregate([
  { "$lookup": {
    "from": "foreigncollection",
    "localField": "_id",
    "foreignField": "source_id",
    "as": "foreign"
  }},
  { "$unwind": "$foreign" },
  { "$group": {
    "_id": "$_id",
    "name":  { "$first": "$name" },
    "pending": { "$sum": 1 }
  }}
])

Or with MongoDB 3.6, reduce in the inner pipeline with the expressive $lookup:

db.localcollection.aggregate([
  { "$lookup": {
    "from": "foreigncollection",
    "as": "pending",
    "let": { "id": "$_id" },
    "pipeline": [
      { "$match": { "$expr": { "$eq": [ "$$id", "$source_id" ] },
      { "$count": "count" }
    ]
  }},
  { "$addFields": { "pending": { "$arrayElemAt": [ "$pending.count", 0 ] } } }
])

If there is anything else need to "reduce matches" on the foreign collection by criteria then this happens as a $match directly after $unwind or within the $match withing the "sub-pipeline" where it is possible for you to do that.

Either form is optimized to do things "before" the array is returned or even "without" returning an array in order to not break the BSON limit. There is a more detailed explanation of what happens with pipeline optimization and technique at Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size, which shows by example even deliberately breaking that limit and then avoiding it.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317