2

I have 2 collections:

User

{
    id: 1,
    name: "Michael", 
    starred: [1, 2]
}

School

{
    id: 1,
    name: "Uni", 
    faculties: [{
        id:1000, 
        name: "faculty1", 
        subjects: [
            {id: 1, name: "sub1"},
            {id: 2, name: "sub2"},
            {id: 3, name: "sub3"}
        ]
    }]
}

Now in my users collection i'd like to lookup and collect each subjects object with the id found in starred. ie. starred: [1,2] contains the id of the subjects I want.

So the end result should return

[{id: 1, name: sub1},{id: 2, name: sub2}]

I'm currently working with whis aggregation pipeline

{$match: {name: 'Michael'}},
{$unwind: "$faculties"},
{$unwind: "$faculties.subjects"},
{$lookup:
  {
     from: 'schools',
     localField: 'starred',
     foreignField: 'faculties.subjects.id',
     as: 'starredSubjects'
   }
},
{$project: {starredSubjects: 1}}

but the unwind doesnt work (i guess because im trying to unwind a foreign collection, not the local one (which is Users). Also the foreignField: 'faculties.subjects.id doesnt return anything. What am i missing?

(sidenote: testing is great on the MongoExplorer webstorm plugin).

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
mewc
  • 135
  • 10

1 Answers1

6

This really is not a great structure to work with, and there are very good reasons for that. As such, doing a $lookup here is not a simple task as there are various implications of the "nested arrays"

You basically want either

db.users.aggregate([
   { "$match": { "name": "Michael" } },
   { "$lookup": {
     "from": "schools",
     "localField": "starred",
     "foreignField": "faculties.subjects.id",
     "as": "subjects"
   }},
   { "$addFields": {
     "subjects": {
       "$filter": {
         "input": {
           "$reduce": {
             "input": {
               "$reduce": {
                 "input": "$subjects.faculties.subjects",
                 "initialValue": [],
                 "in": { "$concatArrays": [ "$$value", "$$this" ] }
               }
             },
             "initialValue": [],
             "in": { "$concatArrays": [ "$$value", "$$this" ] }
           }
         },
         "cond": { "$in": ["$$this.id", "$starred"] }
       }
     }
   }}
])

Or with MongoDB 3.6 or greater maybe instead:

db.users.aggregate([
  { "$match": { "name": "Michael" } },
  { "$lookup": {
    "from": "schools",
    "let": { "starred": "$starred" },
    "pipeline": [
      { "$match": {
        "$expr": {
          "$setIsSubset": [ 
            "$$starred",
            { "$reduce": {
              "input": "$faculties.subjects.id",
              "initialValue": [],
              "in": { "$concatArrays": [ "$$value", "$$this" ] }
            }}
          ]
        }
      }},
      { "$project": {
        "_id": 0,
        "subjects": {
          "$filter": {
            "input": {
              "$reduce": {
                "input":  "$faculties.subjects",
                "initialValue": [],
                "in": { "$concatArrays": [ "$$value", "$$this" ] }
              }
            },
            "cond": { "$in": [ "$$this.id", "$$starred" ] }
          }
        }
      }},
      { "$unwind": "$subjects" },
      { "$replaceRoot": { "newRoot": "$subjects" } }
    ],
    "as": "subjects"
  }}
])

Both approaches essentially rely on $reduce and $concatArrays in order to "flatten" the "nested array" content into a form that can be used for comparison. The main difference between the two being that prior to MongoDB 3.6 you are essentially pulling all the "possible" matches from the document before you can do anything about "filtering" the inner array entries to only those that match.

Failing having at least MongoDB 3.4 with the $reduce and $in operators, then you are essentially resorting to $unwind:

db.users.aggregate([
   { "$match": { "name": "Michael" } },
   { "$lookup": {
     "from": "schools",
     "localField": "starred",
     "foreignField": "faculties.subjects.id",
     "as": "subjects"
   }},
   { "$unwind": "$subjects" },
   { "$unwind": "$subjects.faculties" },
   { "$unwind": "$subjects.faculties.subjects" },
   { "$redact": {
      "$cond": {
        "if": {
          "$setIsSubset": [
            ["$subjects.faculties.subjects.id"],
            "$starred"
          ]
        },
        "then": "$$KEEP",
        "else": "$$PRUNE"
      }
   }},
   { "$group": {
      "_id": "$_id",
      "id": { "$first": "$id" },
      "name": { "$first": "$name" },
      "starred": { "$first": "$starred" },
      "subjects": { "$push": "$subjects.faculties.subjects" }
   }}
])

Using of course the $redact stage to filter the logical comparison since there is only $expr with MongoDB 3.6 and $setIsSubset to compare to an array of "starred".

Then of course due to all the $unwind operations, you typically want to $group in order to reform the array.

Or otherwise do the $lookup from the other direction:

db.schools.aggregate([
  { "$unwind": "$faculties" },
  { "$unwind": "$faculties.subjects" },
  { "$lookup": {
    "from": "users",
    "localField": "faculties.subjects.id",
    "foreignField": "starred",
    "as": "users"
  }},
  { "$unwind": "$users" },
  { "$match": { "users.name": "Michael" } },
  { "$group": {
    "_id": "$users._id",
    "id": { "$first": "$users.id" },
    "name": { "$first": "$users.name" },
    "starred": { "$first": "$users.starred" },
    "subjects": {
      "$push": "$faculties.subjects"
    }    
  }}
])

The last form not really being ideal since you don't filter the "users" until after the $lookup is done ( or technically speaking "during" the $lookup really ). But at any rate, it needs to work with the whole "schools" collection first.

All forms return the same output:

{
    "_id" : ObjectId("5aea649526a94676bb981df4"),
    "id" : 1,
    "name" : "Michael",
    "starred" : [
            1,
            2
    ],
    "subjects" : [
        {
                "id" : 1,
                "name" : "sub1"
        },
        {
                "id" : 2,
                "name" : "sub2"
        }

    ]
}

Where you have only the detail from the "subjects" inner array from the related document which actually matched the values of "starred" for the current user.


All of that said, it's not really a good idea to "nest arrays" with MongoDB. Before MongoDB 3.6 you cannot even do atomic updates of "nested arrays" and even with the changes that allow it, it is still "difficult" at best to do any query operations, and especially those involving joins and filtering.

It's a common "newbie" mistake to structure "nested arrays" since you seem to think you are "organizing" things better. But in truth it's more of an "anti-pattern" and you really should consider a "flatter" structure such as :

{
    "_id" : ObjectId("5aea651326a94676bb981df5"),
    "id" : 1,
    "name" : "Uni",
    "subjects" : [
        {
                "id" : 1,
                "name" : "sub1",
                "facultyId": 1000,
                "facultyName": "faculty1"
        },
        {
                "id" : 2,
                "name" : "sub2",
                "facultyId": 1000,
                "facultyName": "faculty1"

        },
        {
                "id" : 3,
                "name" : "sub3",
                "facultyId": 1000,
                "facultyName": "faculty1"

        }
    ]
}

Which is "much" easier to work with and of course perform "joins" on where required.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • This is a perfect answer. Well detailed and gave a solution to our problem, proving the source of the problem is structuring it all wrong to start with. We weren't familiar with `$lookup` on starting with the structure and you certainly called us out in saying that it was a **newbie** mistake. We've since split everything into their own collection: schools, faculties, subjects and making use of $lookup for instance, and making changed is doable now. It's making things much more managable. Thank you Neil, I really appreciate it. – mewc May 04 '18 at 00:15