4

TLDR

How do I use MongoDB aggregation to include related documents from another collection that is linked by a one to many relationship?

In essence, what I want to do is to be able to fetch a list of Questions and include all flags associated with that question.

Update (11/07/2016): Solved with the solution posted below.

Update (05/07/2016): I have somewhat managed to get a list of questions with their associated flags by using a combination of $unwind, $lookup, $project etc. The updated query is below.

Problem (05/07/2016): I am only able to fetch questions that have nested flags. I want to fetch all questions even if they do not have any flags.

I have two collections, one for content and one for content flags, as follows:

The schema for the content (question collection)

{
    "_id" : ObjectId("..."),
    "slug" : "a-sample-title",
    "content" : "Some content.",
    "title" : "A Sample Title.",
    "kind" : "Question",
    "updated" : ISODate("2016-06-08T08:54:26.104Z"),
    "isPublished" : true,
    "isFeatured" : false,
    "flags" : [ 
        ObjectId("<id_of_flag_one>"), 
        ObjectId("<id_of_flag_two>")
    ],
    "answers" : [ 
        ObjectId("..."), 
        ObjectId("...")
    ],
    "related" : [],
    "isAnswered" : true,
    "__v" : 4
}

The shcema for the flags (flags collection)

{
    "_id" : ObjectId("..."),
    "flaggedBy" : ObjectId("<a_users_id>"),
    "type" : "like",
    "__v" : 0
}

In the above, a question can have many flags and a flag can only have one question. What I want to do is return all flags for a question when I query the question collection. I have tried doing this using aggregation with some luck.

Here is the updated query that I am using (05/07/2016)

fetchQuestions: (permission, params) => {
    return new Promise((resolve, reject) => {
        let query = Question.aggregate([
            {
                $lookup: {
                    from: 'users',
                    localField: 'author',
                    foreignField: '_id',
                    as: 'authorObject'
                }
            },
            {
                $unwind: '$authorObject'
            },
            {
                $unwind: '$flags'
            },
            {
                $lookup: {
                    from: 'flags',
                    localField: 'flags',
                    foreignField: '_id',
                    as: 'flagObjects'
                }
            },
            {
                $unwind: '$flagObjects'
            },
            {
                $group: {
                    _id: {
                        _id: '$_id',
                        title: '$title',
                        content: '$content',
                        updated: '$updated',
                        isPublished: '$isPublished',
                        isFeatured: '$isFeatured',
                        isAnswered: '$isAnswered',
                        answers: '$answers',
                        author: '$authorObject'
                    },
                    flags: {
                        $push: '$flags'
                    },
                    flagObjects: {
                        $push: '$flagObjects'
                    }
                }
            },
            {
                $project: {
                    _id: 0,
                    _id: '$_id._id',
                    title: '$_id.title',
                    content: '$_id.content',
                    updated: '$_id.updated',
                    isPublished: '$_id.isPublished',
                    isFeatured: '$_id.isFeatured',
                    author: {
                        fullname: '$_id.author.fullname',
                        username: '$_id.author.username'
                    },
                    flagCount: {
                        $size: '$flagObjects'
                    },
                    answersCount: {
                        $size: '$_id.answers'
                    },
                    flags: '$flagObjects',
                    wasFlagged: {
                        $cond: {
                            if: {
                                $gt: [
                                    {
                                        $size: '$flagObjects'
                                    },
                                    0
                                ]
                            },
                            then: true,
                            else: false
                        }
                    }
                }
            },
            {
                $sort: {
                    updated: 1
                }
            },
            {
                $skip: 0
            },
            // {
            //     $limit: 110
            // }
        ])
        .exec((error, result) => {
            if(error) reject(error);
            else resolve(result);
        });
    });
},

I have tried using other aggregation operators like $unwind and $group but the result set comes back with five items or less, and I am finding it difficult to grasp the concept of how these should all work together to get me what I need.

This is the response I am getting and it is exactly what I need. The only problem is, as described above, that I am only getting questions that have flags and not all questions.

"questions": [
{
  "_id": "5757dd42d0c2ae292f76f11a",
  "flags": [
    {
      "_id": "5774e0a81f2874821f71ace8",
      "flaggedBy": "57569d02d0c2ae292f76f0f5",
      "type": "concern",
      "__v": 0
    },
    {
      "_id": "577a0f5414b834372a6ac772",
      "flaggedBy": "5756aa79d0c2ae292f76f0f8",
      "type": "concern",
      "__v": 0
    }
  ],
  "title": "A question for the landing page.",
  "content": "This is a question that will appear on the landing page.",
  "updated": "2016-06-08T08:54:26.104Z",
  "isPublished": true,
  "isFeatured": false,
  "author": {
    "fullname": "Matt Finucane",
    "username": "matfin-386829"
  },
  "flagCount": 2,
  "answersCount": 2,
  "wasFlagged": true
},
...,
...,
...
]
matfin
  • 499
  • 4
  • 15
  • If your MongoDB version is 3.2 or newer then [`$lookup`](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/) – styvane Jun 30 '16 at 20:40
  • Hmm not sure I understand. I am using $lookup in the aggregation query. – matfin Jun 30 '16 at 21:53
  • I can also confirm that I am using MongoDB version 3.2.x – matfin Jul 01 '16 at 07:55
  • Why do you have two `$lookup` stage in there? Remove the first `$lookup` stage. Also [`$lookup` does not play well with array field in 3.2](https://stackoverflow.com/questions/34967482/lookup-on-objectids-in-an-array). – styvane Jul 01 '16 at 08:16
  • The first $lookup is used to fetch the author of the question so I can use it in my projection later on. From my understanding, this is similar to using something like the populate() function in Mongoose. I amended the aggregated query with the suggestions inside the link you posted and now I have a list of 5 flag objects returned. My question projection is ignored now. How can I write this so that I *nest* the flagObjects inside the result set for my query without changing the output of the query. Am I going down the correct path when using aggregation ? – matfin Jul 01 '16 at 08:43
  • The more I think of this and read into it, I think the approach I am trying to take is completely wrong. Will research other ways to do this using Mongoose and report back with any solutions I find. – matfin Jul 01 '16 at 11:27
  • There is not between the "foreignField" and the "localField" in the document you've show. That being said the flag document is `{ "_id" : ObjectId("5774e0a81f2874821f71ace8"), "flaggedBy" : ObjectId("57569d02d0c2ae292f76f0f5"), "type" : "concern", "__v" : 0 }`, the following query gives you the expected result. `db.contents.aggregate({"$unwind": "$flags" }, {"$lookup": { "from": "flags", "localField": "flags", "foreignField": "_id", "as": "flagObjects" } })` – styvane Jul 01 '16 at 11:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/116187/discussion-between-user3100115-and-matfin). – styvane Jul 01 '16 at 11:46

1 Answers1

6

It looks like I have figured out a solution to this problem which will be posted below.

An overview of the problem I was having was as follows:

  • I have a collection of Questions with various fields such as title, content, date of posting etc on top of the usual ObjectID field.

  • I have a separate collection of Flags that are related to a question.

  • When a Flag is posted for a Question, the ObjectID of that Flag should be added to an array field called flags attached to the Question document.

  • In short, Flags are not stored directly inside the Question document. A reference to the Flag is stored as an ObjectID.

What I needed to do was fetch all items from the Questions collection and include the related flags.

The MongoDB aggregation framework seemed to the ideal solution for this, but getting your head around it can be a bit tricky, especially when dealing with the $group, $lookup and $unwind operators.

I should also point out that I am using NodeJS v6.x.x and Mongoose 4.4.x.

This is the (rather large) commented solution to the problem.

fetchQuestions: (permission, params) => {
    return new Promise((resolve, reject) => {
        let query = Question.aggregate([
            /**
             *  We need to perform a lookup on the author 
             *  so we can include the user details for the 
             *  question. This lookup is quite easy to handle 
             *  because a question should only have one author.
             */
            {
                $lookup: {
                    from: 'users',
                    localField: 'author',
                    foreignField: '_id',
                    as: 'authorObject'
                }
            },
            /**
             *  We need this so that the lookup on the author
             *  object pulls out an author object and not an
             *  array containing one author. This simplifies
             *  the process of $project below.
             */
            {
                $unwind: '$authorObject'
            },
            /**
             *  We need to unwind the flags field, which is an 
             *  array of ObjectIDs. At this stage of the aggregation 
             *  pipeline, questions will be repeated so for example 
             *  if there are two questions and one of them has two 
             *  flags and the other has four flags, the result set 
             *  will have six items and the questions will be repeated
             *  the same number of times as the flags they contain.
             *  The $group function later on will take care of this 
             *  and return only unique questions.
             *
             *  It is important to point out how the $unwind function 
             *  is used here. If we did not specify the preserveNullAndEmptyArrays
             *  parameter then the only questions returned would be those
             *  that have flags. Those without would be skipped.
             */
            {
                $unwind: {
                    path: '$flags',
                    preserveNullAndEmptyArrays: true
                }
            },
            /**
             *  Now that we have the ObjectIDs for the flags from the 
             *  $unwind operation above, we need to perform a lookup on
             *  the flags collection to get our flags. We return these 
             *  with the variable name 'flagObjects' we can use later.
             */
            {
                $lookup: {
                    from: 'flags',
                    localField: 'flags',
                    foreignField: '_id',
                    as: 'flagObjects'
                }
            },
            /**
             *  We then need to perform another unwind on the 'flagObjects' 
             *  and pass them into the next $group function
             */
            {
                $unwind: {
                    path: '$flagObjects',
                    preserveNullAndEmptyArrays: true
                }
            },
            /**
             *  The next stage of the aggregation pipeline takes all 
             *  the duplicated questions with their flags and the flagObjects
             *  and normalises the data. The $group aggregator requires an _id
             *  property to describe how a question should be unique. It also sets
             *  up some variables that can be used when it comes to the $project
             *  stage of the aggregation pipeline.
             *  the flagObjects property calls on the $push function to add a collection
             *  of flagObjects that were pulled from the $lookup above.
             */
            {
                $group: {
                    _id: {
                        _id: '$_id',
                        title: '$title',
                        content: '$content',
                        updated: '$updated',
                        isPublished: '$isPublished',
                        isFeatured: '$isFeatured',
                        isAnswered: '$isAnswered',
                        answers: '$answers',
                        author: '$authorObject'
                    },
                    flagObjects: {
                        $push: '$flagObjects'
                    }
                }
            },
            /**
             *  The $project stage of the pipeline then puts together what the final 
             *  result set should look like when the query is executed. Here we can use
             *  various Mongo functions to reshape the data and create new attributes.
             */
            {
                $project: {
                    _id: 0,
                    _id: '$_id._id',
                    title: '$_id.title',
                    updated: '$_id.updated',
                    isPublished: '$_id.isPublished',
                    isFeatured: '$_id.isFeatured',
                    author: {
                        fullname: '$_id.author.fullname',
                        username: '$_id.author.username'
                    },
                    flagCount: {
                        $size: '$flagObjects'
                    },
                    answersCount: {
                        $size: '$_id.answers'
                    },
                    flags: '$flagObjects',
                    wasFlagged: {
                        $cond: {
                            if: {
                                $gt: [
                                    {
                                        $size: '$flagObjects'
                                    },
                                    0
                                ]
                            },
                            then: true,
                            else: false
                        }
                    }
                }
            },
            /**
             *  Then we can sort, skip and limit if needs be.
             */
            {
                $sort: {
                    updated: -1
                }
            },
            {
                $skip: 0
            },
            // {
            //     $limit: 110
            // }
        ]);

        query.exec((error, result) => {
            if(error) reject(error);
            else resolve(result);
        });
    });
},

And this is a sample of what is returned

"questions": [
    {
      "_id": "576a85d68c4333a017083fca",
      "title": "How do I do this?",
      "updated": "2016-06-22T12:34:30.919Z",
      "isPublished": false,
      "isFeatured": false,
      "author": {
        "fullname": "Matt Finucane",
        "username": "matfin-386829"
      },
      "flagCount": 1,
      "answersCount": 0,
      "flags": [
        {
          "_id": "5776541a2e38844428696615",
          "flaggedBy": "5756aa79d0c2ae292f76f0f8",
          "type": "concern",
          "__v": 0
        }
      ],
      "wasFlagged": true
    },
    {
      "_id": "576a85d68c4333a017083fc9",
      "title": "Is this a question?",
      "updated": "2016-06-22T12:34:30.918Z",
      "isPublished": true,
      "isFeatured": false,
      "author": {
        "fullname": "Matt Finucane",
        "username": "matfin-386829"
      },
      "flagCount": 2,
      "answersCount": 0,
      "flags": [
        {
          "_id": "5773ce4ea363e5161ae69e7f",
          "flaggedBy": "5756aa79d0c2ae292f76f0f8",
          "type": "concern",
          "__v": 0
        },
        {
          "_id": "577654382e3884442869661d",
          "flaggedBy": "57569d02d0c2ae292f76f0f5",
          "type": "concern",
          "__v": 0
        }
      ],
      "wasFlagged": true
    }
]
matfin
  • 499
  • 4
  • 15