0

I have a case where I want to query a collection of documents that have a number of items under a array field "forms". The problem to solve was wanting to return only the documents that have all of the documents contained in "forms" with a particular status of "closed".

So here is a sample of two different documents in the collection:

{
    "_id" : "Tvq444454j",
    "name" : "Jim",
    "forms" : [
        {
            "name" : "Jorney",
            "status" : "closed"
        },
        {
            "name" : "Women",
            "status" : "void"
        },
        {
            "name" : "Child",
            "status" : "closed"
        },
        {
            "name" : "Farm",
            "status" : "closed"
        }
    ]
},

{
    "_id" : "Tvq579754r",
    "name" : "Tom",
    "forms" : [
        {
            "name" : "PreOp",
            "status" : "closed"
        },
        {
            "name" : "Alert",
            "status" : "closed"
        },
        {
            "name" : "City",
            "status" : "closed"
        },
        {
            "name" : "Country",
            "status" : "closed"
        }
    ]
}

And the expected result:

{
    "_id" : "Tvq579754r",
    "name" : "Tom",
    "forms" : [
        {
            "name" : "PreOp",
            "status" : "closed"
        },
        {
            "name" : "Alert",
            "status" : "closed"
        },
        {
            "name" : "City",
            "status" : "closed"
        },
        {
            "name" : "Country",
            "status" : "closed"
        }
    ]
}

As there is no standard query operator to match all of the elements of the array under this condition, the solution was found by using aggregation. This would return the _id of the documents in the collection that have all of their "forms" elements set to the status of "closed".

db.forms.aggregate([
    {$unwind: "$forms" },
    {$group: { _id: "$_id", status: {$addToSet: "$forms.status" }}},
    {$unwind: "$status"},
    {$sort: { _id: 1, status: -1 }},
    {$group: {_id: "$_id", status: {$first: "$status"}}},
    {$match:{ status: "closed" }}
])

So as I would be expecting to return many documents in the results, I would like to avoid issuing another find, or series of finds just to get the documents that match the returned _id's.

Considering this, is there any way that I can get the original documents back from aggregation in exactly the same form as they are in the collection, while still doing this type of filtering?

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317

2 Answers2

7

Falling under the category of stupid aggregation tricks is a little technique that often gets overlooked.

The query doing all of it's grouping around the document _id, being the unique identifier for this document. So the main point to think of is the whole document is actually a unique identifier already. So instead of just stashing in the _id key, use the whole document.

    {$project: { 
        _id: { _id: "$_id", name: "$name", forms: "$forms" }, forms: "$forms"}
    },

Where this is done anything that is rolled up by the _id retains the document in it's original form. At the end of all other aggregation stages, issue a final $project in order to restore the true original document form:

    {$project: { _id: "$_id._id", name: "$_id.name", forms: "$_id.forms"}}

Then you will have the filtered results that you want. This technique can be very handy when used with advanced filtering such as in the case of this query, as it removes the need to issue an additional find on all of the results.

Also, in such a case where you know you are only looking for a set of results that are going to match a certain set of conditions, use a $match operator as the first stage of the aggregation pipeline. This is not only useful in reducing the working set size, but it is also the only stage at which you can make use of an index and where you can significantly increase query performance.

The whole process together:

db.forms.aggregate([
    {$match: { "forms.status": "closed" } },
    {$project: { 
        _id: { _id: "$_id", name: "$name", forms: "$forms" }, forms: "$forms"}
    },
    {$unwind: "$forms"},
    {$group: { _id: "$_id", status: {$addToSet: "$forms.status"}}},
    {$unwind: "$status"},
    {$sort: { _id: 1, status: -1} },
    {$group: { _id: "$_id", status: {$first: "$status"} }},
    {$match: { status: "closed"}},
    {$project: { _id: "$_id._id", name: "$_id.name", forms: "$_id.forms"}}
])
Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
  • There should be an `*` or something operator soon that will be able to return the original document in its full form in the aggregation pipeline – Sammaye Feb 12 '14 at 08:04
  • @Sammaye I believe you are referring to $$ROOT, which is intended for a different purpose but could be used here. That should be available in version 2.6. This is one technique that can be used **now**, and yet may have eluded many people. – Neil Lunn Feb 12 '14 at 08:32
  • 1
    Hi @NeilLunn I don't know who are you But I keep reading all your answers and learn a lot of things from them but from the time you had been blocked, I got paused and cannot learn anything. One question *"Did you just come to earn points here?"* From my thinking **NO** . You came here to help people. Blocking you is not the people fault to whom you used to do help. At least please answer the people who need you. So please either come again or help me or at least reply me. Your's faithfully. Hope you will come back again :-) – Dark Knight Oct 13 '18 at 13:05
0

You can simply use find for this:

db.collection.find({
  forms: {$not: {$elemMatch: {status: "void"}}}
})

Playground example

nimrod serok
  • 14,151
  • 2
  • 11
  • 33