How to calculate a sum of specific documents using aggregation?

Question

I have a schema representing a message thread. So each document in the mongo database looks something like:

{
    id: "thread_id",
    participants: ["user1", "user2"],
    unReadMessageCounts: [
         {
             participant: "user1",
             count: 5
         },
         {
             participant: "user2",
             count: 3
         }
}

What I want to do is get a sum of all unread messages counts for a given user - say, "user2". I know I could do this by just doing a find() on the collection and then writing a function to sum up to counts for a given user. But I'd like to use mongo's aggregate functionality if possible. I know I can do a match to first select all threads in which "user2" is a participant, but then how do I construct the group and/or sum expressions to pull out the right field from the document?

Fair point. In my case the number of documents to begin with is not terribly large so the slight increased brevity of @chridam answer made it preferable for me. But I also upvoted your answer since it is indeed a perfectly sound solution. — rmacqueen, Nov 10 '15 at 18:42

chridam · Accepted Answer · 2015-11-05T20:01:15.923

Use the following aggregation pipeline to get the desired result. The initial step filters out the incoming documents to only accept "user2" participant by way of the $match operator.

The preceding pipeline stage then "denormalizes" the unReadMessageCounts array through the $unwind operator that outputs 2 documents from the array for each incumbent document (in your above sample data).

Further filtering is necessary to aggregate data for the correct participant and this is done through another $match pipeline step.

The final aggregation operation using $group specifies a group _id of null, calculating the total counts for all documents in the pipeline using the accumulator operator $sum on the "unReadMessageCounts.count" field.

So, running this aggregation pipeline on the sample data given:

db.collection.aggregate([
    {
        "$match": { "unReadMessageCounts.participant": "user2" }
    },
    { "$unwind" : "$unReadMessageCounts" },
    {
        "$match": { "unReadMessageCounts.participant": "user2" }
    },    
    {
        "$group": {
            "_id": null,
            "total": { "$sum": "$unReadMessageCounts.count"  }
        }
    }
])

will yield the result:

/* 0 */
{
    "result" : [ 
        {
            "_id" : null,
            "total" : 3
        }
    ],
    "ok" : 1
}

While this solved the problem it's not the way to go because of the size of the documents to process after the first `$unwind` and as you know this can be a very big problem if you are dealing with large collection. — styvane, Nov 10 '15 at 21:36

score 1 · Answer 2 · edited May 23 '17 at 10:27

You can use the $redact operator to as shown here to limit the size of documents to process in the pipeline then you $unwind your documents and in the $group stage you use the $sum accumulator operator to return total of unread message for "user2".

db.collection.aggregate([
    { "$match": { 
        "unReadMessageCounts": { 
            "$elemMatch": { "participant": "user2" }
        }
    }},
    { "$redact": {
        "$cond": [
            { "$or": [
                { "$eq": [ "$participant", "user2" ] }, 
                { "$not" : "$participant" } 
            ]}, 
            "$$DESCEND", 
            "$$PRUNE" 
        ]
    }}, 
    { "$unwind": "$unReadMessageCounts" },
    { "$group": { 
        "_id": null, 
        "total": { "$sum": "$unReadMessageCounts.count" }
    }}
])

How to calculate a sum of specific documents using aggregation?

2 Answers2