2

I have a schema representing a message thread. So each document in the mongo database looks something like:

{
    id: "thread_id",
    participants: ["user1", "user2"],
    unReadMessageCounts: [
         {
             participant: "user1",
             count: 5
         },
         {
             participant: "user2",
             count: 3
         }
}

What I want to do is get a sum of all unread messages counts for a given user - say, "user2". I know I could do this by just doing a find() on the collection and then writing a function to sum up to counts for a given user. But I'd like to use mongo's aggregate functionality if possible. I know I can do a match to first select all threads in which "user2" is a participant, but then how do I construct the group and/or sum expressions to pull out the right field from the document?

Philipp
  • 67,764
  • 9
  • 118
  • 153
rmacqueen
  • 971
  • 2
  • 8
  • 22
  • Fair point. In my case the number of documents to begin with is not terribly large so the slight increased brevity of @chridam answer made it preferable for me. But I also upvoted your answer since it is indeed a perfectly sound solution. – rmacqueen Nov 10 '15 at 18:42

2 Answers2

2

Use the following aggregation pipeline to get the desired result. The initial step filters out the incoming documents to only accept "user2" participant by way of the $match operator.

The preceding pipeline stage then "denormalizes" the unReadMessageCounts array through the $unwind operator that outputs 2 documents from the array for each incumbent document (in your above sample data).

Further filtering is necessary to aggregate data for the correct participant and this is done through another $match pipeline step.

The final aggregation operation using $group specifies a group _id of null, calculating the total counts for all documents in the pipeline using the accumulator operator $sum on the "unReadMessageCounts.count" field.

So, running this aggregation pipeline on the sample data given:

db.collection.aggregate([
    {
        "$match": { "unReadMessageCounts.participant": "user2" }
    },
    { "$unwind" : "$unReadMessageCounts" },
    {
        "$match": { "unReadMessageCounts.participant": "user2" }
    },    
    {
        "$group": {
            "_id": null,
            "total": { "$sum": "$unReadMessageCounts.count"  }
        }
    }
])

will yield the result:

/* 0 */
{
    "result" : [ 
        {
            "_id" : null,
            "total" : 3
        }
    ],
    "ok" : 1
}
chridam
  • 100,957
  • 23
  • 236
  • 235
  • While this solved the problem it's not the way to go because of the size of the documents to process after the first `$unwind` and as you know this can be a very big problem if you are dealing with large collection. – styvane Nov 10 '15 at 21:36
1

You can use the $redact operator to as shown here to limit the size of documents to process in the pipeline then you $unwind your documents and in the $group stage you use the $sum accumulator operator to return total of unread message for "user2".

db.collection.aggregate([
    { "$match": { 
        "unReadMessageCounts": { 
            "$elemMatch": { "participant": "user2" }
        }
    }},
    { "$redact": {
        "$cond": [
            { "$or": [
                { "$eq": [ "$participant", "user2" ] }, 
                { "$not" : "$participant" } 
            ]}, 
            "$$DESCEND", 
            "$$PRUNE" 
        ]
    }}, 
    { "$unwind": "$unReadMessageCounts" },
    { "$group": { 
        "_id": null, 
        "total": { "$sum": "$unReadMessageCounts.count" }
    }}
])
Community
  • 1
  • 1
styvane
  • 59,869
  • 19
  • 150
  • 156