5

Here is my mongo collection 'sales':

{"title":"Foo", "hash": 17, "num_sold": 49, 
"place": "ABC"}

{"title":"Bar", "hash": 18, "num_sold": 55, 
"place": "CDF"}

{"title":"Baz", "hash": 17, "num_sold": 55,
"place": "JKN"}

{"title":"Spam", "hash": 17, "num_sold": 20,
"place": "ZSD"}

{"title":"Eggs", "hash": 18, "num_sold": 20, 
"place": "ZDF"}

I would like to group by hash and return document with the greatest "num_sold". So as output I would like to see:

{"title":"Baz", "hash": 17, "num_sold": 55,
    "place": "JKN"}

 {"title":"Bar", "hash": 18, "num_sold": 55, 
    "place": "CDF"}

I know basic of aggregate operator and here is how I would group and get maximum of num_sold, but I need whole document corresponding to maximum, not just the value.

db.getCollection('sales').aggregate([
{$group: {_id: "$hash", max_sold : {$max: '$value'}}}
])

In SQL I would have done it with join, but in mongo. I also read that in mongo group and sort do not work well together.

ekad
  • 14,436
  • 26
  • 44
  • 46
user1700890
  • 7,144
  • 18
  • 87
  • 183
  • What version of `mongodb` are you using? – BatScream Dec 21 '15 at 21:56
  • @BatScream 3.0.4. I am still researching my question and it looks like it is possible to sort within a group on client side, but I don't understand the difference between server and client side. Also seems like I can accomplish the same with mapreduce, but not yet clear how. – user1700890 Dec 21 '15 at 22:01
  • The expensive `$sort` and `$group` approach can be avoided. Please see my answer below to get it done on the server side. – BatScream Dec 21 '15 at 22:13

2 Answers2

6

You can use the $redact stage to accomplish this. It avoids the usage of $sort and then again doing a $group or an $unwind.

  • $group by _id and get the maximum max_num_sold for each group, accumulate all the documents in the group using the $push operator.
  • $redact into sub documents per group, keeping only those which have the maximum max_num_sold in their num_sold

sample code:

db.getCollection('sales').aggregate([
{$group:{"_id":"$hash",
         "max_num_sold":{$max:"$num_sold"},
         "records":{$push:"$$ROOT"}}},
{$redact:{$cond:[{$eq:[{$ifNull:["$num_sold","$$ROOT.max_num_sold"]},
                       "$$ROOT.max_num_sold"]},
                "$$DESCEND","$$PRUNE"]}},
])

test data:

db.getCollection('sales').insert([
{"title":"Foo","hash":17,"num_sold":49,"place":"ABC"},
{"title":"Bar","hash":18,"num_sold":55,"place":"CDF"},
{"title":"Baz","hash":17,"num_sold":55,"place":"JKN"},
{"title":"Spam","hash":17,"num_sold":20,"place":"ZSD"},
{"title":"Eggs","hash":18,"num_sold":20,"place":"ZDF"}
])

test result:

{
        "_id" : 18,
        "max_num_sold" : 55,
        "records" : [
                {
                        "_id" : ObjectId("567874f2b506fc2193a22696"),
                        "title" : "Bar",
                        "hash" : 18,
                        "num_sold" : 55,
                        "place" : "CDF"
                }
        ]
}
{
        "_id" : 17,
        "max_num_sold" : 55,
        "records" : [
                {
                        "_id" : ObjectId("567874f2b506fc2193a22697"),
                        "title" : "Baz",
                        "hash" : 17,
                        "num_sold" : 55,
                        "place" : "JKN"
                }
        ]
}
BatScream
  • 19,260
  • 4
  • 52
  • 68
  • Thank you very much for such a detailed answer! It took me awhile to understand. This is very creative use of aggregation framework. Do you know what it means to do something on serve side or on client side? I want to fully research the question and try it with map reduce also. – user1700890 Dec 24 '15 at 15:04
  • 1
    Yes but you do a costly operation that is $sort. But then again there are always ways of doing it. I gave you a different approach to do it:) – BatScream Dec 24 '15 at 16:05
  • Sorry, I posted alternative solution with costly sort and group separately for better readability. – user1700890 Dec 24 '15 at 16:08
2

It looks like grouping in mongodb does not distort the order and something like this is possible:

mongodb, how to aggregate with group by and sort correctly.

In particular, for the above example we can get the following:

db.getCollection('sales').aggregate([
{$sort: {"num_sold":-1}},
{$group:{"_id": "$hash",
         "max_num_sold" : {$first:"$num_sold"},
         "title":{$first: "$title"},
         "place":{$first:"$place"}
         }}
])

Here is output:

{
    "result" : [ 
        {
            "_id" : 17.0000000000000000,
            "max_num_sold" : 55.0000000000000000,
            "title" : "Baz",
            "place" : "JKN"
        }, 
        {
            "_id" : 18.0000000000000000,
            "max_num_sold" : 55.0000000000000000,
            "title" : "Bar",
            "place" : "CDF"
        }
    ],
    "ok" : 1.0000000000000000
}
Community
  • 1
  • 1
user1700890
  • 7,144
  • 18
  • 87
  • 183