6

I want to fetch "all the documents" having highest value for specific field and than group by another field.

Consider below data:

_id:1, country:india,  quantity:12,  name:xyz
_id:2, country:USA,    quantity:5,   name:abc
_id:3, country:USA,    quantity:6,   name:xyz
_id:4, country:india,  quantity:8,   name:def
_id:5, country:USA,    quantity:10,  name:jkl
_id:6, country:india,  quantity:12,  name:jkl

Answer should be

country:india max-quantity:12
name xyz
name jkl 

country:USA max-quantity:10
name jkl

I have tried several queries, but I can get only the max value without the name or i can go group by but it shows all the values.

db.coll.aggregate([{
    $group:{
        _id:"$country",
        "maxQuantity":{$max:"$quantity"}
    }
}])

for example above will give max quantity on every country but how to combine with other field such that it shows all the documents of max quantity.

Amnon
  • 2,212
  • 1
  • 19
  • 35

4 Answers4

12

If you want to keep document information, then you basically need to $push it into an array. But of course, then having your $max values, you need to filter the contents of the array for just the elements that match:

db.coll.aggregate([
    { "$group":{ 
        "_id": "$country",
        "maxQuantity": { "$max": "$quantity" },
        "docs": { "$push": {
            "_id": "$_id",
            "name": "$name",
            "quantity": "$quantity"
        }}
    }},
    { "$project": {
        "maxQuantity": 1,
        "docs": {
            "$setDifference": [
               { "$map": {
                   "input": "$docs",
                   "as": "doc",
                   "in": {
                       "$cond": [ 
                           { "$eq": [ "$maxQuantity", "$$doc.quantity" ] },
                           "$$doc",
                           false
                       ]
                   }
               }},
               [false]
            ]
        }
    }}
])

So you store everything in an array and then test each array member to see if it's value matches the one that was recorded as the maximum, discarding any that do not.

I'd keep the _id values in the array documents since that is what makes them "unique" and won't be adversely affected by $setDifference when filtering out values. But of course if "name" is always unique then it won't be required.

You can also just return whatever fields you want from $map, but I'm just returning the whole document for example.

Keep in mind that this has the limitation of not exceeding the BSON size limit of 16MB, so is okay for small data samples, but anything producing a potentially large list ( since you cannot pre-filter array content ) would be better of processed with a separate query to find the "max" values, and another to fetch the matching documents.

Blakes Seven
  • 49,422
  • 14
  • 129
  • 135
  • Your solution works perfectly but Its some what complicated for a beginner like me as I am unaware of $map , $setDifference ... still I will try to understand and implement it. But is there a simpler way to achieve this? –  Oct 27 '15 at 15:44
  • @ViyatGandhi Simple looking Yes, use `$unwind` and filter out the array with a `$match` before grouping again, or just conditionally `$sum`. But it's not a "performant" way so you should stick with this. Future MongoDB releases will have a `$filter` operator which simplifies this a little. The key is understanding what `$map` is doing, so look at the linked documentation and practice some samples. As stated, larger grouped results should really be separate queries anyway. – Blakes Seven Oct 28 '15 at 04:35
  • This is extremelly complex. Is there any alternative syntax? less verbose maybe? – Danielo515 May 12 '16 at 07:39
  • 2
    Not sure why this is accepted answer instead of the other one - the correct answer is just $sort and $group taking $first. This is unnecessarily complex AND it won't scale at all. – Asya Kamsky Feb 12 '20 at 03:04
4

I know how to do similar task simpler only if you alter specific range of countries:

[
{"$match":{"name":{"$in":["USA","india"]}}}, // stage one
{ "$sort": { "quanity": -1 }}, // stage three
{"$limit":2 } // stage four - count equal ["USA","india"] length
]

If you need all countries try follow, but without guaranties from me:

[
{"$project": {
    "country": "$country",
    "quantity": "$quantity",
    "document": "$$ROOT" // save all fields for future usage

}},
{ "$sort": { "quantity": -1 }},
{"$group":{"_id":{"country":"$country"},"original_doc":{"$first":"$document"} }}
]
basil
  • 3,482
  • 2
  • 22
  • 20
2

Another way can be like:

db.coll.aggregate(
[
  {
    $sort:{ country: -1, "quantity": -1 }
  },
  {
    "$group":
    {
      "_id":{ "country": "$country" },
      "data":{ "$first": "$$ROOT" } 
    }
  }
])
Gellio Gao
  • 835
  • 6
  • 19
krishna
  • 21
  • 1
0

Another possibility close to Blakes Seven's solution to simplify a bit the setDifference + map part by a filter of the array of documents.

db.coll.aggregate([
    { "$group":{ 
        "_id": "$country",
        "maxQuantity": { "$max": "$quantity" },
        "docs": { "$push": {
            "_id": "$_id",
            "name": "$name",
            "quantity": "$quantity"
        }}
    }},
    { "$project": {
        "maxQuantity": 1,
        "docs": {
            "$filter": {
              "input": "$docs",
              "as": "doc",
              "cond": { $eq: ["$$doc.quantity", "$maxQuantity"] }
            }
        }
    }}
])
Sylvain K.
  • 56
  • 3