2

I'm using Mongodb. Consider my next document:

{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 1, created: ISODate("2014-05-01..."), another_col : "f" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col : "a" }

What I'm trying to do is a simple groupby on the uid and sorting the created by descending order so i could get the first row for each uid.

An example for an expected output

{ uid: 1, created: ISODate("2014-05-05..."), another_col: "y" },
{ uid: 2, created: ISODate("2014-05-22..."), another_col: "a" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col: "w" }

The best I could get is:

db.mycollection.aggregate( {$group: {_id: "$uid", rows: {$push: { "created" : "$created" }}}}, sort { // doesnt work well }  )

Anyone can guide me for the right combination of group by and sorting? It just doesn't work as I was expecting. (note: I have checked many threads, but I'm unable to find the correct answer for my case)

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
Popokoko
  • 6,413
  • 16
  • 47
  • 58

3 Answers3

4

There are a few catches here to understand.

When you use $group the boundaries will be sorted in the order that they were discovered without either an initial or ending stage $sort operation. So if your documents were originally in an order like this:

{ uid: 1, created: ISODate("2014-05-02..."), another_col : "x" },
{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },

Then just using $group without a $sort on the end on the pipeline would return you results like this:

{ uid: 1, created: ISODate("2014-05-05..."), another_col : "y" },
{ uid: 3, created: ISODate("2014-05-05..."), another_col : "w" },
{ uid: 2, created: ISODate("2014-05-10..."), another_col : "z" },

That is one concept, but it actually seems like what you are expecting in results requires returning the "last other fields" by a sorted order of the uid is what you are looking for. In that case the way to get your result is actually to $sort first and then make use of the $last operator:

db.mycollection.aggregate([

    // Sorts everything first by _id and created
    { "$sort": { "_id": 1, "created": 1 } },

    // Group with the $last results from each boundary
    { "$group": {
        "_id": "$uid",
        "created": { "$last": "$created" },
        "another_col": { "$last": "$created" }
    }}
])

Or essentially apply the sort to what you want.

The difference between $last and $max is that the latter will choose the "highest" value for the given field within the grouping _id, regardless of the current sorted on un-sorted order. On the other hand, $last will choose the value that occurs in the same "row" as the "last" grouping _id value.


If you were actually looking to sort the values of an array then the approach is similar. Keeping the array members in "created" order you would also sort first:

db.mycollection.aggregate([

    // Sorts everything first by _id and created
    { "$sort": { "_id": 1, "created": 1 } },

    // Group with the $last results from each boundary
    { "$group": {
        "_id": "$uid",
        "row": {
            "$push": {
                "created": "$created",
                "another_col": "$another_col"
            }
        }
    }}
])

And the documents with those fields will be added to the array with the order they were already sorted by.

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317
0

If all you're looking for is the first row that means you're looking for the max. Just use the built-in $max accumulator.

db.mycollection.aggregate([{$group: {_id: "$uid", rows: {$max:"$created"}}}])

You would use the $push accumulator if you needed to process all the creation dates. For more information on the accumulators see: http://docs.mongodb.org/manual/reference/operator/aggregation/group/

From your comments if you want the full documents returned, and want to be able to iterate over all the documents then you really don't need to aggregate the results. Something like this should get you what you want.

db.mycollection.find({$query:{}, $orderby:{uid:1,created:-1}})
Kassym Dorsel
  • 4,773
  • 1
  • 25
  • 52
  • Sorry for adding this next issue, but I have simplified my real situation too much, it seems like this solution doesn't give me the entire row. Assuming I had "uid", "created", "another_column", how would u get the entire row for each uid? – Popokoko May 27 '14 at 21:59
  • What are you looking to do with the data exactly. See this http://stackoverflow.com/questions/15143525/referencing-the-whole-document-in-mongodb-aggregation-pipeline – Kassym Dorsel May 27 '14 at 22:17
  • Well, I'm trying to iterate the expected type of results. But I'm still unable to resolve how should that be done.. – Popokoko May 27 '14 at 22:24
  • Updated my answer, am I correctly understanding the question ? – Kassym Dorsel May 28 '14 at 13:52
0

using $project along with this

db.mycollection.aggregate([{$group: {_id: "$uid", rows: {$max:"$created"}}}])

should help you, refer to these links

http://docs.mongodb.org/manual/reference/operator/aggregation/project/

Mongodb group and project operators

mongodb aggregation framework group + project

Community
  • 1
  • 1
Saheed Hussain
  • 1,096
  • 11
  • 12