5

I see plenty of responses to the question "how do I get all of the unique values in a field?" which suggest the .distinct() method. But this returns a simple array of those values. How do I retrieve all of the documents which HAVE a unique value of a field?

[{age: 21, name: 'bob'}, {age: 21, name: 'sally'}, {age: 30, name: 'Jim'}] 
Query for unique age -->
[{age: 21, name: 'sally'}, {age: 30, name: 'Jim'}]
or
[{age: 21, name: 'bob'}, {age: 30, name: 'Jim'}]

Filtering a query after-the-fact is not an ideal solution, as I will still want to select, $limit, and $skip as usual.

Sinetheta
  • 9,189
  • 5
  • 31
  • 52
  • what is your criterion for selecting between `sally` and `bob`, if they both have the same age? – Ryan Artecona Feb 04 '13 at 23:18
  • In fact what is the criteria for selecting two documents that are `sally` in general? Which document would you want to return? – Sammaye Feb 04 '13 at 23:21
  • @RyanArtecona You've predicted my follow-up question (how to specify the filter). For my particular use-case I need the "latest" record, but I didn't think it would be necessary to muddy the waters here until seeing how someone would filter at all. – Sinetheta Feb 04 '13 at 23:22
  • Next question, how you do know the latest record? – Sammaye Feb 04 '13 at 23:23
  • @Sammaye Guys, all of your comments are ancillary to my question. Let's not put the cart before the horse. Show me *any* way, and I'm sure that it will naturally suggest how to do it a *specific* way. – Sinetheta Feb 04 '13 at 23:24
  • 1
    `db.col.aggregate([{$group: {_id: '$age', name: {$max: '$name'}}}])` using the new aggregation framework introduced in v2.1 – Sammaye Feb 04 '13 at 23:25
  • 1
    To answer your edit the aggregation framework is much like SQLs own, so for a limit onto the previous query: `db.col.aggregate([{$group: {_id: '$age', name: {$max: '$name'}}}, {$limit: 10}])` There are so many operators to it take a look: http://docs.mongodb.org/manual/applications/aggregation/ – Sammaye Feb 04 '13 at 23:29
  • Thanks, I suspected it might be within `.aggregate()`, but was frankly a little terrified by the concept of a "pipeline". I'll setup a test to verify the syntax. Maybe cut and paste into an answer so I can give some karma once I confirm it works? – Sinetheta Feb 04 '13 at 23:37
  • I keep getting "TypeError: Cannot read property 'explain' of undefined". Anyone have a demo or example? – Sinetheta Feb 04 '13 at 23:42
  • heh yea...the aggregation framework doesnt yet have an explain: https://jira.mongodb.org/browse/SERVER-4504 it is coming though, which is what that error says. – Sammaye Feb 04 '13 at 23:49
  • @Sammaye Ha, thanks again. It turns out I was accidentally chaining `.toArray()` after the `.aggregate()` since it used to be a `.find()`. Do you mind putting your comment as an answer so we can button this one up for anyone else who finds it? – Sinetheta Feb 04 '13 at 23:58

2 Answers2

4
> db.foo.insert([{age: 21, name: 'bob'}, {age: 21, name: 'sally'}, {age: 30, name: 'Jim'}])
> db.foo.count()
3
> db.foo.aggregate({ $group: { _id: '$age', name: { $max: '$name' } } }).result
[
    {
        "_id" : 30,
        "name" : "Jim"
    },
    {
        "_id" : 21,
        "name" : "sally"
    }
]
A. Jesse Jiryu Davis
  • 23,641
  • 4
  • 57
  • 70
  • 1
    So all of the document needs to be "reconstructed"? What if there had been other properties (eg: height, weight), how do we get the whole record with the `$max: '$name'` and not just pin the max name onto the grouped $age? – Sinetheta Feb 05 '13 at 00:06
  • It's just like a SQL "GROUP BY", then if documents have a field called `other_field`, then you should add to your `$group` an expression saying how you want to group that field. See the `name: { $max: '$name' }` expression in my example. – A. Jesse Jiryu Davis Feb 05 '13 at 00:21
  • 1
    That will give you the info a person who doesn't necessarily exist, `{ "_id: 30", name: "max name from those with id 30", other_field: "max other_field from those with id 30"}`. (before you edited). So what's the command to get the other_field from the record with the $max: '$name'? – Sinetheta Feb 05 '13 at 00:24
  • 2
    Good point - use $first or $last in place of $max, to consistently choose all fields' values from the first- or last-encountered document in each group: http://docs.mongodb.org/manual/reference/aggregation/first/#_S_first – A. Jesse Jiryu Davis Feb 05 '13 at 03:46
  • 2
    Ya, I ended up using a big 'ole pile of $last. I guess I was just sad to have to "recreate" a document instead of "picking" it from the group. Thanks for all your help. – Sinetheta Feb 05 '13 at 03:52
  • Vote for https://jira.mongodb.org/browse/SERVER-4437 and / or https://jira.mongodb.org/browse/SERVER-8065 -- both tickets are kind of vague, but it seems like if either feature request is implemented you'd get the ability to filter out all but the nth document in the group and output that entire document (e.g., "group docs by age and sort by name descending, give me the whole first document in the age group"). – A. Jesse Jiryu Davis Feb 05 '13 at 12:12
2

Hi I am a newbie mongo dev, I have been struggling with the same question and I was finally able to figure out a solution. I am using aggregate pipeline with the following steps:

  1. group: with _id as age
  2. lookup: use _id from step 1. inside the same collection with local field as _id and foreign field as age, save the lookup to a new list ob.
  3. replaceRoot: ie. replace the root of each result so far with the first object from ob list from step 2.

There maybe better solutions out there. If someone finds it please let us know. Hope the solution helps.

DB:

[
  {
    age: 21,
    name: "bob"
  },
  {
    age: 21,
    name: "sally"
  },
  {
    age: 30,
    name: "Jim"
  }
]

Aggregatation:

db.collection.aggregate([
  {
    $group: {
      _id: "$age"
    }
  },
  {
    $lookup: {
      from: "collection",
      localField: "_id",
      foreignField: "age",
      as: "ob"
    }
  },
  {
    $replaceRoot: {
      newRoot: {
        $first: "$ob"
      }
    }
  }
])

Output:

[
  {
    "_id": ObjectId("5a934e000102030405000002"),
    "age": 30,
    "name": "Jim"
  },
  {
    "_id": ObjectId("5a934e000102030405000000"),
    "age": 21,
    "name": "bob"
  }
]

mongoplayground:

Playground solution