
I'm trying to use the aggregation framework to group a lot of strings together to identify the unique ones. I also need to keep some information about the rest of the fields. This would be analogous to using the * selector in MySQL with a GROUP BY statement:

SELECT * 
FROM my_table
GROUP BY field1 

I have tried using the aggregation framework, and it works fine for just getting the unique fields:

db.mycollection.aggregate({ 
    $group : { _id : "$field1"}
    })

What if I want the other fields that went with that? MySQL would only give me the first one that appeared in the group (which I'm fine with). That's what I thought the $first operator did:

db.mycollection.aggregate({
  $group : { 
       _id : "$field1",
       another_field : {$first : "$field2"}
  }})

This way it groups by field1 but still gives me back the other fields attached to the document. When I try this I get:

 exception: aggregation result exceeds maximum document size (16MB)

I have a feeling this is because it is returning the whole aggregation result back as one document. Can I return it as a JSON array instead?

Thanks in advance.

jwillis0720

3 Answers


You're doing the aggregation correctly, but as the error message indicates, the full result of the aggregate call cannot be larger than 16 MB.

Work-arounds would be to either add a filter to reduce the size of the result or use map-reduce instead and output the result to another collection.
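A minimal sketch of the filtering approach, assuming a hypothetical some_date field you can restrict on (any selective $match condition before the $group stage would do; the field name and date are placeholders):

db.mycollection.aggregate([
    // hypothetical filter: only feed a subset of the documents into the group
    { $match : { some_date : { $gte : ISODate("2012-01-01") } } },
    { $group : { _id : "$field1", another_field : { $first : "$field2" } } }
])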

JohnnyHK
  • Be careful: you need to sort before taking the first element: http://docs.mongodb.org/manual/reference/aggregation/first/#grp._S_first – Bill'o Oct 03 '13 at 14:01
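Building on that comment, a rough sketch of sorting before grouping so that $first picks a predictable document per group (field names as in the question; the sort order is just an assumption):

db.mycollection.aggregate([
    // sort first so that $first returns a deterministic document per group
    { $sort  : { field1 : 1, field2 : 1 } },
    { $group : { _id : "$field1", another_field : { $first : "$field2" } } }
])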

If the number of unique values in the result does not exceed 20,000, you could use the group() function, like:

db.mycollection.group({
    key : { field1 : 1, field2 : 1 },
    reduce : function(curr, result) {},
    initial : {}
})

The last option would be map-reduce:

db.mycollection.mapReduce(
    function() { emit({ field1 : this.field1, field2 : this.field2 }, 1); },
    function(key, values) { return 1; },
    { out : { replace : "unique_field1_field2" } }
)

and your result would be in the "unique_field1_field2" collection.
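Once the map-reduce has finished, the unique pairs can be read back out of that collection; each _id is one distinct {field1, field2} combination:

// each document's _id holds one unique field1/field2 combination
db.unique_field1_field2.find()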

Stackee007

Another alternative is to use the distinct function:

db.mycollection.distinct('field1')

This function accepts a second argument, a query, which lets you filter the documents.
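For example, a sketch with a purely illustrative filter (the field2 value is an assumption):

// only collect distinct field1 values from documents matching the filter
db.mycollection.distinct('field1', { field2 : "some_value" })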

Miguel Cartagena
  • This is handy but does not solve the problem of returning all the other fields with the distinct document. – jwillis0720 Dec 27 '12 at 01:46
  • Sorry, I didn't realize this requirement. As the results are too big, you could use the $skip and $limit pipeline stages to partition the results. The trade-off here is that you'll need to run the aggregation many times and programmatically merge the results. – Miguel Cartagena Dec 27 '12 at 11:15