88

I'm working on a query to find cities with most zips for each state:

db.zips.distinct("state", db.zips.aggregate([ 
    { $group:
      { _id: {
           state: "$state", 
           city: "$city" 
         },
        numberOfzipcodes: { 
           $sum: 1
         }
      }
    }, 
    { $sort: {
        numberOfzipcodes: -1
         }
      }
  ])
)

The aggregate part of the query seems to work fine, but when I add the distinct I get an empty result.

Is this because I have state in the id? Can I do something like distinct("_id.state ?

kartik tyagi
  • 6,256
  • 2
  • 14
  • 31
LemonMan
  • 2,963
  • 8
  • 24
  • 34
  • 4
    For those looking for how to use Mongo's aggregation to get distinct values, try this (inspired by [dam1's answer](http://stackoverflow.com/a/35187100/3391108) and [Mongo's documentation](https://docs.mongodb.com/manual/reference/operator/aggregation/addToSet/#example)): `db.collectionName.aggregate([{$group: {_id: null, uniqueValues: {$addToSet: "$fieldName"}}}])` – tscizzle Feb 08 '17 at 17:35

4 Answers4

162

You can use $addToSet with the aggregation framework to count distinct objects.

For example:

db.collectionName.aggregate([{
    $group: {_id: null, uniqueValues: {$addToSet: "$fieldName"}}
}])

Or extended to get your unique values into a proper list rather than a sub-document inside a null _id record:

db.collectionName.aggregate([
    { $group: {_id: null, myFieldName: {$addToSet: "$myFieldName"}}},
    { $unwind: "$myFieldName" },
    { $project: { _id: 0 }},
])
Andy Lorenz
  • 2,905
  • 1
  • 29
  • 29
dam1
  • 3,536
  • 3
  • 22
  • 18
  • 3
    Not a generic solution, if you have a large number of unique zip codes per result, this array would be very large. The question was to get the city with MOST zip codes for each state, not to get the actual zip codes. What happens if you have 10,000,000 zip codes for a given city? – Moshe Gottlieb Sep 06 '16 at 07:50
  • 1
    Just saw this, it would not count distinct objects at all, instead it will place, distinctly, objects into an array, not only that but distinction would be on `===` which is not always a good idea. Instead, you would want to group on distinct values counting the amount of times that value exists, at which point you could easily add a stage to sum it up as the number of unique objects. This, as a method of counting, would be highly inefficient in terms of memory and resource and processor. – Sammaye Mar 27 '18 at 19:50
  • 1
    this might be a much better answer if it provided an actual answer to the OP, and not just a generic example of the syntax that group/addtoSet uses.. – Andy Lorenz Apr 12 '18 at 17:19
  • seems pymongo does not support it.. – Han.Oliver Dec 13 '21 at 18:19
66

Distinct and the aggregation framework are not inter-operable.

Instead you just want:

db.zips.aggregate([ 
    {$group:{_id:{city:'$city', state:'$state'}, numberOfzipcodes:{$sum:1}}}, 
    {$sort:{numberOfzipcodes:-1}},
    {$group:{_id:'$_id.state', city:{$first:'$_id.city'}, 
              numberOfzipcode:{$first:'$numberOfzipcodes'}}}
]);
Sammaye
  • 43,242
  • 7
  • 104
  • 146
  • 5
    @alex23 Distinct is a completely different command which returns an array of distinct values. It is completely in-compatible with the aggregation framework – Sammaye May 03 '13 at 23:01
  • this is a query i had before, but i need to get the distinct cities for each state not the states – LemonMan May 03 '13 at 23:01
  • @Lemonio Added city back in, that should give off the zips in each city in each state now – Sammaye May 03 '13 at 23:02
  • @ sammaye i think this is exactly what i had before though? i wanted only the city with most zips for each state which is why i tried to do distinct. this way i get all the cities in each state, which is the query i had before – LemonMan May 03 '13 at 23:05
  • @Lemonio ok you will need to do a second group, edited, edit: fixed some other errors edit: and more – Sammaye May 03 '13 at 23:09
  • thank you so much! currently it gives null for the city but i'll try to figure that out on my own! – LemonMan May 03 '13 at 23:12
35

SQL Query: (group by & count of distinct)

select city,count(distinct(emailId)) from TransactionDetails group by city;

Equivalent mongo query would look like this:

db.TransactionDetails.aggregate([ 
{$group:{_id:{"CITY" : "$cityName"},uniqueCount: {$addToSet: "$emailId"}}},
{$project:{"CITY":1,uniqueCustomerCount:{$size:"$uniqueCount"}} } 
]);
goto
  • 7,908
  • 10
  • 48
  • 58
4

You can call $setUnion on a single array, which also filters dupes:

{ $project: {Package: 1, deps: {'$setUnion': '$deps.Package'}}}
Jeroen Ooms
  • 31,998
  • 35
  • 134
  • 207