165

Is there a query for calculating how many distinct values a field contains in DB.

f.e I have a field for country and there are 8 types of country values (spain, england, france, etc...)

If someone adds more documents with a new country I would like the query to return 9.

Is there easier way then group and count?

chridam
  • 100,957
  • 23
  • 236
  • 235
Liatz
  • 4,997
  • 7
  • 28
  • 33
  • 2
    Have you looked at the [aggregation](http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/) framework? – WiredPrairie Feb 17 '13 at 19:44
  • 1
    Or [map-reduce](http://docs.mongodb.org/manual/applications/map-reduce/)? – WiredPrairie Feb 17 '13 at 19:45
  • Possible duplicate of [MongoDB select count(distinct x) on an indexed column - count unique results for large data sets](http://stackoverflow.com/questions/11782566/mongodb-select-countdistinct-x-on-an-indexed-column-count-unique-results-for). I posted my answer there. – expert Nov 01 '15 at 20:01

10 Answers10

282

MongoDB has a distinct command which returns an array of distinct values for a field; you can check the length of the array for a count.

There is a shell db.collection.distinct() helper as well:

> db.countries.distinct('country');
[ "Spain", "England", "France", "Australia" ]

> db.countries.distinct('country').length
4

As noted in the MongoDB documentation:

Results must not be larger than the maximum BSON size (16MB). If your results exceed the maximum BSON size, use the aggregation pipeline to retrieve distinct values using the $group operator, as described in Retrieve Distinct Values with the Aggregation Pipeline.

Stennie
  • 63,885
  • 14
  • 149
  • 175
  • 66
    this doesn't really work if your number of distinct values is too high... if you were looking at distinct names of people in the world or something. do you have an answer that scales? – underrun Oct 01 '14 at 18:36
  • 6
    1+ for length. i was struggling to find something like that. Thanks. – Adeel Ahmad Mar 30 '15 at 13:41
  • 2
    I don´t know why they don´t use count() there as well – Marian Klühspies Nov 26 '15 at 23:24
  • 2
    @MarianKlühspies - because it's just a javascript array, which uses the length property to count the number of elements. – UpTheCreek May 30 '18 at 09:47
  • Just what I was looking for... TY – Maulzey Jun 02 '20 at 03:37
  • @UpTheCreek. so what if javascript..as end user i am least bothered if underlying is python or javascript etc.. what's the problem to keep it as count.just a syntactic sugar right? Anyways , the developer, manager or company of the product decides what to name :) – Stunner Jul 28 '21 at 15:00
  • 1
    This doesn't work when you have millions or billions of records..... – rohitcoder Aug 26 '21 at 12:12
  • 1
    @rohitcoder If you have millions or billions of documents to iterate for distinct values, I suggest posting a new question with details of your environment and use case. I expect you will be using a more modern version of MongoDB than this discussion from Feb 2013, when MongoDB 2.2 was the latest server release and the Aggregation Framework had just been introduced :). I edited my answer to include a note from the `distinct` documentation which recommends the Aggregation Framework for larger result sets, but there are other approaches and operators available in modern versions of MongoDB. – Stennie Sep 07 '21 at 07:42
163

Here is example of using aggregation API. To complicate the case we're grouping by case-insensitive words from array property of the document.

db.articles.aggregate([
    {
        $match: {
            keywords: { $not: {$size: 0} }
        }
    },
    { $unwind: "$keywords" },
    {
        $group: {
            _id: {$toLower: '$keywords'},
            count: { $sum: 1 }
        }
    },
    {
        $match: {
            count: { $gte: 2 }
        }
    },
    { $sort : { count : -1} },
    { $limit : 100 }
]);

that give result such as

{ "_id" : "inflammation", "count" : 765 }
{ "_id" : "obesity", "count" : 641 }
{ "_id" : "epidemiology", "count" : 617 }
{ "_id" : "cancer", "count" : 604 }
{ "_id" : "breast cancer", "count" : 596 }
{ "_id" : "apoptosis", "count" : 570 }
{ "_id" : "children", "count" : 487 }
{ "_id" : "depression", "count" : 474 }
{ "_id" : "hiv", "count" : 468 }
{ "_id" : "prognosis", "count" : 428 }
expert
  • 29,290
  • 30
  • 110
  • 214
  • 4
    Logged in just to + this answer. Thanks! btw if you are doing it on a unique field, just remove the unwind line. – Richie Rich Jul 08 '16 at 17:58
  • @RichieRich, `unwind` is necessary because the code is grouping individual values of an array field which matches how `distinct` works. – Paul Feb 06 '17 at 15:14
  • @Paul what Richie said is that if the grouping is done just "regular" field (string, int etc.) then you don't need the unwind step. Isn't it correct? – Guy Oct 19 '17 at 10:00
  • 1
    @guyarad `unwind` is necessary when working with arrays. – Paul Oct 19 '17 at 13:59
  • +1 for the answer, exactly the thing I was working on, however distinct has its own charms but this is just gold :) -- anyhow I have to read more about aggregates to achieve desired set of results to filter data – Talha May 08 '18 at 14:20
31

With MongoDb 3.4.4 and newer, you can leverage the use of $arrayToObject operator and a $replaceRoot pipeline to get the counts.

For example, suppose you have a collection of users with different roles and you would like to calculate the distinct counts of the roles. You would need to run the following aggregate pipeline:

db.users.aggregate([
    { "$group": {
        "_id": { "$toLower": "$role" },
        "count": { "$sum": 1 }
    } },
    { "$group": {
        "_id": null,
        "counts": {
            "$push": { "k": "$_id", "v": "$count" }
        }
    } },
    { "$replaceRoot": {
        "newRoot": { "$arrayToObject": "$counts" }
    } }    
])

Example Output

{
    "user" : 67,
    "superuser" : 5,
    "admin" : 4,
    "moderator" : 12
}
chridam
  • 100,957
  • 23
  • 236
  • 235
17

I wanted a more concise answer and I came up with the following using the documentation at aggregates and group

db.countries.aggregate([{"$group": {"_id": "$country", "count":{"$sum": 1}}}])
l33tHax0r
  • 1,384
  • 1
  • 15
  • 31
10

You can leverage on Mongo Shell Extensions. It's a single .js import that you can append to your $HOME/.mongorc.js, or programmatically, if you're coding in Node.js/io.js too.

Sample

For each distinct value of field counts the occurrences in documents optionally filtered by query

> db.users.distinctAndCount('name', {name: /^a/i})

{
  "Abagail": 1,
  "Abbey": 3,
  "Abbie": 1,
  ...
}

The field parameter could be an array of fields

> db.users.distinctAndCount(['name','job'], {name: /^a/i})

{
  "Austin,Educator" : 1,
  "Aurelia,Educator" : 1,
  "Augustine,Carpenter" : 1,
  ...
}
evandrix
  • 6,041
  • 4
  • 27
  • 38
9

To find distinct in field_1 in collection but we want some WHERE condition too than we can do like following :

db.your_collection_name.distinct('field_1', {WHERE condition here and it should return a document})

So, find number distinct names from a collection where age > 25 will be like :

db.your_collection_name.distinct('names', {'age': {"$gt": 25}})

Hope it helps!

Polish
  • 554
  • 1
  • 4
  • 18
9

I use this query:

var collection = "countries"; var field = "country"; 
db[collection].distinct(field).forEach(function(value){print(field + ", " + value + ": " + db[collection].count({[field]: value}))})

Output:

countries, England: 3536
countries, France: 238
countries, Australia: 1044
countries, Spain: 16

This query first distinct all the values, and then count for each one of them the number of occurrences.

Rea Haas
  • 2,018
  • 1
  • 16
  • 18
  • Can you please tell me how to write this same query in php laravel? – P N Jain Jun 08 '21 at 05:03
  • 1
    what is host here in this query ? – Heena Jul 09 '21 at 13:15
  • 1
    @HeenaPatil Good catch! I had two bugs in the query, I fixed it now. The `hosts` was the name of my collection in my db... sorry for that. The other issue that I also fixed tried to call `db.collection` which I fixed to `db[collection]`. Thanks! – Rea Haas Aug 19 '21 at 13:34
  • that is what I am looking for, but I was hoping there is a one line query to handle it, Thanks anyway. – user3595231 Jan 05 '23 at 23:34
3

If you're on MongoDB 3.4+, you can use $count in an aggregation pipeline:

db.users.aggregate([
  { $group: { _id: '$country' } },
  { $count: 'countOfUniqueCountries' }
]);
djanowski
  • 5,610
  • 1
  • 27
  • 17
0

Case 1 If you are using find() then the solution for that is:

const getDistinctCountryCount = async (req, res) => {
  const countryCount = await(await db.model.distinct('country')).length;
  console.log("countryCount: ", countryCount);
}

You have to first await the query result, then await to get the length of it.

Case 2 If you are using Aggregate then the solution for that is:

db.model.aggregate([
  { $group: { _id: '$country' } },
  { $count: 'countOfDistinchCountries' }
])
0

Shortest query I've found to get a sorted count of all distincts:

db.myCollection.aggregate([{
    "$group": {"_id": "$myField", "count":{"$sum": 1}}
  },
  { "$sort": {"count": -1}}
])
pmont
  • 2,083
  • 22
  • 43