Getting a distinct values array of a big MongoDB collection

Question

I have a collection of ~8M documents. Each document is of the following format, for example:

doc_1: {'_id':..., 'user_name': 'bla1', 'predicted_class': 'class_1'}
doc_2: {'_id':..., 'user_name': 'bla2', 'predicted_class': 'class_2'}

The 'user_name' field contains unique values, while 'predicted_class' does not.

I'm trying to get the distinct user names for a given predicted class. For classes that have only a few documents I get the distinct values, but for most classes the query just keeps running (with high disk and memory usage) and never finishes.

I've tried the simple

db.getCollection('predictions').find({'predicted_class': 'class_a'}).distinct('user_name')

as well as

but no luck out there.

The issue arises from the collection size, and I understand that a different approach must be taken (map-reduce, perhaps), but unfortunately my understanding of MongoDB is limited.

How should I approach this issue?

https://docs.mongodb.com/manual/reference/operator/aggregation/group/#examples — Alex Blex, Apr 24 '19 at 08:26
See also the manual page for [`distinct()`](https://docs.mongodb.com/manual/reference/method/db.collection.distinct/). You don't "chain" the methods, but rather include the query condition **within** the `distinct()` method itself. Various examples on the linked duplicate in other answers other than the accepted one, which lacks a query predicate. Also `$group` with aggregation can be used with a `$match` for similar results. — Neil Lunn, Apr 24 '19 at 23:21
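
To make the two suggestions in the comments above concrete, here is a minimal sketch for the mongo shell. The helper names distinctUserNames and distinctUserNamesPipeline are made up for illustration, and the index mentioned in the trailing comment is an assumption about what would help, not something confirmed in the question.

```javascript
// Sketch for the mongo shell. The helper names are illustrative only
// and are not part of MongoDB.

// Pass the filter as the second argument of distinct() instead of
// chaining distinct() onto find().
function distinctUserNames(db, predictedClass) {
  return db.getCollection('predictions')
    .distinct('user_name', { predicted_class: predictedClass });
}

// Similar result via aggregation: filter first, then group on user_name
// so that each name appears once, as the _id of a result document.
function distinctUserNamesPipeline(predictedClass) {
  return [
    { $match: { predicted_class: predictedClass } },
    { $group: { _id: '$user_name' } }
  ];
}

// Usage in the shell (assumes `db` is the connected database):
//   distinctUserNames(db, 'class_a');
//   db.getCollection('predictions')
//     .aggregate(distinctUserNamesPipeline('class_a'), { allowDiskUse: true });
//
// Assumption: a compound index such as
//   db.getCollection('predictions').createIndex({ predicted_class: 1, user_name: 1 })
// may let the server answer without scanning the whole collection.
```

One possible reason to prefer the aggregation form on a collection this size: distinct() returns all values in a single result, which is subject to the maximum BSON document size, whereas aggregate() returns a cursor that can be iterated in batches.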
