I have a collection of ~8M documents. Each document is of the following format, for example:
doc_1: {'_id':..., 'user_name': 'bla1', 'predicted_class': 'class_1'}
doc_2: {'_id':..., 'user_name': 'bla2', 'predicted_class': 'class_2'}
The 'user_name' field contains unique values, while 'predicted_class' does not.
I'm trying to get the distinct user names for a certain predicted class. In some cases (when a class has only a few documents) I get the distinct values, but in most cases the query just keeps loading (with high disk and memory usage) and never finishes.
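To make the intended result concrete, here is a minimal plain-JavaScript sketch of what I'm after, run on a tiny in-memory array instead of the real collection. The sample documents and the helper name distinctUserNames are made up for illustration; this only shows the expected semantics, not how MongoDB evaluates the query.

```javascript
// Toy stand-in for the 'predictions' collection (made-up sample data).
const predictions = [
  { _id: 1, user_name: 'bla1', predicted_class: 'class_1' },
  { _id: 2, user_name: 'bla2', predicted_class: 'class_2' },
  { _id: 3, user_name: 'bla3', predicted_class: 'class_1' },
];

// Hypothetical helper: collect the distinct user names among the
// documents whose predicted_class equals the requested class.
function distinctUserNames(docs, predictedClass) {
  const names = new Set();
  for (const doc of docs) {
    if (doc.predicted_class === predictedClass) {
      names.add(doc.user_name);
    }
  }
  // Convert the Set to an array, preserving insertion order.
  return [...names];
}

console.log(distinctUserNames(predictions, 'class_1'));
```

On the real collection the same filter-then-deduplicate step has to run over ~8M documents, which is where it stalls.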
I've tried the simple
db.getCollection('predictions').find({'predicted_class': 'class_a'}).distinct('user_name')
as well as
but with no luck either.
The issue arises from the collection size, and I understand that a different approach must be taken (map-reduce, perhaps), but unfortunately my MongoDB understanding is limited.
How should I approach this issue?