0

I feel like my question is going to have a really simple answer but here it goes.

I have a MongoDB structure like

{  
...  
a: [array elements],  
b: [array elements],  
c: [array elements],  
...  
}  

I'm using Mongoose and currently I have 3 find distinct calls going out.

Model.find().distinct('a', ...)  
Model.find().distinct('b', ...)  
Model.find().distinct('c', ...)

This does give me what I want, but I would like to get the information in one call. I thought I might be able to use aggregate to do this and get a new Object that looks like

{  
a: [distinct array elements],  
b: [distinct array elements],  
c: [distinct array elements]  
}

What would be the fastest way to get this from MongoDB in one shot?

John
  • 31
  • 4

1 Answers1

0

It is possible to get all 3 in one query using an aggregation pipeline like:

[ 
  {$facet:{
      a: [{$group:{_id:null, values:{$addToSet:"$a"}}}],
      b: [{$group:{_id:null, values:{$addToSet:"$b"}}}],
      c: [{$group:{_id:null, values:{$addToSet:"$c"}}}],
  }}
]

However, this will require reading every document from disk, iterating that entire list 3 times, and building the sets in memory in the same aggregation stage. In a large data set, this might exceed the memory limit for a single stage.

If the collection has an index on {a:1}, the query Model.find().distinct('a') can use a "DISTINCT_SCAN" stage, meaning it will only consult the index to obtain the values, and not read any of the documents directly.

If all 3 of the fields are indexed, 3 separate distinct queries will be the fastest way.

Joe
  • 25,000
  • 3
  • 22
  • 44
  • It makes a lot of sense to limit how many documents are being read. I'm really new to databases so thank you for introducing me to indexes. I'm also trying to limit my API and database calls (aka: network traffic), so is there any way to make an aggregation pipeline use indexes as well or am I just not understanding how the aggregation pipeline works. – John Apr 07 '21 at 13:41
  • One more question. If I set an index to an Array type in Mongoose, will it Index the values in the array or the whole array itself? I'd like only the values to be indexed. – John Apr 07 '21 at 13:59
  • Separate questions should be asked separately. Please submit ask a new question if this doesn't adequately answer: https://stackoverflow.com/questions/7396219/mongodb-multikey-indexing-structure – Joe Apr 07 '21 at 15:18