I have a collection where the _id is of the form [message_code]-[language_code] and another where the _id is just [message_code]. What I'd like to do is find all documents from the first collection where the message_code portion of the _id does not appear in the second collection.
Example:
> db.colA.find({})
{ "_id" : "TRM1-EN" }
{ "_id" : "TRM1-ES" }
{ "_id" : "TRM2-EN" }
{ "_id" : "TRM2-ES" }
> db.colB.find({})
{ "_id" : "TRM1" }
I want a query that will return TRM2-EN and TRM2-ES from colA. Of course, in my live data there are thousands of records in each collection.
According to this question, which is trying to do something similar, we have to save the results from a query against colB and use them in an $in condition in a query against colA. In my case, I need to strip the -[language_code] portion before doing this comparison, but I can't find a way to do so.
If all else fails, I'll just create a new field in colA that contains only the message code, but is there a better way to do it?
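To make the comparison I'm after concrete, here is a rough client-side sketch in plain JavaScript. The helper name idsMissingFromColB is made up for illustration, and it assumes the language code is whatever follows the last hyphen in the _id. It works on plain arrays of ids rather than on the collections themselves.

```javascript
// Hypothetical helper (not a MongoDB API): returns the ids from colA whose
// message code does not appear among the ids of colB.
function idsMissingFromColB(colAIds, colBIds) {
  // A Set gives constant-time lookups, which matters with thousands of ids.
  const knownCodes = new Set(colBIds);
  return colAIds.filter(function (id) {
    // Assumption: the language code is everything after the LAST hyphen.
    const cut = id.lastIndexOf("-");
    const messageCode = cut === -1 ? id : id.slice(0, cut);
    return !knownCodes.has(messageCode);
  });
}

// Sample data taken from the example collections above.
const colAIds = ["TRM1-EN", "TRM1-ES", "TRM2-EN", "TRM2-ES"];
const colBIds = ["TRM1"];

const missing = idsMissingFromColB(colAIds, colBIds);
console.log(missing); // [ 'TRM2-EN', 'TRM2-ES' ]
```

In the mongo shell the two arrays could presumably be filled with db.colA.distinct("_id") and db.colB.distinct("_id"), but that pulls every _id to the client, which is what I was hoping to avoid.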
Edit: Based on Michael's answer, I was able to come up with this solution:
var arr = db.colB.distinct("_id");
var regexs = arr.map(function (elm) {
    return new RegExp(elm);
});
var result = db.colA.find({_id : {$nin : regexs}}, {_id : true});
Edit: Upon closer inspection, the above method doesn't work after all. In the end, I just had to add the new field.
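For anyone who still wants to try the regex route: I haven't pinned down exactly why the attempt above failed, but one possible culprit (a guess, not something I verified) is that the patterns are unanchored and unescaped, so a code like TRM1 would also match an _id such as TRM10-EN. A minimal sketch of building anchored patterns, with made-up helper names and the extra id TRM10-EN added only to illustrate the point:

```javascript
// Escape regex metacharacters so a message code is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: one pattern per code, anchored at the start of the
// _id and requiring the hyphen that separates code and language.
function toAnchoredRegexes(codes) {
  return codes.map(function (code) {
    return new RegExp("^" + escapeRegExp(code) + "-");
  });
}

const regexes = toAnchoredRegexes(["TRM1"]);

// Plain-array stand-in for the _id values of colA; TRM10-EN is invented here.
const ids = ["TRM1-EN", "TRM10-EN", "TRM2-ES"];
const unmatched = ids.filter(function (id) {
  return !regexes.some(function (re) { return re.test(id); });
});
console.log(unmatched); // [ 'TRM10-EN', 'TRM2-ES' ]
```

The resulting array could be passed to $nin in the same way as the regexs array above. With thousands of codes that still means thousands of patterns per query, so the extra field is probably the more practical choice.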