11

How can I combine $regex with $in in PyMongo?

I want to search for either /*.heavy.*/ or /*.metal.*/.

I tried in python without success:

db.col.find({'music_description' : { '$in' : [ {'$regex':'/*.heavy.*/'} ]} })

The equivalent in Mongo shell is:

db.inventory.find( { music_description: { $in: [ /heavy/, /metal/ ] } } )
Mateusz Piotrowski
  • 8,029
  • 10
  • 53
  • 79
Diolor
  • 13,181
  • 30
  • 111
  • 179

2 Answers2

20

Use python regular expressions.

import re
db.col.find({'music_description': {'$in': [ re.compile('.*heavy.*'), re.compile('.*metal.*')]}})
user2998367
  • 201
  • 1
  • 4
  • 1
    Huh. Works like a charm. Somehow, as fitting as it is, it's also not exactly obvious -- a regex string of some sort seemed the more obvious choice (I suppose because of the conversion from string to a compiled regex and then back to a string for bson). Thank you for your post. – Mr. B Sep 15 '14 at 20:43
6

Why even bother using an $in? You're wasting processing by evaluating the field for each value within the list, and since each value is a regex it has its own performance considerations, Depending on how long your query strings get, it might be prudent just to wrap them up in one regex and avoid the $in query all together

import re
db.col.find({'music_description': re.compile('heavy|metal')})

similarly in mongo shell

db.inventory.find({music_description: /heavy|metal/})

as for [user2998367]'s answer, you're wasting efficiency compiling a regex with greedy wildcards for the sole purpose of a match, the difference between re.search and re.match in python requires the use of the wildcards for re.search purposes, but re.match behaves as 'anywhere in string', as does MongoDB, its only really needed if you're intending to extract, which you'd need to do later after querying anyhow, or if you're reusing a compiled regex somewhere else that you specifically need re.search over re.match

apocolipse
  • 619
  • 6
  • 11
  • he is trying to check array elements, not "music_description" itself. – JSBach Nov 12 '15 at 19:53
  • The `$in` query is a rather versitile one with many uses. The first is the obvious use case that it reads as, "Some left hand value is in this array of possible right hand values", which is what he appears to be querying here (the name `music_description` in no way indicates it is an array of values...) The other use is what you're thinking of, the "this Array left hand value contains one or any of these right hand value(s)", which can also be written as `db.s.find({'someArray': {$in: oneValue}})` instead of the array, finally, also as `db.s.find({'someArray': oneValue})` for brevity – apocolipse Nov 14 '15 at 19:06
  • ...continued And for the purpose of this query, since the right hand values are regex's it makes little to no sense to process each regex individually, and rather just match them as one – apocolipse Nov 14 '15 at 19:08