
Using PyMongo 3.10.1 and MongoDB 4.2, the aggregation below (a $group stage that uses $regexMatch) works fine in the mongo shell:

db.accounts.aggregate([
        {'$lookup': {'from': 'users', 'localField': '_id', 'foreignField': 'user_id', 'as': 'users'}},
        {'$unwind': "$users"},
        { "$group": {
              "_id": {"_id": "$users.user_id"},
              "users": {"$push": "$users"},
              "total": {"$sum": {"$cond": [{"$regexMatch": {"input": "$users.email", "regex": /filtered/}},1,0]}}
            }
        },
])

But running the equivalent aggregation with PyMongo raises an OperationFailure in $regexMatch:

import re

pipeline = [
    {'$lookup': {'from': 'users', 'localField': '_id', 'foreignField': 'user_id', 'as': 'users'}},
    {'$unwind': '$users'},
    {'$group': {
        '_id': {'_id': '$users.user_id'},
        'users': {'$push': '$users'},
        'total': {'$sum': {'$cond': [{'$regexMatch': {'input': '$users.email', 'regex': re.compile('.*filtered.*', re.IGNORECASE)}}, 1, 0]}}}},
]

The error is:

  File "/Users/gcw/.pyenv/versions/3.7.6/envs/tt-api-env/lib/python3.7/site-packages/pymongo/collection.py", line 2380, in aggregate
    **kwargs)
  File "/Users/gcw/.pyenv/versions/3.7.6/envs/tt-api-env/lib/python3.7/site-packages/pymongo/collection.py", line 2299, in _aggregate
    retryable=not cmd._performs_write)
  File "/Users/gcw/.pyenv/versions/3.7.6/envs/tt-api-env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1464, in _retryable_read
    return func(session, server, sock_info, slave_ok)
  File "/Users/gcw/.pyenv/versions/3.7.6/envs/tt-api-env/lib/python3.7/site-packages/pymongo/aggregation.py", line 148, in get_cursor
    user_fields=self._user_fields)
  File "/Users/gcw/.pyenv/versions/3.7.6/envs/tt-api-env/lib/python3.7/site-packages/pymongo/pool.py", line 613, in command
    user_fields=user_fields)
  File "/Users/gcw/.pyenv/versions/3.7.6/envs/tt-api-env/lib/python3.7/site-packages/pymongo/network.py", line 167, in command
    parse_write_concern_error=parse_write_concern_error)
  File "/Users/gcw/.pyenv/versions/3.7.6/envs/tt-api-env/lib/python3.7/site-packages/pymongo/helpers.py", line 159, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Failed to optimize pipeline :: caused by :: $regexMatch invalid flag in regex options: u

Where does this regex option "u" come from?

gcw

2 Answers


Defining the regex with the options syntax (the pattern as a plain string and the flags in a separate "options" field) makes it work. Below is the working pipeline in PyMongo:

pipeline = [
    {'$lookup': {'from': 'users', 'localField': '_id', 'foreignField': 'user_id', 'as': 'users'}}, 
    {'$unwind': '$users'}, 
    {'$group': {
        '_id': {'_id': '$users.user_id'},
        'users': {'$push': '$users'}, 
        'total': {'$sum': {'$cond': [{"$regexMatch": {"input": "$users.email", "regex": ".*tiquetaque.*", "options": "i"}}, 1, 0]}}}},
]
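If the search term varies, the working pipeline above could be produced by a small function. This is a sketch only: `build_pipeline` and its `term` parameter are hypothetical names, and the use of `re.escape` is an added assumption so that characters such as `.` or `+` in the term are matched literally. Because $regexMatch looks for a match anywhere in the input, the leading and trailing `.*` are left out here.

```python
import re


def build_pipeline(term):
    """Hypothetical helper: build the $lookup/$unwind/$group pipeline for `term`.

    The regex is sent as a plain string and the case-insensitive flag goes in
    the separate "options" field, so no compiled Python pattern (and therefore
    no implicit "u" flag) reaches the server.
    """
    regex = re.escape(term)  # assumption: `term` is a literal substring, not a regex
    return [
        {'$lookup': {'from': 'users', 'localField': '_id',
                     'foreignField': 'user_id', 'as': 'users'}},
        {'$unwind': '$users'},
        {'$group': {
            '_id': {'_id': '$users.user_id'},
            'users': {'$push': '$users'},
            'total': {'$sum': {'$cond': [
                {'$regexMatch': {'input': '$users.email',
                                 'regex': regex, 'options': 'i'}},
                1, 0]}},
        }},
    ]


pipeline = build_pipeline('filtered')
# results = db.accounts.aggregate(pipeline)  # `db` would be a pymongo Database (not created here)
```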
gcw

My guess is that you are setting the source encoding to UTF-8 (as described in "Working with UTF-8 encoding in Python source"), which adds the unicode flag to your regular expressions.
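The unicode flag can be checked directly on the compiled object's `flags` attribute. In Python 3, a pattern compiled from a `str` carries `re.UNICODE` even when no flag is passed. The `option_letters` helper below is hypothetical; its flag-to-letter table is an assumption about how a compiled pattern's flags end up spelled as BSON regex options (it is consistent with the "u" in the error message, but check the `bson` source of your PyMongo version).

```python
import re

# Assumed mapping of Python `re` flags to BSON regex option letters.
FLAG_LETTERS = [
    (re.IGNORECASE, 'i'),
    (re.LOCALE, 'l'),
    (re.MULTILINE, 'm'),
    (re.DOTALL, 's'),
    (re.UNICODE, 'u'),
    (re.VERBOSE, 'x'),
]


def option_letters(pattern):
    """Hypothetical helper: option letters implied by a compiled pattern's flags."""
    return ''.join(letter for flag, letter in FLAG_LETTERS if pattern.flags & flag)


pat = re.compile('.*filtered.*', re.IGNORECASE)
print(bool(pat.flags & re.UNICODE))  # True: str patterns get re.UNICODE implicitly
print(option_letters(pat))           # prints: iu
```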

As documented at https://docs.mongodb.com/manual/reference/operator/aggregation/regexMatch/, $regexMatch accepts the regular expression and its options as strings instead of a regular expression object.
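If you already have a compiled Python pattern, one way to use it with $regexMatch is to convert it to that string form. The sketch below assumes the options $regexMatch documents are i, m, s and x, so any other flag (including the implicit `re.UNICODE`) is dropped. `to_regex_match` is a hypothetical name, and it also assumes the pattern text uses syntax that means the same thing to MongoDB's regex engine as to Python's `re`.

```python
import re

# Python flags with a $regexMatch option equivalent; re.UNICODE ("u") is not among them.
SUPPORTED = [(re.IGNORECASE, 'i'), (re.MULTILINE, 'm'),
             (re.DOTALL, 's'), (re.VERBOSE, 'x')]


def to_regex_match(field, pattern):
    """Hypothetical helper: build a $regexMatch expression from a compiled pattern.

    The pattern text and the options are passed as plain strings; flags
    without a $regexMatch equivalent are silently dropped.
    """
    options = ''.join(letter for flag, letter in SUPPORTED if pattern.flags & flag)
    expr = {'input': field, 'regex': pattern.pattern}
    if options:
        expr['options'] = options
    return {'$regexMatch': expr}


cond = to_regex_match('$users.email', re.compile('.*filtered.*', re.IGNORECASE))
# {'$regexMatch': {'input': '$users.email', 'regex': '.*filtered.*', 'options': 'i'}}
```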

D. SM
  • Yes, I've just found that defining the regex in the other format (string plus options) works. It must be an issue or limitation in PyMongo. – gcw Jul 24 '20 at 14:56
  • Regarding the UTF-8 encoding adding the unicode flag: using an `re` object in a `$match` stage, like `{'$match': {'promo_code': re.compile('.*some.*')}}`, works. So it looks like something specific to $regexMatch. Also, the Python version is 3.7.2. – gcw Jul 24 '20 at 15:25
  • If you feel this is a PyMongo issue, you can report it via https://jira.mongodb.org/browse/PYTHON. – D. SM Jul 24 '20 at 16:13