2

I use Python 3.7.1 (default, Dec 14 2018, 19:28:38) and pymongo 3.7.2.

In mongodb this works:

db.collection.find(
    {$and:[
    {"field":{$regex:"bon?"}},
    {"field":{$not:{$regex:"bon souple"}}},
    {"field":{$not:{$regex:"bon léger"}}}
    ]}
    )

So in pymongo I did the same as:

db.collection.find(
    {"$and":[
    {"field":{"$regex":"bon?"}},
    {"field":{"$not":{"$regex":"bon souple"}}},
    {"field":{"$not":{"$regex":"bon léger"}}}
    ]}
    )

but it raises pymongo.errors.OperationFailure: $regex has to be a string.

So I tried this as proposed here:

liste_reg=[
{'field': {'$regex': {'$not': re.compile('bon souple')}}}, 
{'field': {'$regex': {'$not': re.compile('bon léger')}}}, 
{'field': {'$regex': re.compile('bon?')}}
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

I noticed that even when the pattern contains no special character, the same error is raised:

liste_reg=[
{'field': {'$regex': {'$not': re.compile('bon souple')}}} #where no special char is present
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

So I tried to use "/" as:

liste_reg=[
{'field': {'$regex': {'$not':'/bon souple/'}}} #where no special char is present
#even tried re.compile('/bon souple/')
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

The same error, pymongo.errors.OperationFailure: $regex has to be a string, still occurs.

What can I do?

UPDATE ON MY SEARCH FOR A SOLUTION

The core of the issue seems to be $not, because when I do:

liste_reg=[{'field': {'$regex': 'bon?'}}]
rslt=list(
    db.collection.find({"$and":liste_reg})
)
len(rslt)  # gives 23 013, which is OK.

There is no error.

SOME SAMPLES

As Emma asked, here is a sample, which should make my MongoDB query clearer. Normally the field should only contain these values:

  • sec
  • très léger
  • léger
  • bon léger
  • bon
  • bon souple
  • souple
  • très souple
  • collant
  • lourd
  • très Lourd
  • profond

The main problem is that my spider did not parse the pages correctly, because the script I wrote was not robust enough. Instead of obtaining just "bon", I obtain this kind of result:

{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon",
...}

and that is one example among many other parsing errors. That is why I want the values that match "bon?" but not "bon souple" or "bon léger": those two have correct values, with no \n or \t.

So as samples:

[{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon"},
{"_id":"ID2",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\tpremière"},
{"_id":"ID3",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t2ème"},
{"_id":"ID4",
"field":"bon souple"},
{"_id":"ID5",
"field":"bon léger"}]
AvyWam
  • 1
    @Emma I updated the question with a sample of what you asked for, or at least what I think you asked for. – AvyWam May 29 '19 at 19:42
  • 1
    @Emma as you said, it works in your DEMO. But I cannot explain why, when I run `db.collection.find({"field":{$regex:"bon[^\s].+"}})` in the mongo shell in Robo 3T, the first document returned is `{ "_id" : "364714",..., "field" : "bon léger"}`. I opened View Document to check that it was not an exception like `"bon\t\t\t\t\nléger"`, and it really is `"bon léger"`. So in my mongo shell the pattern still matches the space. Besides, in pymongo I get an empty list with `len(list(db.geny_rapp.find({'etat_terrain': {'$regex': "bon[^\s].+"}})))`. – AvyWam May 29 '19 at 20:45
  • 1
    @Emma honestly, I have another way to solve my problem without regex, using set operations: setA-setB -> the set I want. But as I said, it is more complicated, and that is not the goal. – AvyWam May 29 '19 at 20:49

3 Answers

4

I just ran into this same issue.

Try doing this:

import re

liste_reg=[
{'field': {'$not': re.compile('bon souple')}}, 
{'field': {'$not': re.compile('bon léger')}}, 
{'field': {'$regex': re.compile('bon?')}}
]
rslt=list(
    db.collection.find({"$and":liste_reg})
)

I just removed the $regex part of the query.

Background

I tried doing {item["type"]: {"$not": item['name']}} and pymongo raised a "$not needs a regex or a document" error.

So I tried {item["type"]: {"$not": {"$regex": item['name']}}} and pymongo raised a "$not cannot have a regex" error.

I found this Stack Overflow answer, https://stackoverflow.com/a/20175230/9069964, and here's what finally worked for me:

item_name = item["name"]
{item["type"]: {"$not": re.compile(item_name)}}

I had to drop the "$regex" key and pass the compiled pattern directly to "$not".
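
A minimal sketch of the approach above, assuming a hypothetical helper named `build_filter` (it is not part of pymongo) that assembles the `$and` clauses. The exclusion clauses hand a compiled pattern straight to `$not`, with no `$regex` key. The field name and patterns are taken from the question; the filter is only built here, not sent to a server.

```python
import re


def build_filter(field, include, excludes):
    """Hypothetical helper: match `include` on `field` but none of `excludes`."""
    clauses = [{field: {"$regex": re.compile(include)}}]
    # $not receives the compiled pattern itself, not a {"$regex": ...} document.
    clauses += [{field: {"$not": re.compile(pattern)}} for pattern in excludes]
    return {"$and": clauses}


query = build_filter("field", "bon?", ["bon souple", "bon léger"])
# With a live connection this would be passed as: db.collection.find(query)
```

Building the filter in one place keeps the `$not` convention from being repeated by hand for each excluded value.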

jetilling
  • That's great! It works, and it is totally in the spirit of my code. Besides, it shows how to use '$not' instead of working around it. – AvyWam May 31 '19 at 17:59
1

Try using a raw string literal with a positive lookahead. The example below should work as long as there is a carriage return (\r) directly after 'bon'.

import re
bon = re.compile(r'bon(?=\r)')
db.collection.find({'field': bon})
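
To see what this pattern selects without touching the database, here is a quick local check with Python's `re` module. The sample values are modelled on those in the question, with the whitespace runs shortened for readability (an assumption). MongoDB evaluates regular expressions with its own engine, so this is only a sanity check of the pattern, not of the query.

```python
import re

bon = re.compile(r'bon(?=\r)')

# Sample values modelled on the question; whitespace runs are shortened.
samples = {
    "ID1": "bon\r\n\t\t\r\n\tnon",
    "ID2": "bon\r\n\t\t\r\n\tpremière",
    "ID4": "bon souple",
    "ID5": "bon léger",
}

# Keep the ids whose value has 'bon' immediately followed by a carriage return.
matched = sorted(_id for _id, value in samples.items() if bon.search(value))
print(matched)  # → ['ID1', 'ID2']
```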
chuck_sum
  • `len(list(db.collection.find({'field': {'$regex': re.compile(r'bon(?=\r)')}})))` gives me 19 documents, while I expect 22242. I think I will solve my problem another way than with regex alone, using the properties of set objects. – AvyWam May 30 '19 at 12:38
  • It might be easier just to clean up your data: `bon_dirty = 'bon\r\n\t'; bon_clean = bon_dirty.strip()` – chuck_sum May 30 '19 at 13:03
  • Well, I did a dump of my collection, and now it is clear: that is what I expect. It returns the same number of documents as mongo with $not. But it is still a mystery why `re.compile()` does not work for me while it does for [others](https://groups.google.com/forum/#!topic/mongodb-user/FdFJWzmKfds). – AvyWam May 30 '19 at 13:27
0

We might be able to solve this problem without using $not. For instance, if we want to exclude bon souple and bon léger, which are both bon followed by a space, we could try an expression similar to:

"bon[^\s].+"

DEMO

I'm not sure exactly what we want to extract here, but my guess is that we want to match bon values that are not followed by a space and that sit between the double quotes.

We would also want to check the requirements of regex queries and adjust the expression if necessary, for example by escaping characters or using a capturing group:

(bon[^\s].+)

or:

"(bon[^\s].+)"

or:

\"(bon[^\s].+)\" 

or:

([\s\S]*?)\"(bon[^\s].+)\"
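
As a rough local check of the first expression, it can be tried with Python's `re` module against values shaped like the samples in the question. This assumes Python's `re` treats this simple pattern the same way MongoDB's regex engine does; the last sample value is hypothetical and only there to show a positive match. Note that `\r` counts as whitespace, so `[^\s]` also rejects values where bon is followed directly by a carriage return.

```python
import re

pattern = re.compile(r"bon[^\s].+")

samples = [
    "bon\r\n\t\tnon",  # shaped like the badly parsed values (shortened)
    "bon souple",
    "bon léger",
    "bonne humeur",    # hypothetical value, not from the question
]

# Keep only the values where 'bon' is followed by a non-whitespace character.
matches = [s for s in samples if pattern.search(s)]
print(matches)
```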

DEMO


I'm not sure whether this is what we want or whether it is relevant, but according to this documentation we can try:

{ name: { $regex: /([\s\S]*?)\"(bon[^\s].+)\"/, $options: "mi" } }

or:

{ name: { $regex: '([\s\S]*?)\"(bon[^\s].+)\"', $options: "mi" } }

db.collection.find

db.collection.find({"field":{ $regex: /(bon[^\s].+)/, $options: "mi" }})

or:

db.collection.find({"field":{ $regex: /(bon[^\s].+)/, $options: "si" }})

Reference:

PyMongo $in + $regex

Performing regex Queries with pymongo

Emma
  • 1
    Running `db.collection.find({"field":{$regex:"\"(bon[^\s].+)\"" }})` or `db.collection.find({"field":{$regex:"([\s\S]*?)\"(bon[^\s].+)\""}})` gives: `Fetched 0 record(s) in 55ms`. Note that I entered `"\"(bon[^\s].+)\""` and not `\"(bon[^\s].+)\"`, and likewise for `([\s\S]*?)\"(bon[^\s].+)\"`, because the unquoted form raises an error in the mongo shell. – AvyWam May 29 '19 at 21:04
  • 1
    `db.collection.find({"field":{ $regex: /([\s\S]*?)\"(bon[^\s].+)\"/, $options: "mi" }})` raises no error, but returns 0 results. – AvyWam May 29 '19 at 21:08