I am using Python 3.7.1 (default, Dec 14 2018, 19:28:38) and pymongo 3.7.2.
In the MongoDB shell, this query works:
db.collection.find(
  {$and: [
    {"field": {$regex: "bon?"}},
    {"field": {$not: {$regex: "bon souple"}}},
    {"field": {$not: {$regex: "bon léger"}}}
  ]}
)
So in pymongo I wrote the equivalent:
db.collection.find(
    {"$and": [
        {"field": {"$regex": "bon?"}},
        {"field": {"$not": {"$regex": "bon souple"}}},
        {"field": {"$not": {"$regex": "bon léger"}}}
    ]}
)
but it raises pymongo.errors.OperationFailure: $regex has to be a string.
So I tried this, as proposed here:
import re

liste_reg = [
    {'field': {'$regex': {'$not': re.compile('bon souple')}}},
    {'field': {'$regex': {'$not': re.compile('bon léger')}}},
    {'field': {'$regex': re.compile('bon?')}}
]
rslt = list(db.collection.find({"$and": liste_reg}))
I noticed that the same error is raised even when the pattern contains no special characters:
liste_reg = [
    {'field': {'$regex': {'$not': re.compile('bon souple')}}}  # where no special char is present
]
rslt = list(db.collection.find({"$and": liste_reg}))
So I tried using "/" delimiters:
liste_reg = [
    {'field': {'$regex': {'$not': '/bon souple/'}}}  # where no special char is present
    # even tried re.compile('/bon souple/')
]
rslt = list(db.collection.find({"$and": liste_reg}))
The same error, pymongo.errors.OperationFailure: $regex has to be a string, still occurs.
What can I do?
UPDATE ON MY SEARCH FOR A SOLUTION
The core of the issue seems to be $not, because when I run:
liste_reg = [{'field': {'$regex': 'bon?'}}]
rslt = list(db.collection.find({"$and": liste_reg}))
len(rslt)  # gives 23 013, which is correct
There is no error.
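For comparison, here is a minimal sketch of the opposite nesting order, where $not wraps the pattern instead of $regex wrapping $not. It assumes that $not accepts a compiled pattern directly (pymongo encodes re.compile objects as BSON regular expressions). build_filter is a hypothetical helper name used only for illustration, not part of pymongo, and the find call is left commented out because it needs a live connection.

```python
import re


def build_filter(field, include, excludes):
    """Build an $and filter: `field` matches `include` but none of `excludes`.

    Hypothetical helper for illustration only; it is not a pymongo function.
    """
    clauses = [{field: {"$regex": include}}]
    # $not wraps the compiled pattern; $regex does not wrap $not.
    clauses += [{field: {"$not": re.compile(pattern)}} for pattern in excludes]
    return {"$and": clauses}


query = build_filter("field", "bon?", ["bon souple", "bon léger"])

# Assumed usage against the collection from the question:
# rslt = list(db.collection.find(query))
```

The only difference from the failing attempts is which operator is on the outside: here the value of $regex stays a plain string, and the value of $not is a regular expression object.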
SOME SAMPLES
As Emma asked, here is a sample, which should make my MongoDB query clearer. Normally the field should contain only these values:
- sec
- très léger
- léger
- bon léger
- bon
- bon souple
- souple
- très souple
- collant
- lourd
- très Lourd
- profond
My main problem is that my spider did not parse the pages correctly, because the script I wrote was not robust enough. Instead of just "bon", I get results like this:
{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon",
...}
This is one example among many wrongly parsed values.
That is why I want the results that match "bon?" but not "bon souple" or "bon léger", because those two are correct values, with no \n or \t.
Sample documents:
[{"_id":"ID1",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\t\t\tnon"},
{"_id":"ID2",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\tpremière"},
{"_id":"ID3",
"field":"bon\r\n\t\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t\r\n\t\t\t\t\t\t2ème"},
{"_id":"ID4",
"field":"bon souple"},
{"_id":"ID5",
"field":"bon léger"}]
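To make the expected result concrete, here is a minimal local sketch in plain Python, with no database involved. It applies the same three patterns to the samples above using the re module; the whitespace runs in the field values are shortened for readability, and wanted is a hypothetical helper name. It assumes that Python's re and the server's regex engine behave the same for these simple patterns.

```python
import re

# Sample documents from above; the whitespace runs are shortened for readability.
samples = [
    {"_id": "ID1", "field": "bon\r\n\t\t\r\n\t\tnon"},
    {"_id": "ID2", "field": "bon\r\n\t\t\r\n\t\tpremière"},
    {"_id": "ID3", "field": "bon\r\n\t\t\r\n\t\t2ème"},
    {"_id": "ID4", "field": "bon souple"},
    {"_id": "ID5", "field": "bon léger"},
]

include = re.compile("bon?")
excludes = [re.compile("bon souple"), re.compile("bon léger")]


def wanted(value):
    """True if value matches the include pattern and none of the exclusions."""
    if include.search(value) is None:
        return False
    return not any(pattern.search(value) for pattern in excludes)


matches = [doc["_id"] for doc in samples if wanted(doc["field"])]
print(matches)  # ['ID1', 'ID2', 'ID3']
```

Note that "bon?" is unanchored and means "bo" followed by an optional "n", so it also matches in the middle of a string; a pattern such as "^bon" would be needed to require that the value begins with "bon".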