I have researched this but cannot find out why my attempt is not working. Be warned that I am somewhat new to Python and very new to MongoDB. I have a MongoDB database of tweets stored as JSON, which I am trying to query from Python with pymongo. I want to return the 'text' and 'created_at' fields of every tweet whose text contains "IP".

The following query works perfectly when I run it in the mongo shell from the terminal:

db.tweets.find({text:/IP/},{text:1,created_at:1})

In Python, after some experimenting, I found that the field names must be quoted. The following similar query works:

cursor = db.tweets.find({'created_at':"Thu Apr 28 09:55:57 +0000 2016"},{'text':1,'created_at':1})

But when I try:

db.tweets.find({"text": /.*IP.*/},{'text':1,'created_at':1})

or

cursor = db.tweets.find({'text':/IP/},{'text':1,'created_at':1})

I get 'SyntaxError: invalid syntax' at the "/IP/" part of the code.

I am using MongoDB 3.4.6 and Python 3.5.2.

Willem Van Onsem
wderetailers

1 Answer

Unlike JavaScript, Python has no literal syntax for regexes, so /IP/ is not valid Python and raises the SyntaxError you see.

Using re

You need to compile the regex with the re module:

import re

rgx = re.compile('.*IP.*', re.IGNORECASE)  # compile the regex

cursor = db.tweets.find({'text':rgx},{'text':1,'created_at':1})

The re.IGNORECASE flag makes the pattern also match iP, Ip and ip. If you only want the exact uppercase "IP", drop the re.IGNORECASE argument.
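To see what the flag changes without touching the database, here is a minimal sketch that applies both patterns to a few made-up tweets. The sample documents and the matching_texts helper are hypothetical, not part of the question. It uses search() because a regex match in MongoDB is unanchored, so the plain pattern 'IP' should select the same tweets as '.*IP.*'.

```python
import re

# Hypothetical sample tweets standing in for documents in db.tweets.
sample_tweets = [
    {"text": "My IP changed again", "created_at": "Thu Apr 28 09:55:57 +0000 2016"},
    {"text": "ip lowercase only", "created_at": "Thu Apr 28 10:01:12 +0000 2016"},
    {"text": "nothing relevant", "created_at": "Thu Apr 28 10:05:40 +0000 2016"},
]

# Same pattern, with and without the case-insensitive flag.
case_sensitive = re.compile("IP")
case_insensitive = re.compile("IP", re.IGNORECASE)


def matching_texts(pattern, tweets):
    """Return the 'text' of every tweet the pattern matches anywhere."""
    return [t["text"] for t in tweets if pattern.search(t["text"])]


sensitive_hits = matching_texts(case_sensitive, sample_tweets)
insensitive_hits = matching_texts(case_insensitive, sample_tweets)

print(sensitive_hits)
print(insensitive_hits)
```

Either compiled pattern can be passed to find() in place of rgx in the query above.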

Using '$regex' notation

Alternatively, pass the pattern as a plain string and mark it as a regex with the '$regex' operator:

cursor = db.tweets.find({'text':{'$regex':'IP'}},{'text':1,'created_at':1})
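The '$regex' form is case-sensitive by default. MongoDB accepts an '$options' key next to '$regex', and 'i' requests case-insensitive matching, which is the counterpart of re.IGNORECASE. A minimal sketch of building such a filter follows; build_text_filter is a hypothetical helper name, and the find() call is left as a comment because it needs a live connection.

```python
def build_text_filter(pattern, ignore_case=False):
    """Return a MongoDB filter for documents whose 'text' matches pattern."""
    condition = {"$regex": pattern}
    if ignore_case:
        # 'i' is MongoDB's case-insensitive regex option.
        condition["$options"] = "i"
    return {"text": condition}


# Only return the two fields asked for in the question.
projection = {"text": 1, "created_at": 1}
query = build_text_filter("IP", ignore_case=True)

print(query)

# With a pymongo database handle named db, as in the question:
# cursor = db.tweets.find(query, projection)
```

Keeping the filter as a plain dictionary makes it easy to print and inspect before sending it to the server.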
Willem Van Onsem