
I'm trying to extract a price from a post title and compare it with another price. I found this regular expression pattern, which fits my needs: `\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})`

I tested it on https://regexr.com/ and it returns the correct match. However, when I try the same thing in Python 3, I get no matches. Here's the code:

def isPriceBetter(CurrentPrice, title):
   r = re.compile(r'\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})')
   newPrice = r.match(title)

   if newPrice == None:
       return False
   newPrice = float(newPrice)
   if newPrice > CurrentPrice:
       return False
   return True

The newPrice variable is always None, even though the exact same string produces a match on https://regexr.com/. At first I thought the pattern string was being escaped incorrectly, which is why I put an 'r' in front of the quotes. I've also tried adding more '\' characters, but that didn't work either.

Bridgie
  • What is an example of a post title you are trying to extract from? – Chrispresso Mar 21 '20 at 19:58
  • "[RAM] G.SKILL Ripjaws V Series 16GB (2 x 8GB) DDR4 3600mhz $69.99" – Bridgie Mar 21 '20 at 19:59
  • Ah ok. Well in that case `re.match` is going from the beginning of the string. So you either need to account for that by doing `.*?\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})` or by doing `re.findall` on `\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})` – Chrispresso Mar 21 '20 at 20:03

1 Answer


re.match only tries to match at the beginning of the string. Because the string does not start with a digit, nothing matches. You can instead use something like this:

Assuming s="[RAM] G.SKILL Ripjaws V Series 16GB (2 x 8GB) DDR4 3600mhz $69.99"

In [1]: regex = re.compile(r'\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})')

In [2]: regex.findall(s)
Out[2]: ['69.99']
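
If only the first price is needed, `re.search` is another option: it scans the whole string, whereas `re.match` tries position 0 only. A minimal comparison on the same title, where the names `PRICE`, `anchored`, and `scanned` are just illustrative:

```python
import re

# Same pattern as in the question, written as a raw string.
PRICE = re.compile(r'\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})')
s = "[RAM] G.SKILL Ripjaws V Series 16GB (2 x 8GB) DDR4 3600mhz $69.99"

anchored = PRICE.match(s)   # tries position 0 only; '[' is not a digit
scanned = PRICE.search(s)   # scans forward for the first match anywhere

print(anchored)             # None
print(scanned.group())      # 69.99
```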

Alternatively, keep using match but account for whatever precedes the price, and capture the price in a named group:

In [1]: regex = re.compile(r'.*?(?P<price>\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2}))')

In [2]: match = regex.match(s)

In [3]: match
Out[3]: <re.Match object; span=(0, 65), match='[RAM] G.SKILL Ripjaws V Series 16GB (2 x 8GB) DDR>

In [4]: match.group('price')
Out[4]: '69.99'
Out[4]: '69.99'
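
Putting this together, here is a rough sketch of how the function from the question might look. Note that the original also passes the match object itself to float(), which raises a TypeError; the matched text has to be pulled out with .group() first. The names `PRICE_RE` and `is_price_better` are my own, and the sketch assumes ',' is a thousands separator and '.' the decimal point, which will not hold for titles written like "1.299,99":

```python
import re

PRICE_RE = re.compile(r'\d{1,3}(?:[.,]\d{3})*(?:[.,]\d{2})')

def is_price_better(current_price, title):
    """Return True if title contains a price no higher than current_price."""
    found = PRICE_RE.search(title)  # search, not match: the price is not at position 0
    if found is None:
        return False
    # Assumption: ',' separates thousands and '.' is the decimal point.
    new_price = float(found.group().replace(',', ''))
    return new_price <= current_price

title = "[RAM] G.SKILL Ripjaws V Series 16GB (2 x 8GB) DDR4 3600mhz $69.99"
print(is_price_better(80.0, title))  # True
print(is_price_better(50.0, title))  # False
```

If a title can contain several prices, search returns only the first one; findall would give all of them.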
Chrispresso
  • ah I see so unlike in other languages re.match() only scans from the beginning of the string. this was quite confusing. thank you for the explanation, I'll just use findall() – Bridgie Mar 21 '20 at 20:12