
I'm trying to find numbers in a string.

import re

text = "42 ttt 1,234 uuu 6,789,001"

finder = re.compile(r'\d{1,3}(,\d{3})*')
print(re.findall(finder, text))

It returns this: ['', ',234', ',001']

What's wrong with my regex?

How can I get ['42', '1,234', '6,789,001']?

Note: I'm getting the correct result at https://regexr.com.

Kolom

2 Answers


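Parentheses (...) define capturing groups, and when a pattern contains groups, re.findall returns what those groups captured rather than the whole match.

In your case, the only group is (,\d{3}). Because it is repeated with *, it keeps only its last repetition (',234' for 1,234 and ',001' for 6,789,001), and for 42 it never matches at all, so you get an empty string. Instead, you can capture the whole number by putting a group around everything, and make the parentheses you need for * non-capturing by starting them with ?:, like so: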

finder = re.compile(r'(\d{1,3}(?:,\d{3})*)')

This gives the correct result:

>>> print(re.findall(finder, text))
['42', '1,234', '6,789,001']
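Since the outer group wraps the entire pattern, it should also work to drop capturing groups altogether: with zero groups, re.findall returns the full text of each match. A minimal sketch comparing both forms on the question's text (the variable names are just for illustration):

```python
import re

text = "42 ttt 1,234 uuu 6,789,001"

# Outer capturing group around the whole number; inner group is non-capturing.
with_outer_group = re.compile(r'(\d{1,3}(?:,\d{3})*)')

# No capturing groups at all: findall then returns each whole match.
no_groups = re.compile(r'\d{1,3}(?:,\d{3})*')

print(with_outer_group.findall(text))  # ['42', '1,234', '6,789,001']
print(no_groups.findall(text))         # ['42', '1,234', '6,789,001']
```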
Dion
  • Yeah, it seems to work, but do you have any idea why my original regex works at https://regexr.com but not in Python? – Kolom Sep 02 '20 at 07:10
  • Yes, I realised that `,,`; yours is correct. – terry Sep 02 '20 at 07:12
  • @Kolom [This is `re.findall` specific behavior](https://docs.python.org/3/library/re.html#re.findall): "If one or more groups are present in the pattern, return a list of groups". regexr.com highlights the whole match instead. – Dion Sep 02 '20 at 07:13
  • @Dion Yeah, but why is it not matching the first number, 42? – Kolom Sep 02 '20 at 07:14
  • It is matching the first number; it just returns the empty group `(,\d{3})*`, which is why you get an empty string as the first result. – Dion Sep 02 '20 at 07:15
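To see the difference the comments describe, re.finditer exposes each Match object, so the whole match (what regexr.com highlights) and the group that findall reports can be printed side by side. A small sketch using the question's original pattern:

```python
import re

text = "42 ttt 1,234 uuu 6,789,001"
original = re.compile(r'\d{1,3}(,\d{3})*')  # the question's pattern

for m in original.finditer(text):
    # group(0) is the whole match; group(1) is the last repetition of
    # (,\d{3}), or None when the group matched zero times.
    print(repr(m.group(0)), repr(m.group(1)))

# findall returns only group 1, substituting '' for a group that did not participate.
print(original.findall(text))  # ['', ',234', ',001']
```

The loop should print '42' None, then '1,234' ',234', then '6,789,001' ',001', showing that all three numbers were matched in full.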

You just need to change your finder like this:

finder = re.compile(r'\d+\,?\d+,?\d*')
Terry Sun
  • This matches '123,12,1', though :) – Dion Sep 02 '20 at 07:14
  • Yes, your answer is the best solution. `(?:,\d{3})*` is a non-capturing group; the `*` quantifier matches it between zero and unlimited times, as many times as possible, giving back as needed (greedy); and `,` matches the comma literally. – Terry Sun Sep 02 '20 at 07:23
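A quick sketch of the problem raised in the comment above: the second answer's pattern accepts a malformed number like 123,12,1 as one match, and because it needs at least two digits it skips a single-digit number like 7. The sample string is made up for illustration. The stricter pattern never accepts a comma group that isn't exactly three digits, though it still picks up the pieces of the malformed number separately; rejecting such input entirely would need extra anchoring (for example lookarounds), which this sketch does not attempt.

```python
import re

loose = re.compile(r'\d+\,?\d+,?\d*')       # the second answer's pattern
strict = re.compile(r'\d{1,3}(?:,\d{3})*')  # exactly three digits after each comma

sample = "7 and 123,12,1"  # hypothetical input

print(loose.findall(sample))   # ['123,12,1']  (accepts the malformed number, misses 7)
print(strict.findall(sample))  # ['7', '123', '12', '1']
```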