I use regex to detect numbers from "0" to "999 999 999" inside a string in Python.

import re

test_string = "b\'[<span id=\"prl\">114 893</span>]\'"

# Raw strings keep the backslashes in \d and \s from being treated as string escapes
working_pattern = r"\d{1,3}\s\d{3}"
non_working_pattern = r"\d{1,3}(\s\d{3}){0,2}"

wk_ptrn = re.findall(working_pattern, test_string)
non_wk_ptrn = re.findall(non_working_pattern, test_string)

print(wk_ptrn)
print(non_wk_ptrn)

The results are:

print(wk_ptrn) displays: ['114 893']
print(non_wk_ptrn) displays: [' 893'] (with a space before the first digit)

The non_working_pattern is "\d{1,3}(\s\d{3}){0,2}":

\d{1,3}:

matches 1 to 3 digits ("0" to "999")

\s\d{3}:

matches one whitespace character followed by 3 digits (" 000" to " 999")

{0,2}:

is a quantifier that repeats the preceding group 0 to 2 times, so I should be able to match anything from "0" (no repetition) to "999[ 999][ 999]" (two repetitions).

I don't understand why "\d{1,3}(\s\d{3}){0,2}" doesn't work.
Can you please help me figure out the mistake?

Thank you. Regards.

sid8491
Frankie

1 Answer

You are almost there, but you should change it as follows:

pattern = r"\d{1,3}(?:\s\d{3}){0,2}"

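The ?: makes the group non-capturing, so findall returns the entire match and not just the group. With the capturing version, findall returns only the text of the group, and a repeated group keeps only its last repetition, which is why you got ' 893'. As stated in the re.findall docs: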

If one or more groups are present in the pattern, return a list of groups
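
To see the difference side by side, here is a minimal sketch that reuses the test string from the question. The variable names (capturing, non_capturing, with_group and so on) are only illustrative, and the re.finditer line is an alternative I am adding, not something the question used:

```python
import re

# Same input as in the question
test_string = "b'[<span id=\"prl\">114 893</span>]'"

capturing = r"\d{1,3}(\s\d{3}){0,2}"        # group 1 is a capturing group
non_capturing = r"\d{1,3}(?:\s\d{3}){0,2}"  # (?:...) groups without capturing

# With a capturing group, findall returns the group's text,
# and a repeated group only keeps its last repetition.
with_group = re.findall(capturing, test_string)
print(with_group)      # [' 893']

# With no capturing group, findall returns the whole match.
without_group = re.findall(non_capturing, test_string)
print(without_group)   # ['114 893']

# finditer yields match objects; group(0) is always the whole match,
# so this works even if the capturing group is left in place.
whole_matches = [m.group(0) for m in re.finditer(capturing, test_string)]
print(whole_matches)   # ['114 893']
```

Note that with the capturing version, a number with no thousands group (such as "42") would show up in the findall result as an empty string, because the group did not take part in the match.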

user2390182