I have a weird case where this simple code is not functioning as expected:

import re

text = 'This Level: 75.3'
matches = re.search(r'(?:(?:\d{1,3},)(?:\d{3},)*(?:\d{3})|\d*)(?:\.\d+)?', text)

print(matches.group(0))

I keep getting a blank string returned... however, I would expect this to be 75.3. This works for other use cases, such as:

assert util.strip_str_to_float('7') == 7.0
assert util.strip_str_to_float('75') == 75.0
assert util.strip_str_to_float('75.5') == 75.5
assert util.strip_str_to_float('7.7.9') == 7.7
assert util.strip_str_to_float('1,298.3 Gold') == 1298.3

Ultimately, I'm trying to pull out and convert the first float from a given string... I wasn't expecting this test case to be a failure. It seems to be failing specifically when the matching does not start at the beginning of the string. The search seems to work fine if I remove the non-capturing groups, for example, this works:

matches = re.search(r'\d*\.\d+', text)

But this does not:

matches = re.search(r'\d*(?:\.\d+)?', text)

Any ideas...?

  • https://stackoverflow.com/questions/4703390/how-to-extract-a-floating-number-from-a-string has a solution to pretty much the exact same issue – Bob th Jul 11 '22 at 19:00
  • @Bobth it doesn't seem to - those patterns also don't work (and don't exactly do what mine does, enforcing 3-digits per group). It still doesn't explain why adding non-capturing groups breaks things here... – Andrew Vaughan Jul 11 '22 at 19:58

1 Answer


It looks like you're allowing plain integers with no decimal part as well as decimals like ".5" with no whole-number part. That's fine, but since both parts are optional, the pattern also matches when neither part is present, so you get a lot of empty, zero-length matches. re.search returns the leftmost match, and for 'This Level: 75.3' that is the empty match at index 0, found before the scan ever reaches 75.3.

This is also why your pattern r'\d*\.\d+' worked: the decimal part is required there, so it cannot match an empty string.
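To make the empty match visible, here is a small sketch that runs both of the shorter patterns from the question against the same text and prints what was matched and where. The variable names `optional` and `required` are only for illustration.

```python
import re

text = 'This Level: 75.3'

# Both the digits and the decimal part are optional, so the pattern
# succeeds immediately with an empty match at index 0.
optional = re.search(r'\d*(?:\.\d+)?', text)
print(repr(optional.group(0)), optional.span())  # '' (0, 0)

# The decimal part is required here, so nothing can match until "75.3".
required = re.search(r'\d*\.\d+', text)
print(repr(required.group(0)), required.span())  # '75.3' (12, 16)
```

The non-capturing group itself is not the problem; the trailing `?` is what lets the whole pattern match nothing.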

pattern = r'\d{1,3}(?:,\d{3})*(?:\.\d+)?|\.\d+'

If I'm understanding the question right, this pattern should work. It's divided into two parts, so it looks for either:

  • a whole number with a decimal part optional, or
  • a required decimal part, with no whole number before it
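As a rough sketch of how that pattern could sit behind the `strip_str_to_float` helper from the question: the real implementation isn't shown, so the body below is an assumption, including the comma stripping and returning `None` when nothing matches. The name `FLOAT_PATTERN` is also made up for the example.

```python
import re
from typing import Optional

# Pattern from the answer: a comma-grouped whole number with an optional
# decimal part, or a bare decimal part such as ".5".
FLOAT_PATTERN = re.compile(r'\d{1,3}(?:,\d{3})*(?:\.\d+)?|\.\d+')


def strip_str_to_float(text: str) -> Optional[float]:
    """Return the first number in text as a float, or None if there is none.

    Hypothetical stand-in for the helper named in the question.
    """
    match = FLOAT_PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(0).replace(',', ''))


# The cases from the question, plus the one that was failing.
assert strip_str_to_float('7') == 7.0
assert strip_str_to_float('75') == 75.0
assert strip_str_to_float('75.5') == 75.5
assert strip_str_to_float('7.7.9') == 7.7
assert strip_str_to_float('1,298.3 Gold') == 1298.3
assert strip_str_to_float('This Level: 75.3') == 75.3
```

One thing to watch for: because the whole-number part starts with \d{1,3}, an ungrouped number such as '1298.3' matches only '129'. If input without thousands separators is possible, the pattern would need another alternative to cover it.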
semmra7
  • That did the trick, thanks, and it's much cleaner. I got too in the weeds iterating on it and ended up in group spaghetti. – Andrew Vaughan Jul 12 '22 at 20:28