
Running this in Python 3.7.3:

>>> from re import findall
>>> s = '7.95 + 10 pieces'
>>> findall(r'(\d*\.)?\d+', s)
['7.', '']   # Expected return: ['7.95', '10']

I'm not sure why it doesn't find all the floats in the string. Is this some Python quirk involving capturing groups?

My logic behind the regex: (\d*\.)? optionally matches any number of digits followed by a period. \d+ then matches one or more digits, so this regex should match any of 11.11, 11, .11 and so on. What's wrong here?

sshashank124
HHC

1 Answer


As you guessed correctly, this has to do with capturing groups. According to the documentation for re.findall:

If one or more groups are present in the pattern, return a list of groups

Therefore, you need to make every group in the pattern non-capturing by using the (?:...) syntax. If the pattern contains no capturing groups, findall returns the entire match instead:

>>> pattern = r'(?:\d*\.)?\d+'

>>> findall(pattern, s)
['7.95', '10']
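
To make the difference visible, here is a minimal sketch that runs both patterns on the string from the question and also uses re.finditer to show the whole match next to the group for each hit. The variable names are only illustrative. Note that for '10' the optional group does not take part in the match at all; findall reports that as an empty string, while the match object reports None.

```python
import re

s = '7.95 + 10 pieces'

# With a capturing group, findall returns only what the group captured.
capturing = re.findall(r'(\d*\.)?\d+', s)

# With a non-capturing group, findall returns each whole match.
non_capturing = re.findall(r'(?:\d*\.)?\d+', s)

# finditer gives match objects: group(0) is the whole match,
# group(1) is the first capturing group (None if it did not participate).
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r'(\d*\.)?\d+', s)]

print(capturing)      # ['7.', '']
print(non_capturing)  # ['7.95', '10']
print(pairs)          # [('7.95', '7.'), ('10', None)]
```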
sshashank124
  • What does it mean to make a group non-capturing? – HHC Jan 15 '20 at 04:22
  • @HHC this is explained very well here: https://stackoverflow.com/questions/6418985/capturing-group-in-regex – sshashank124 Jan 15 '20 at 04:23
  • Hmm, so what does it mean when it returns `['7.', '']`? How is `'7.'` a group? – HHC Jan 15 '20 at 04:28
  • In your original regex, the group was `(\d*\.)`, which matches `7.` in `7.95` and matches nothing in `10` (`findall` reports that as an empty string). A group is specified by a pair of parentheses `()` – sshashank124 Jan 15 '20 at 04:29
  • Also, if you're still there, do you think this is a good regex for retrieving simple floats from strings? Or could it be simpler? – HHC Jan 15 '20 at 04:39
  • Well, for one, it doesn't handle negative numbers. But as long as it meets your needs, yes, it seems reasonable – sshashank124 Jan 15 '20 at 04:41
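
Following up on the negative-number remark in the comments, one possible extension is to allow an optional leading minus sign. This is only a sketch: the helper name extract_numbers is hypothetical, and treating any '-' directly in front of a number as a sign is an assumption (in a string like '7-10' it would yield '-10', which may or may not be what you want).

```python
import re

# Optional minus sign, optional "digits and a dot", then one or more digits.
# The group is non-capturing so findall returns whole matches.
NUMBER = re.compile(r'-?(?:\d*\.)?\d+')

def extract_numbers(text):
    """Return every simple decimal number in text as a float."""
    return [float(tok) for tok in NUMBER.findall(text)]

tokens = NUMBER.findall('-3.5 and .25 or 10')
values = extract_numbers('-3.5 and .25 or 10')

print(tokens)  # ['-3.5', '.25', '10']
print(values)  # [-3.5, 0.25, 10.0]
```

This still ignores exponents such as 1e5 and trailing-dot forms such as 7., so it only covers the simple cases discussed above.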