
I have a lot of simple strings like this one: "amount: 134.707625 delay: 180", and I want to extract the two numbers using the regular expression '\d+(\.\d+)?'. It matches both numbers, but extracting them with findall gives ['.707625', ''], while the regexp '\d+\.\d+|\d+', which should be semantically identical, gives the desired output ['134.707625', '180'].

Why do these two regexps behave differently? Here is my test code:

import re

# Pattern 1: optional fractional part in a capturing group
pattern = re.compile(r'\d+(\.\d+)?')
print(pattern.findall("amount: 134.707625 delay: 180"))
print(pattern.match('134.707625'))
print(pattern.match('180'))

# Pattern 2: alternation, no groups
pattern2 = re.compile(r"\d+\.\d+|\d+")
print(pattern2.findall("amount: 134.707625 delay: 180"))
print(pattern2.match('134.707625'))
print(pattern2.match('180'))

and here's the corresponding output:

> python temp.py
['.707625', '']
<_sre.SRE_Match object; span=(0, 10), match='134.707625'>
<_sre.SRE_Match object; span=(0, 3), match='180'>
['134.707625', '180']
<_sre.SRE_Match object; span=(0, 10), match='134.707625'>
<_sre.SRE_Match object; span=(0, 3), match='180'>

I'm using Python 3.5.2 from the Anaconda distribution on Windows 10.

Uzaku
  • The first one contains a *capturing group*, and when a pattern defines groups, `re.findall` returns the captured values instead of the whole match. See [my answer](http://stackoverflow.com/a/31915134/3832970). Use a non-capturing group in these cases rather than alternation: `r'\d+(?:\.\d+)?'` – Wiktor Stribiżew Apr 12 '17 at 13:32
  • Alright, thank you – Uzaku Apr 12 '17 at 13:44
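
For anyone who lands here later, the difference described in the comment above can be reproduced with a small sketch like the one below. The variable names are only illustrative, and the `finditer` variant is one possible alternative, not something suggested in the thread.

```python
import re

text = "amount: 134.707625 delay: 180"

# One capturing group: findall returns what the group captured,
# and an empty string where the optional group did not take part.
with_group = re.findall(r'\d+(\.\d+)?', text)
print(with_group)       # ['.707625', '']

# Non-capturing group (?:...): no groups, so findall returns whole matches.
without_group = re.findall(r'\d+(?:\.\d+)?', text)
print(without_group)    # ['134.707625', '180']

# Alternative sketch: keep the capturing group, but read the whole
# match (group 0) from each match object produced by finditer.
whole_matches = [m.group(0) for m in re.finditer(r'\d+(\.\d+)?', text)]
print(whole_matches)    # ['134.707625', '180']
```

This also explains why `pattern.match(...)` in the question looked fine: the match object's repr shows the whole match, whereas `findall` reports only the group.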

0 Answers