
I need a robust regex that will match all characters until a float.

I have a dict of strings that look like the following mock example:

    'some string 1 some more 2.1 even more 9.2 caracala,domitian2.3'
...

I need a robust regex to split each string only at the floats, so that the end result looks like this:

{
  'some string 1 some more': '2.1',
  'even more': '9.2',
  'caracala,domitian': '2.3'
}

I'll use a for loop with Python's re module to build the end result, but I need a robust regex that matches all characters up to a float.

I have tried [-+]?\d*\.\d+|\d+ but it matches integers as well as floats.
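To make the problem concrete, here is a minimal sketch of what that pattern returns on the mock string (the variable names are only for illustration). Because `\d+` is a second alternative in the pattern, the lone integer 1 is matched alongside the floats:

```python
import re

# Mock string from the example above.
inp = 'some string 1 some more 2.1 even more 9.2 caracala,domitian2.3'

# The attempted pattern: a float, OR (second alternative) a plain integer.
attempt = r'[-+]?\d*\.\d+|\d+'

found = re.findall(attempt, inp)
print(found)  # ['1', '2.1', '9.2', '2.3']
```

The integer `1` should stay part of the text before `2.1` rather than being picked out as a number of its own.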

S. Schenk

1 Answer


Using re.findall might get you the result you want:

import re

inp = "some string 1 some more 2.1 even more 9.2 caracala,domitian2.3"
matches = re.findall(r'(.*?)\s*(\d+\.\d+)\s*', inp)
print(matches)

[('some string 1 some more', '2.1'), ('even more', '9.2'), ('caracala,domitian', '2.3')]

Explanation of regex:

(.*?)       lazily match all content up to the first
\s*         optional whitespace, which is followed by
(\d+\.\d+)  a floating point number
\s*         optional trailing whitespace, consumed so the next match starts at the text

Note that we capture the leading content and float in separate capture groups, which then appear separately in the resulting list.
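Since the question asks for a dict rather than a list of tuples, the matches can be folded into one with a loop. The following is a minimal sketch, not a definitive implementation: the helper name `split_on_floats` and the constant `FLOAT_SPLIT` are hypothetical, and it assumes every text segment is followed by an unsigned float of the form digits.digits.

```python
import re

# Same pattern as in the answer above; the names below are hypothetical.
FLOAT_SPLIT = re.compile(r'(.*?)\s*(\d+\.\d+)\s*')

def split_on_floats(text):
    """Map each run of text to the float (kept as a string) that follows it."""
    result = {}
    for label, number in FLOAT_SPLIT.findall(text):
        # Assumption: labels are unique; a repeated label overwrites the earlier one.
        result[label] = number
    return result

inp = "some string 1 some more 2.1 even more 9.2 caracala,domitian2.3"
pairs = split_on_floats(inp)
print(pairs)
# {'some string 1 some more': '2.1', 'even more': '9.2', 'caracala,domitian': '2.3'}
```

Two caveats with this sketch: any text after the last float is dropped, because the pattern only matches when a float follows, and signed floats such as -2.1 would leave the sign in the preceding text unless the float group is changed to something like `[-+]?\d+\.\d+`.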

Tim Biegeleisen