Hi, I would like to extract just the floats from a string:

str = "Test string 1.234 0.155.1 5.67799350,-2.654657"

The outcome should be:

[1.234, 5.67799350, -2.654657]

I was using [-+]?\d*\.\d+|\d+ but it also picks up pieces of 0.155.1, which I don't want.

import re
floats = re.findall(r"[-+]?\d*\.\d+|\d+", str)
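
For illustration, here is a minimal sketch of what that pattern actually returns on the sample string (the names `text`, `pattern` and `matches` are only for this example). Because `\d*` may match zero digits, a match can start at a bare dot, so 0.155.1 is not rejected but split into two float-looking pieces:

```python
import re

text = "Test string 1.234 0.155.1 5.67799350,-2.654657"
pattern = r"[-+]?\d*\.\d+|\d+"

# findall returns every non-overlapping match, scanning left to right
matches = re.findall(pattern, text)
print(matches)
# ['1.234', '0.155', '.1', '5.67799350', '-2.654657']
```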

Thanks for reading.

user2643679

3 Answers

I believe you found your code here? Either way, a negative lookbehind and a negative lookahead may work for you and give a more robust pattern:

(?<!\.)[-+]?\b\d+\.\d+(?!\.)\b

See the Online Demo


Pattern breakdown:

  • (?<!\.) - Negative lookbehind for a literal dot.
  • [-+]? - Optional plus or minus sign.
  • \b - Word-boundary.
  • \d+\.\d+ - One or more digits, a literal dot and again one or more digits.
  • (?!\.) - Negative lookahead for a literal dot.
  • \b - Word-boundary.
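
The breakdown above can also be written into the pattern itself. This is only a sketch, assuming Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments; the name `FLOAT_RE` is made up for the example:

```python
import re

# Same pattern as above, spelled out so each piece carries its own comment
FLOAT_RE = re.compile(r"""
    (?<!\.)      # not preceded by a literal dot
    [-+]?        # optional plus or minus sign
    \b           # word boundary before the first digit
    \d+\.\d+     # one or more digits, a literal dot, one or more digits
    (?!\.)       # not followed by a literal dot
    \b           # word boundary after the last digit
""", re.VERBOSE)

text = "Test string 1.234 0.155.1 5.67799350,-2.654657"
found = FLOAT_RE.findall(text)
print(found)  # ['1.234', '5.67799350', '-2.654657']
```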

Python sample code:

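import re
# Sample input from the question (named text to avoid shadowing the built-in str)
text = 'Test string 1.234 0.155.1 5.67799350,-2.654657'
# Find every match and convert the matched strings to floats
lst = [float(i) for i in re.findall(r'(?<!\.)[-+]?\b\d+\.\d+(?!\.)\b', text)]
print(lst)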

Result >>

[1.234, 5.6779935, -2.654657]
JvdV

Use

[-+]?\b(?<!\d\.)\d+\.\d+\b(?!\.\d)

See proof

Alternative to match floats without integer part (.59) and when glued to word characters (_4.567):

[-+]?(?<!\d\.)(?<!\d)\d*\.\d+(?!\.?\d)

See another proof

The first pattern matches an optional plus or minus sign, then one or more digits, a dot, and one or more digits, wrapped in word boundaries, and only when the match is not preceded by digit-dot or followed by dot-digit.
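
To make the difference between the two patterns concrete, here is a small comparison sketch. The input string `sample` and the names `STRICT` and `LENIENT` are made up for illustration; only the two patterns come from this answer:

```python
import re

STRICT = r"[-+]?\b(?<!\d\.)\d+\.\d+\b(?!\.\d)"
LENIENT = r"[-+]?(?<!\d\.)(?<!\d)\d*\.\d+(?!\.?\d)"

# Hypothetical input: a float without an integer part, one glued to "_",
# the malformed 0.155.1, and an ordinary signed float
sample = "Test .59 _4.567 0.155.1 -2.654657"

strict_hits = re.findall(STRICT, sample)
lenient_hits = re.findall(LENIENT, sample)
print(strict_hits)   # ['-2.654657']
print(lenient_hits)  # ['.59', '4.567', '-2.654657']
```

Both variants skip 0.155.1. Only the second one picks up .59 and the 4.567 in _4.567, because it does not rely on word boundaries.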

Python:

import re 
text = 'Test string 1.234 0.155.1 5.67799350,-2.654657'
print([float(i) for i in re.findall(r"[-+]?\b(?<!\d\.)\d+\.\d+\b(?!\.\d)", text)])

Result:

[1.234, 5.6779935, -2.654657]
Ryszard Czech

Try:

(?<![.\d])[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?![.\d])

demo
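
A quick sketch of how this pattern behaves, in case it helps. The second input string and all variable names are made up for the example. Since the `\d+(?:\.\d*)?` branch makes the fractional part optional, plain integers match too, so an extra filter may be needed if only values with a dot are wanted:

```python
import re

PATTERN = r"(?<![.\d])[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?![.\d])"

original = "Test string 1.234 0.155.1 5.67799350,-2.654657"
hits = re.findall(PATTERN, original)
print(hits)  # ['1.234', '5.67799350', '-2.654657']

# Hypothetical input containing an integer
with_int = "1.234 42 0.155.1 -2.5"
hits_int = re.findall(PATTERN, with_int)
print(hits_int)  # ['1.234', '42', '-2.5']

# One possible way to keep only the matches that contain a dot
floats_only = [float(m) for m in hits_int if "." in m]
print(floats_only)  # [1.234, -2.5]
```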