
I hope the question is understandable. I want to match anything that constitutes a number (int or float) in Python syntax. For instance, I want to match everything of the form (including the dot):

123
123.321
123.

My attempted solution was

"\b\d+/.?\d*\b"

...but this fails. The idea is to match any sequence that starts with one or more digits (\d+), followed by an optional dot (/.?), followed by an arbitrary number of digits (\d*), with word boundaries around it. This should match all three number forms specified above.

The word boundary is important because I do not want to match the numbers in

foo123
123foo

and want to match the numbers in

a=123.
foo_method(123., 789.1, 10)

However, the problem is that the last word boundary is recognised right before the optional dot. This prevents the regex from matching 123. and 123.321; instead it matches 123 and 321.
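
The trailing-dot behaviour can be reproduced with a short sketch. It assumes the dot is escaped as `\.` and that Python's `re` module is used; the variable names are only illustrative.

```python
import re

# Assumed form of the attempted pattern, with the dot escaped as \.
pattern = re.compile(r"\b\d+\.?\d*\b")

# A trailing dot is followed by a non-word character (or the end of the
# string), so no word boundary exists after it. The engine backtracks to
# the boundary between the last digit and the dot.
trailing_dot = pattern.findall("a=123.")
call_args = pattern.findall("foo_method(123., 789.1, 10)")

print(trailing_dot)  # ['123']
print(call_args)     # ['123', '789.1', '10']
```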

How can I do this if word boundaries are out of the question? Is it possible to make the program treat the dot as a word character?

Tomerikoo
Snusifer
  • Cannot reproduce your issue. `re.findall(r"\b\d+\.?\d*\b", "foo_method(123.321, 789.1, 10)")` gives back `['123.321', '789.1', '10']`. – orlp Oct 22 '19 at 23:38
  • And you are missing quite some things from your integer/float syntax. The following are all valid numbers: `7 2147483647 0o177 0b100110111 3 79228162514 0o377 0xdeadbeef 100_000_000_000 0b_1110_0101 3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93` See https://docs.python.org/3/reference/lexical_analysis.html#integer-literals. – orlp Oct 22 '19 at 23:45
  • Yes, `r"\b\d+\.?\d*(?!\w)"` – Wiktor Stribiżew Oct 23 '19 at 00:40

1 Answer


The float spec is a little more complicated than you've got covered there.

This matches the most common forms of Python's float syntax, though there are others as well (for example .001 or 3.14_15_93).

r"[+-]?\d+\.?\d*([eE][+-]?\d+)?"
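
As a rough check of what this pattern does and does not accept, the sketch below runs it over a few sample strings. The sample strings and the helper name `matches` are made up for illustration; `finditer` is used because the capturing group would otherwise make `findall` return only the exponent part.

```python
import re

float_pattern = re.compile(r"[+-]?\d+\.?\d*([eE][+-]?\d+)?")

def matches(text):
    """Return the full text of every match (group 0), not the exponent group."""
    return [m.group(0) for m in float_pattern.finditer(text)]

basic = matches("123 123.321 123. 1e100 3.14e-10 -7")
embedded = matches("foo123")      # no boundaries yet, so this still matches
leading_dot = matches(".001")     # the leading dot is not covered

print(basic)        # ['123', '123.321', '123.', '1e100', '3.14e-10', '-7']
print(embedded)     # ['123']
print(leading_dot)  # ['001']
```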

You can add positive lookaheads and lookbehinds to this if you are doing something relatively simple, but for anything more complex you may want to split the input on word boundaries first and then parse each piece.

This would be the version ensuring word boundaries:

r"(?<=\b)[+-]?\d+\.?\d*([eE][+-]?\d+)?(?=\b)"
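
A `\b` after a trailing dot only matches when a word character follows, so one alternative, adapted from the `(?!\w)` suggestion in the comments, is to use negative lookarounds in place of the boundaries. This is a sketch, not a full implementation of Python's numeric literal grammar; the `(?<!\w)` lookbehind and the non-capturing group are assumptions added for this example.

```python
import re

# (?<!\w): the number must not be glued to a preceding word character.
# (?!\w):  the number must not be followed by a word character.
# (?:...): non-capturing, so findall returns the whole match.
number_pattern = re.compile(
    r"(?<!\w)[+-]?\d+\.?\d*(?:[eE][+-]?\d+)?(?!\w)"
)

assignment = number_pattern.findall("a=123.")
call_args = number_pattern.findall("foo_method(123., 789.1, 10)")
identifiers = number_pattern.findall("foo123 123foo")

print(assignment)   # ['123.']
print(call_args)    # ['123.', '789.1', '10']
print(identifiers)  # []
```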
Michael H