Regex to match scientific notation

Question

I'm trying to match numbers in scientific notation (regex from here):

import re

scinot = re.compile(r'[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')
re.findall(scinot, 'x = 1e4')   # ['1e4']
re.findall(scinot, 'x = c1e4')  # ['1e4']

I'd like it to match x = 1e4 but not x = c1e4. What should I change?

Update: The answer here has the same problem: it incorrectly matches 'x = c1e4'.

@Blender trying to match numbers in scientific notation, but not match variable names containing that pattern. — Bogdan Vasilescu, Jan 16 '17 at 13:55
Possible duplicate of [Parsing scientific notation sensibly?](http://stackoverflow.com/questions/638565/parsing-scientific-notation-sensibly) — hek2mgl, Jan 16 '17 at 15:53
@hek2mgl Thanks for the unnecessary downvote. That's the post I started from, linked where I say "regex from here". — Bogdan Vasilescu, Jan 16 '17 at 16:43

score 5 · Toto · Accepted Answer · 2017-01-16T18:21:38.257

Add an end-of-string anchor to the regex, and require whitespace or an equals sign before the number. The number itself is captured in group 1:

[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$
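A minimal sketch of how this pattern might be used from Python. The helper name `extract` is hypothetical, and `re.search` is used instead of `re.findall` so that group 1 can be read directly. Because of the trailing `$`, this only finds a number that sits at the very end of the string.

```python
import re

# Pattern from the answer above: one or more whitespace or '=' characters,
# then the number in capture group 1, anchored at the end of the string.
scinot = re.compile(r'[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$')

def extract(text):
    """Return the captured number (group 1), or None when nothing matches."""
    m = scinot.search(text)
    return m.group(1) if m else None

print(extract('x = 1e4'))    # 1e4
print(extract('x=1e4'))      # 1e4
print(extract('x = c1e4'))   # None
```

The `=` and the surrounding spaces are consumed by the match but stay outside group 1, which is why the result no longer contains them.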

answered Jan 16 '17 at 10:23 by Toto · edited Jan 16 '17 at 18:21

Thanks @Toto. That doesn't match `x = 1e6`. – Bogdan Vasilescu Jan 16 '17 at 13:56
@BogdanVasilescu: Do you mean it doesn't match the whole string `x = 1e4` or `1e4` alone? – Toto Jan 16 '17 at 14:49
I meant it doesn't match `1e6` from `x = 1e6`. Sorry about the confusion. `re.findall('^[+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)$', 'x = 1e6')` returns `[]`. – Bogdan Vasilescu Jan 16 '17 at 15:29
@BogdanVasilescu: Remove the caret (start of string) and add `[\s=]`, see my edit. – Toto Jan 16 '17 at 15:49
I'd prefer if `=` is not part of the match; right now I get `['=1e6']` for `x=1e6`. – Bogdan Vasilescu Jan 16 '17 at 16:46
@BogdanVasilescu: Add `+` after character class `[\s=]+` and use a capture group, see my edit, the result will be in group 1. – Toto Jan 16 '17 at 18:21
What should I change so that it also matches ordinary numbers like '-200.200'? – Eular Jul 23 '19 at 07:21
@Eular: Just make the last part `(?:[eE][+\-]?\d+)` optional: `(?:[eE][+\-]?\d+)?` (or remove it) – Toto Jul 23 '19 at 07:41
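As a rough illustration of that last suggestion, the sketch below makes the exponent group optional. The names `number` and `extract_number` are hypothetical. The leading `[\s=]+` is kept from the answer, so a bare string such as '-200.200' with nothing in front of it would still not match.

```python
import re

# Same pattern as in the answer, with the exponent part made optional
# by the trailing '?' on the last group.
number = re.compile(r'[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)?)$')

def extract_number(text):
    """Return the captured number (group 1), or None when nothing matches."""
    m = number.search(text)
    return m.group(1) if m else None

print(extract_number('v = -200.200'))  # -200.200
print(extract_number('v = 1e4'))       # 1e4
print(extract_number('v = c200'))      # None
```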

score 2 · Answer 2 · answered Jan 16 '17 at 02:31

Simply add [^\w]? to exclude all alphanumeric characters that precede your first digit:

 [+\-]?[^\w]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)

Technically, the \w will also exclude numeric characters, but that's fine because the rest of your regex will catch it.

If you want to be truly rigorous, you can replace \w with A-Za-z:

 [+\-]?[^A-Za-z]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)

Another sneaky way is to simply add a space at the beginning of your regex - that forces every match to begin with a space.

Thanks @Akshat. Your suggestion still matches `c1e4` unfortunately. Also, if I add a space (nice idea!), I won't be able to match `x=1e4` anymore. — Bogdan Vasilescu, Jan 16 '17 at 02:38
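The behaviour reported in the comment above can be reproduced. Because `[^\w]?` is optional, the engine is free to match zero characters there and start the match at the digit. One alternative, which is not proposed in this thread and is offered only as a sketch, is a negative lookbehind that rejects a match when the preceding character is a word character or a dot. The names `core`, `optional_guard`, and `lookbehind` are illustrative.

```python
import re

# Number part shared by both patterns (taken from the question).
core = r'(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)'

# Pattern from the answer: the guard is optional, so it can be skipped.
optional_guard = re.compile(r'[+\-]?[^\w]?' + core)

# Assumed alternative: a zero-width check that the previous character
# is neither a word character nor a dot.
lookbehind = re.compile(r'(?<![\w.])[+\-]?' + core)

print(optional_guard.findall('x = c1e4'))  # ['1e4']
print(lookbehind.findall('x = c1e4'))      # []
print(lookbehind.findall('x = 1e4'))       # ['1e4']
print(lookbehind.findall('x=1e4'))         # ['1e4']
```

Since a lookbehind consumes no characters, the `=` or space before the number never ends up in the match, and the pattern is not tied to the end of the string.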

score 0 · Answer 3 · edited May 23 '17 at 12:02

scinot = re.compile('[-+]?[\d]+\.?[\d]*[Ee](?:[-+]?[\d]+)?')

This regex should find all the numbers in scientific notation in the text.

By the way, here is the link to the similar question: Extract scientific number from string

answered Jan 16 '17 at 02:56 by James · edited May 23 '17 at 12:02 by Community

Not really: `re.findall('([-+])?(\d+)(\.\d*)?[eE]([-+]?\d+)*', 'x = c1e6')` returns `[('', '1', '', '6')]` – Bogdan Vasilescu Jan 16 '17 at 04:01
Try now @BogdanVasilescu – James Jan 16 '17 at 04:21
still not what I need. `re.findall('[-+]?[0-9]+\.?[0-9]*[Ee](?:\ *[-+]?\ *[0-9]+)?', 'x = c1e6')` incorrectly matches `['1e6']` – Bogdan Vasilescu Jan 16 '17 at 04:32