I'm trying to match numbers in scientific notation (regex from here):

import re

scinot = re.compile(r'[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')
re.findall(scinot, 'x = 1e4')
# ['1e4']
re.findall(scinot, 'x = c1e4')
# ['1e4']

I'd like it to match x = 1e4 but not x = c1e4. What should I change?

Update: The answer here has the same problem: it incorrectly matches 'x = c1e4'.

Bogdan Vasilescu

3 Answers

Add an anchor at the end of the regex and require whitespace or an equals sign before the number:

[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$
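
As a quick check of the pattern above, here is a minimal sketch. The names `anchored` and `find_number` and the sample strings are illustrative assumptions, not part of the original answer. The number lands in capture group 1, and the `$` anchor means it has to sit at the end of the string.

```python
import re

# The pattern from the answer above; the number itself is capture group 1.
anchored = re.compile(r'[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$')

def find_number(text):
    """Return the captured number, or None when the pattern does not match."""
    m = anchored.search(text)
    return m.group(1) if m else None

print(find_number('x = 1e4'))    # 1e4
print(find_number('x=1e4'))      # 1e4
print(find_number('x = c1e4'))   # None
print(find_number('x = 1e4 m'))  # None, because of the trailing $ anchor
```

The last call shows the trade-off: anything after the number, such as a unit, prevents a match.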
Toto

Simply add [^\w]? to exclude all alphanumeric characters that precede your first digit:

 [+\-]?[^\w]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)

Technically, \w will also exclude numeric characters, but that's fine because the rest of your regex will catch them.

If you want to be truly rigorous, you can replace \w with A-Za-z:

 [+\-]?[^A-Za-z]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)

Another sneaky way is to add a space at the beginning of your regex, which forces every match to begin with a space.

Akshat Mahajan
  • Thanks @Akshat. Your suggestion still matches `c1e4` unfortunately. Also, if I add a space (nice idea!), I won't be able to match `x=1e4` anymore. – Bogdan Vasilescu Jan 16 '17 at 02:38
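
The comment's observation can be reproduced with a short sketch. Because `[^\w]?` is optional, the engine is free to skip it and start the match at the digit. One possible alternative, offered here as an assumption rather than as anything proposed in the thread, is a negative lookbehind `(?<!\w)`, which refuses to start a match directly after a word character without consuming any text. The names `optional_guard` and `lookbehind` are hypothetical.

```python
import re

# The `[^\w]?` pattern from the answer above: the guard is optional, so the
# engine can skip it and still start a match at the digit after "c".
optional_guard = re.compile(r'[+\-]?[^\w]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')

# Assumed alternative (not from the thread): a zero-width negative lookbehind
# that rejects any match starting right after a letter, digit, or underscore.
lookbehind = re.compile(r'(?<!\w)[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')

for text in ('x = 1e4', 'x=1e4', 'x = c1e4'):
    print(repr(text), optional_guard.findall(text), lookbehind.findall(text))
# 'x = 1e4' [' 1e4'] ['1e4']
# 'x=1e4' ['=1e4'] ['1e4']
# 'x = c1e4' ['1e4'] []
```

Note that the optional guard also pulls the preceding space or `=` into the match, whereas the lookbehind leaves the matched text unchanged.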

scinot = re.compile(r'[-+]?[\d]+\.?[\d]*[Ee](?:[-+]?[\d]+)?')

This regex finds every number written in scientific notation in the text.

By the way, here is a link to a similar question: Extract scientific number from string
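
To see what this pattern actually picks up, here is a small sketch; the name `loose` and the sample strings are assumptions made for illustration. It suggests two caveats: the pattern still matches inside `c1e4`, and since the exponent digits are optional it also accepts a bare `2e`.

```python
import re

# The pattern from this answer, written as a raw string.
loose = re.compile(r'[-+]?[\d]+\.?[\d]*[Ee](?:[-+]?[\d]+)?')

print(loose.findall('a = 1.5e-3, b = 2E10'))  # ['1.5e-3', '2E10']
print(loose.findall('x = c1e4'))              # ['1e4']
print(loose.findall('size 2em'))              # ['2e']
```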

James