
In the string

x='(var1 * 1.3e4 + abc)/log(blabla+2E3)'

I would like to substitute var1, abc, and blabla with '1', say, so that I can pass the result into ast and check whether it is a proper expression. I don't want to touch log, e, or E. Of course there are other names I may want to skip, like sin.

Currently I'm using something like

for match in re.findall(r'[a-zA-Z]+',x):
    if match.startswith('log') or match.lower()=='e': continue
    x = x.replace(match,'1')

The log can come in a few flavors, hence startswith. Obviously this won't work in every case. I would prefer to use re.sub in one go.

kabanus
  • Do yourself a favor and get a proper expression parser: https://stackoverflow.com/questions/2371436/evaluating-a-mathematical-expression-in-a-string – Pavel Dec 19 '17 at 17:54
  • @Pavel If I understand that correctly, that refers to 'numerical strings', as in, no 'abc' in the string. This is not optional, and the substitution is not just '1' in my actual use case, but real user variables. I do not want to substitute anything in until I'm sure the expression is valid and all variables are recognized from a given set. – kabanus Dec 19 '17 at 17:58

1 Answer


Code

See regex in use here

\b(?!(?:[+-]?\d*\.?\d+(?:e[+-]?\d+)?|log|sin|cos)\b)\w+\b

Usage

Create a list of exceptions (as shown below) and join it on |. Note that re.escape isn't strictly necessary for plain alphanumeric strings like these, but I've included it to show how you would build the joined pattern from a mix of literal strings and regular expressions (in case that's what you need to do).

See code in use here

import re

exceptions = [
    re.escape("log"),
    re.escape("sin"),
    re.escape("cos"),
    r"[+-]?\d*\.?\d+(?:e[+-]?\d+)?"
]

s = "(var1 * 1.3e4 + abc)/log(blabla+2E3)*1.2E+23"
r = r"\b(?!(?:" + "|".join(exceptions) + r")\b)\w+\b"

print(re.sub(r, "1", s, flags=re.I))
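
As a rough sketch of how this fits the question's goal (masking the variables and then checking the result with ast), the pattern can be wrapped in two small helpers. The names `mask_identifiers` and `is_valid_expression` are made up for illustration and are not part of the original answer. The check only confirms that the masked string parses as a Python expression, not that it evaluates.

```python
import ast
import re

# Same exception list as above; extend it with other function names as needed.
EXCEPTIONS = [
    re.escape("log"),
    re.escape("sin"),
    re.escape("cos"),
    r"[+-]?\d*\.?\d+(?:e[+-]?\d+)?",  # numbers, optionally with an exponent
]
PATTERN = re.compile(r"\b(?!(?:" + "|".join(EXCEPTIONS) + r")\b)\w+\b", re.I)


def mask_identifiers(expr, placeholder="1"):
    """Replace every word that is not an exception with `placeholder`."""
    return PATTERN.sub(placeholder, expr)


def is_valid_expression(expr):
    """Return True if the masked string parses as a single Python expression."""
    try:
        ast.parse(mask_identifiers(expr), mode="eval")
    except SyntaxError:
        return False
    return True


masked = mask_identifiers("(var1 * 1.3e4 + abc)/log(blabla+2E3)*1.2E+23")
print(masked)  # (1 * 1.3e4 + 1)/log(1+2E3)*1.2E+23
print(is_valid_expression("(var1 * 1.3e4 + abc)/log(blabla+2E3)"))  # True
print(is_valid_expression("(var1 * + abc))/log("))  # False (unbalanced parentheses)
```

Compiling the pattern once with `re.I` keeps the call sites short and makes `2E3` and `2e3` behave the same way.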

Explanation

  • \b Assert position as a word boundary
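  • (?!(?:stuff here)\b) Negative lookahead ensuring the word that follows is not one of the exceptions; the inner \b requires the exception to end at a word boundary, so a name such as logx is still matched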
    • (?:stuff here) This contains the joined list of exceptions such as log, sin, cos, or numbers ([+-]?\d*\.?\d+(?:e[+-]?\d+)?), etc.
  • \w+ Match one or more word characters
  • \b Assert position as a word boundary
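
Since the pattern only ever matches identifier-like words, the replacement passed to re.sub can also be a function rather than a fixed string. The sketch below assumes the real use case mentioned in the question's comments (substituting user variables and detecting unrecognized ones). The helper `substitute` and the `values` dictionary are hypothetical names chosen for this example.

```python
import re

EXCEPTIONS = ["log", "sin", "cos", r"[+-]?\d*\.?\d+(?:e[+-]?\d+)?"]
PATTERN = re.compile(r"\b(?!(?:" + "|".join(EXCEPTIONS) + r")\b)\w+\b", re.I)


def substitute(expr, variables):
    """Return (new_expr, unknown_names); `variables` maps names to values."""
    unknown = []

    def lookup(match):
        name = match.group(0)
        if name in variables:
            return repr(variables[name])
        unknown.append(name)
        return name  # leave unrecognized names untouched

    return PATTERN.sub(lookup, expr), unknown


values = {"var1": 2, "abc": 0.5}  # hypothetical user variables
new_expr, unknown = substitute("(var1 * 1.3e4 + abc)/log(blabla+2E3)", values)
print(new_expr)  # (2 * 1.3e4 + 0.5)/log(blabla+2E3)
print(unknown)   # ['blabla']
```

Collecting the unknown names instead of raising lets the caller report all unrecognized variables at once. The dictionary lookup is case-sensitive even though the pattern itself is compiled with `re.I`.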
ctwheels
  • Very impressive, and thoroughly explained. Easy to scale. Note though `e` may be a capital, and maybe followed by a sign. Better use `e[^\w]` in the exception list. Also, no need for a `+` there. – kabanus Dec 19 '17 at 18:09
  • @kabanus are `log` and `sin` case insensitive as well? Basically, is everything case insensitive? – ctwheels Dec 19 '17 at 18:11
  • Might as well make them if you're at it. That would be most flexible - probably just a `re.IGNORECASE`. – kabanus Dec 19 '17 at 18:12
  • I already said I think the answer is very good, and you made it better. The only thing I have to add is that the exception `r"\d+(?:e\d+)?"` will fail on `1.2E+23`, which is valid. You can just add a `[+-]?` before the `\d`, and remove the unneeded `+` at the end (one is enough for the check). – kabanus Dec 19 '17 at 18:25
  • @kabanus please see new update. I changed the number portion of the regex. – ctwheels Dec 19 '17 at 18:31
  • Are you sure `\de[+-]?\d` isn't enough to verify a valid exponential? You don't really care about catching **all** of it, just skip the `e`. – kabanus Dec 19 '17 at 18:32
  • @kabanus *technically* yes, but not if you want to ensure the full number format is captured as a whole (i.e. `1.2E+23` instead of `1` and `2E+23`). – ctwheels Dec 19 '17 at 18:34
  • @kabanus actually your regex doesn't capture numbers without `e` in them. I would suggest using the new regex I posted above as it works with decimal places and with `e` – ctwheels Dec 19 '17 at 18:40