
I am trying to find arithmetic expressions in text strings.

Possible arithmetic expressions:

1/3 + 1/4
cos(30) + 25*3,75
sqrt(5) + sin(45)
5 != 6
2**4 + 100.000 =
(2^3)^4
sqrt((0,25*8)/2)
3e4 - 500

I created a regex with one block per kind of token and put the | symbol between the blocks.

pattern = '((\s*(sqrt|a?sin|a?cos|a?tan|abs|log|log10|exp)?\s*)* | (\s*[e0-9,.()\-]+\s*)* | (\spi\s*)* | (\s*[-+*/%^<>!=]*\s*)*)(\s*\=?\s*)?'

What I really want is for all blocks to be usable interchangeably, in any order.

How can I do that? It doesn't work using the | symbol.

Wiktor Stribiżew
Reman
  • What did I do wrong Wiktor? – Reman Nov 06 '19 at 13:34
  • IMO this is not a simple question. The `|` alone will only match one of the sub-patterns, so detecting complex expressions like these is not realistically feasible with a single regex. Even if it is possible, you would need a monster-length regex, and at that point you should consider whether performance becomes an issue. – r.ook Nov 06 '19 at 14:00
  • I don't think the Python `re` library is expressive enough to handle arbitrarily nested parenthetical expressions. – President James K. Polk Nov 06 '19 at 20:57
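
To illustrate the two comments above: in the question's pattern each alternative carries its own `*`, the alternation itself is never repeated, and the literal spaces around `|` are part of the pattern (no verbose flag is set). A minimal sketch of one way around this with the standard `re` module is to put every alternative into a single group and repeat that group, then check parenthesis nesting in plain Python, since `re` cannot do that. The names `TOKEN`, `EXPR`, `balanced` and `find_expressions` are made up for this example, and the token definitions are assumptions, not the asker's exact blocks.

```python
import re

# Hypothetical token alternatives, loosely based on the blocks in the question.
TOKEN = r"""
    (?:a?(?:sin|cos|tan)|sqrt|abs|log10|log|exp)   # function names
  | pi                                             # the constant pi
  | \d+(?:[.,]\d+)*(?:e[-+]?\d+)?                  # numbers such as 3,75 or 3e4
  | \*\*|!=|<=|>=|==|[-+*/%^<>=]                   # operators
  | [()]                                           # parentheses
"""

# One group holding every alternative, repeated, so that tokens may follow
# each other in any order. Whitespace is allowed after each token.
EXPR = re.compile(rf"(?:(?:{TOKEN})\s*)+", re.VERBOSE | re.IGNORECASE)


def balanced(s):
    """Return True if every ')' closes an earlier '(' and none is left open."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def find_expressions(text):
    """Return token runs that contain a digit and have balanced parentheses."""
    results = []
    for m in EXPR.finditer(text):
        candidate = m.group().strip()
        if any(ch.isdigit() for ch in candidate) and balanced(candidate):
            results.append(candidate)
    return results
```

This is deliberately loose: it accepts token runs that are not valid arithmetic (for example two operators in a row) and treats a bare number as an expression, so it is better seen as a candidate finder than as a validator.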

1 Answer


An example using a recursive regex:

([\d,.e]+|(cos|sin|sqrt)\((?R)\)|\([ ]*(?R)[ ]*\))([ ]*[-+*\/!=^]+[ ]*(?R))*

It could be made stricter about which operators and numbers it matches.

  • (?R) refers recursively to the whole regex
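
Python's standard `re` module does not support `(?R)`; it is a PCRE feature that the third-party `regex` package also provides. As a stdlib-only sketch of the same idea, the recursion can be written as a small recursive-descent matcher instead of a single regex. This is a swapped-in technique, not the regex above, and the names `match_expr`, `match_atom`, `skip_spaces`, `NUMBER`, `FUNC` and `OPERATOR` are hypothetical. The number and operator patterns are assumptions and somewhat stricter than `[\d,.e]+` and `[-+*\/!=^]+`; unary minus is not handled.

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)*(?:e[-+]?\d+)?|pi")
FUNC = re.compile(r"(?:cos|sin|sqrt)\(")
OPERATOR = re.compile(r" *(?:\*\*|!=|[-+*/^=]) *")


def skip_spaces(s, i):
    """Return the index of the first non-space character at or after i."""
    while i < len(s) and s[i] == " ":
        i += 1
    return i


def match_atom(s, i):
    """Return the index just past a number, func(...) or (...) at i, else None."""
    m = FUNC.match(s, i)
    if m or s.startswith("(", i):
        start = m.end() if m else i + 1
        # Recurse into the parenthesised part: this plays the role of (?R).
        j = match_expr(s, skip_spaces(s, start))
        if j is None:
            return None
        j = skip_spaces(s, j)
        return j + 1 if s.startswith(")", j) else None
    m = NUMBER.match(s, i)
    return m.end() if m else None


def match_expr(s, i=0):
    """Return the index just past the longest expression at i, else None."""
    end = match_atom(s, i)
    if end is None:
        return None
    while True:
        op = OPERATOR.match(s, end)
        if not op:
            return end
        nxt = match_atom(s, op.end())
        if nxt is None:
            return end  # dangling operator: stop before it
        end = nxt
```

For example, `match_expr("cos(30")` returns `None` because the parenthesis is never closed, while for a fully matched string such as `"(2^3)^4"` the returned index equals the length of the string.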
Nahuel Fouilleul
  • A great solution! However, I don't really understand how it works. Can you please explain a bit more? I've noticed that 'e' also matches 'e' characters in ordinary text. Can you please also add the 'pi' symbol? – Reman Nov 06 '19 at 14:54
  • `e` is inside a character class; since `pi` is two characters, an alternation (`|`) is needed: `([\d,.e]+|pi)`. Note that this also matches strings such as 1,23,4ee4252, which make no sense; the regex is only given as an example. – Nahuel Fouilleul Nov 06 '19 at 14:56
  • Thanks. Can you please tell me what the Python regex syntax is for matching as few characters as possible? Vimscript has `{-}`, but that doesn't exist in Python. For example, in `cos(30 * 2)` I want to capture all characters between `cos(` and the FIRST following `)`. – Reman Nov 06 '19 at 15:48
  • I've just noticed that Python's sin/cos/tan work in radians. What I want to do is switch to degrees: `cos(..)` --> `math.cos(math.radians(..))` – Reman Nov 06 '19 at 16:01
  • The OP tagged this with Python, but the regex you give isn't valid with Python's standard library `re` module. You might want to say a bit more about how to make this work with Python. (Are you using the 3rd party `regex` module from PyPI?) – Mark Dickinson Nov 06 '19 at 20:39
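
On the two follow-up questions in the comments above: in Python's `re`, appending `?` to a quantifier (`*?`, `+?`) makes it non-greedy, which is the counterpart of Vim's `{-}`. The degree conversion can be sketched with `re.sub` and backreferences. The names `TRIG` and `to_degrees` are made up for this example, and the sketch assumes the trig arguments contain no nested parentheses.

```python
import math
import re

# `.*?` is the non-greedy form of `.*`: it stops at the first `)` it can.
first_arg = re.search(r"cos\((.*?)\)", "cos(30 * 2) + sin(45)")
print(first_arg.group(1))  # 30 * 2

# Wrap the argument of each sin/cos/tan call so it is read as degrees.
# \b keeps asin/acos/atan untouched; [^()]* rejects nested parentheses.
TRIG = re.compile(r"\b(sin|cos|tan)\(([^()]*)\)")


def to_degrees(expr):
    """Rewrite sin(x)/cos(x)/tan(x) as math.f(math.radians(x))."""
    return TRIG.sub(r"math.\1(math.radians(\2))", expr)
```

`to_degrees("cos(30 * 2)")` gives `math.cos(math.radians(30 * 2))`. If the rewritten string is later passed to `eval`, keep in mind that evaluating text from untrusted sources is unsafe.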