I'm struggling to parse some nested formulae using the Python `regex` module, even with `overlapped=True`.

Example (ra is my shorthand for 'rightarrow', i.e. implication - but it could be anything):

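import regex as re  # third-party `regex` module; the standard-library `re` has no `overlapped` argument

pattern = r"^\((.+) ra (.+)\)$"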

re.findall(pattern, "(a ra b)", overlapped=True)
# Gives [('a', 'b')], as expected.

re.findall(pattern, "(a ra (b ra c))", overlapped=True)
# Gives [('a ra (b', 'c)')]
# Expected [('a ra (b', 'c)'), ('a', '(b ra c)')]

Naturally, the ('a', '(b ra c)') result is what I'm looking for - and I was expecting overlapped=True to yield this.

Note: I realise this could be done using a recursive regex, e.g. (?P<formula>([abc]|\((?P<left>(?&formula)) ra (?P<right>(?&formula))\))), but that doesn't help if I also want the uglier ('a ra (b', 'c)') answer.

jamesh
  • For a language like this, regex won't help. `overlapped=True` is only meant to extract overlapping matches that start at *different* locations in a string. You need a parser. – Wiktor Stribiżew Aug 03 '20 at 22:17
  • Thanks @WiktorStribiżew - your comment led me to finding [this](https://stackoverflow.com/questions/44641841/regex-including-overlapping-matches-with-same-start?rq=1) equivalent question. Looks like regex is unfortunately not the tool for the job. – jamesh Aug 03 '20 at 22:50
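Following the comments above, here is a minimal sketch of a non-regex approach, using only plain string operations. It is not taken from the question or the linked thread: the helper names `all_splits` and `is_balanced` are hypothetical, and it assumes the formula is wrapped in a single outer pair of parentheses with the literal separator " ra ". It lists every split that shares the same start position, which is the case `overlapped=True` does not cover, and then optionally keeps only the splits whose two sides have balanced parentheses.

```python
def all_splits(formula, sep=" ra "):
    """Return every (left, right) pair from splitting "(left ra right)" at each sep.

    Hypothetical helper: assumes one outer pair of parentheses around the formula.
    """
    if not (formula.startswith("(") and formula.endswith(")")):
        return []
    inner = formula[1:-1]  # drop the outer parentheses
    splits = []
    start = inner.find(sep)
    while start != -1:
        left, right = inner[:start], inner[start + len(sep):]
        if left and right:  # mirror the regex's `.+` (at least one character)
            splits.append((left, right))
        start = inner.find(sep, start + 1)  # move on to the next separator
    return splits


def is_balanced(text):
    """Return True if every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


pairs = all_splits("(a ra (b ra c))")
print(pairs)
# [('a', '(b ra c)'), ('a ra (b', 'c)')]

# Keep only the splits where both sides are well-formed on their own.
well_formed = [p for p in pairs if all(is_balanced(side) for side in p)]
print(well_formed)
# [('a', '(b ra c)')]
```

The unfiltered list contains both the "ugly" and the intended split; a real parser would be the more robust choice if the formula language grows beyond this shape.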

0 Answers