Language and package

python3.8, regex

Description

The inputs and the desired outputs are listed below:

if (programWorkflowState.getTerminal(1, 2) == Boolean.TRUE) {

Want: programWorkflowState.getTerminal(1, 2) and Boolean.TRUE

boolean ignore = !_isInStatic.isEmpty() && (_isInStatic.peek() == 3) && isAnonymous;

Want: _isInStatic.peek() and 3

boolean b = (num1 * ( 2 + num2)) == value;

Want: (num1 * ( 2 + num2)) and value

My current regex

((?:\((?:[^\(\)]|(?R))*\)|[\w\.])+)\s*==\s*((?:\((?:[^\(\)]|(?R))*\)|[\w\.])+)

This pattern is meant to match either \((?:[^\(\)]|(?R))*\) or [\w\.] on both sides of "==".

Result on regex101.com

Problem: it fails to match the recursive part (num1 * ( 2 + num2)).

The explanation of the recursive pattern \((?:m|(?R))*\) is here

But if I use only the recursive pattern, it succeeds in matching (num1 * ( 2 + num2)), as the image shows.

What's the right regex to achieve my purpose?

Martin Zeitler
  • `\((?:m|(?R))*\)` uses `(?R)` that recurses the *entire* pattern. You need to wrap the pattern you need to recurse with a group and use a subroutine instead of `(?R)`, e.g. `(?P<aux>\((?:m|(?&aux))*\))` to recurse a pattern inside a longer one. – Wiktor Stribiżew Sep 09 '20 at 09:46
  • Thanks. The named pattern works. Finally, my regex is `((?:(?P<p1>\((?:[^\(\)]|(?&p1))*\))|(?P<p2>[\w\.]))+)\s*[!=]=\s*((?:(?&p1)|(?&p2))+)` – Xiaowen Zhang Sep 10 '20 at 06:47

1 Answer

The \((?:m|(?R))*\) pattern contains a (?R) construct (equal to (?0) subroutine) that recurses the entire pattern.

You need to wrap the pattern you need to recurse with a group and use a subroutine instead of (?R) recursion construct, e.g. (?P<aux>\((?:m|(?&aux))*\)) to recurse a pattern inside a longer one.
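The difference can be checked with a minimal sketch, assuming the third-party `regex` package from the question is installed. The variable names and the simplified `\w+` right-hand operand are illustrative only and are not part of the original patterns.

```python
import regex  # third-party package; the standard-library `re` has no recursion support

text = "(num1 * ( 2 + num2)) == value"

# On its own, (?R) recurses the whole pattern, which is exactly the parenthesised group.
standalone = regex.compile(r"\((?:[^()]|(?R))*\)")

# Inside a longer pattern, (?R) also demands "== operand" after every nested ")",
# so the inner "( 2 + num2)" can no longer be consumed.
embedded_r = regex.compile(r"\((?:[^()]|(?R))*\)\s*==\s*\w+")

# A named group plus the (?&aux) subroutine call recurses only the parenthesised part.
embedded_sub = regex.compile(r"(?P<aux>\((?:[^()]|(?&aux))*\))\s*==\s*\w+")

standalone_match = standalone.search(text)
embedded_r_match = embedded_r.search(text)
embedded_sub_match = embedded_sub.search(text)

print(standalone_match.group(0))    # the balanced parenthesised expression
print(embedded_r_match)             # no match with (?R) in the longer pattern
print(embedded_sub_match.group(0))  # the whole comparison
```

Only the subroutine version matches the full comparison, which is the behaviour the question was after.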

You can use

((?:(?P<aux1>\((?:[^()]++|(?&aux1))*\))|[\w.])++)\s*[!=]=\s*((?:(?&aux1)|[\w.])+)

See this regex demo (it takes just 6875 steps to match the string provided, whereas yours takes 13680).

Details

  • ((?:(?P<aux1>\((?:[^()]++|(?&aux1))*\))|[\w.])++) - Group 1, matches one or more occurrences (possessively, due to ++, not allowing backtracking into the pattern so that the regex engine could not re-try matching a string in another way if the subsequent patterns fail to match)
    • (?P<aux1>\((?:[^()]++|(?&aux1))*\)) - an auxiliary group "aux1" that matches (, then zero or more occurrences of either 1+ chars other than ( and ) or the whole Group "aux1" pattern, and then a )
    • | - or
    • [\w.] - a letter, digit, underscore or .
  • \s*[!=]=\s* - != or == with zero or more whitespace on both ends
  • ((?:(?&aux1)|[\w.])+) - Group 3 (the named group "aux1" is itself numbered as Group 2): one or more occurrences of the Group "aux1" pattern or a letter, digit, underscore or .
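As a rough sketch of how the pattern above could be applied to the three inputs from the question, assuming the third-party `regex` package is installed: the names `COMPARISON` and `extract_operands` are hypothetical helpers introduced here for illustration.

```python
import regex  # third-party package; the standard-library `re` cannot run this pattern

# The pattern from the answer, split across lines for readability only.
COMPARISON = regex.compile(
    r"((?:(?P<aux1>\((?:[^()]++|(?&aux1))*\))|[\w.])++)"  # left operand
    r"\s*[!=]=\s*"                                         # == or !=
    r"((?:(?&aux1)|[\w.])+)"                               # right operand
)


def extract_operands(line):
    """Return (left, right) for the first == or != comparison, or None."""
    m = COMPARISON.search(line)
    if m is None:
        return None
    # The named group aux1 is also numbered (group 2), so the right operand is group 3.
    return m.group(1), m.group(3)


samples = [
    "if (programWorkflowState.getTerminal(1, 2) == Boolean.TRUE) {",
    "boolean ignore = !_isInStatic.isEmpty() && (_isInStatic.peek() == 3) && isAnonymous;",
    "boolean b = (num1 * ( 2 + num2)) == value;",
]

for sample in samples:
    print(extract_operands(sample))
```

Reading the operands by number keeps the answer's pattern unchanged. Naming the two outer groups would be an alternative if the numbering feels fragile.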
Wiktor Stribiżew