You were almost there. The only missing piece is that the logical operators you want to capture, the AND and OR, are not enclosed in parentheses, whereas your regex requires everything to sit inside parentheses.
Also, what you call groups actually seem to be matches, where
((weight gt 10) OR (weight lt 100)) AND (length lt 50)
is matched twice:
- First match is
((weight gt 10) OR (weight lt 100))
- Second match is
(length lt 50)
There are only two groups in your expression, and they are identical, since group 1 (g1), the outermost pair of parentheses, really is the entire match (g0).
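To make the match/group distinction concrete, here is a minimal Python sketch. Python's built-in `re` module has no `(?1)` recursion, so it uses a simplified, non-recursive stand-in pattern and a flat input string (both are assumptions for illustration, not your original expression). It only shows that each match carries its own g0 and g1, and that the two are identical when g1 wraps the whole pattern.

```python
import re

# Simplified, non-recursive stand-in for the original pattern (an assumption
# for illustration): it only handles parentheses that are not nested.
pattern = re.compile(r"(\([^()]+\))")

text = "(weight gt 10) AND (length lt 50)"

matches = list(pattern.finditer(text))
for number, match in enumerate(matches, start=1):
    # group(0) is the whole match; group(1) is the outermost parenthesis pair.
    print(f"match {number}: g0={match.group(0)!r} g1={match.group(1)!r}")
```

Each loop iteration is one match, and within every match g0 and g1 hold the same text.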
Since your expression matches any contained logic, I just expanded it a bit by adding an enclosing, optional non-capturing group made up of capturing groups like the ones you presented:
(?:([^()]+)((?1)))?
Combined it becomes
(\((?>[^()]+|(?1))*\))(?:([^()]+)((?1)))?
^----------- g1 -----^   ^- g2 -^^-g3-^
The (?1) still references the first group, as in your original expression. All of the following are matches, shown with their respective groups:
(weight gt 10)
^--- g1 -----^
(weight gt 10) OR (weight lt 100)
^--- g1 -----^ g2 ^--- g3 ------^
((weight gt 10) OR (weight lt 100)) AND (length lt 50)
^-------------- g1 ---------------^ g2  ^--- g3 -----^
(length lt 50) AND ((weight gt 10) OR (weight lt 100))
^--- g1 -----^ g2  ^------------- g3 ----------------^
(length lt 50) nonsense ((weight gt 10) OR (weight lt 100))
^--- g1 -----^    g2    ^-------------- g3 ---------------^
The character class only excludes parentheses, so any nonsense between the groups is matched as well.
Your expression broken down:
(               # capturing group 1
  \(            # match a `(` literally
  (?>           # atomic/independent, non-capturing group (meaning no backtracking into the group)
    [^()]       # any character that is not `(` nor `)`
    +           # one or more times
    |           # or
    (?1)        # recurse group 1.
                # ..this is like a copy of the expression of group 1 here.
                # ..which also includes this part.
                # ..so it's sort of self-recursing
  )*            # zero or more times
  \)            # match a `)` literally
)
The addition broken down:
(?:             # non-capturing group
  (             # capturing group 2
    [^()]       # any character that is not `(` nor `)`
    +           # one or more times
  )
  (             # capturing group 3
    (?1)        # recurse group 1.
  )
)?              # zero or one time
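If the recursion is hard to picture, the two breakdowns above can be mirrored in plain code. This is only a hand-written Python sketch of what the combined pattern does, not the pattern itself (Python's built-in `re` module has no `(?1)` recursion). `balanced_end` and `split_expression` are hypothetical helper names, and unlike the regex, the sketch is anchored at a given start index instead of searching the whole string.

```python
from typing import Optional, Tuple


def balanced_end(text: str, start: int) -> Optional[int]:
    """Mimic group 1: return the index just past the balanced (...) at start."""
    if start >= len(text) or text[start] != "(":
        return None
    pos = start + 1
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            # Counterpart of (?1): recurse into a nested balanced group.
            nested_end = balanced_end(text, pos)
            if nested_end is None:
                return None
            pos = nested_end
        else:
            # Counterpart of [^()]+: step over an ordinary character.
            pos += 1
    if pos < len(text):
        # The loop stopped on the closing ")" of this group.
        return pos + 1
    return None  # ran out of input before the group was closed


def split_expression(
    text: str, start: int = 0
) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (g1, g2, g3) for a match anchored at start, or None."""
    end1 = balanced_end(text, start)
    if end1 is None:
        return None
    g1 = text[start:end1]
    # Counterpart of ([^()]+): operator text up to the next parenthesis.
    pos = end1
    while pos < len(text) and text[pos] not in "()":
        pos += 1
    # Group 2 needs at least one character, then group 3 must be balanced.
    end3 = balanced_end(text, pos) if pos > end1 else None
    if end3 is None:
        # The optional group did not match, so g2 and g3 stay unset.
        return g1, None, None
    return g1, text[end1:pos], text[pos:end3]


for sample in (
    "(weight gt 10)",
    "((weight gt 10) OR (weight lt 100)) AND (length lt 50)",
    "(length lt 50) nonsense ((weight gt 10) OR (weight lt 100))",
):
    print(split_expression(sample))
```

As with the regex, g2 keeps the spaces around the operator, so trim it if you only want the bare AND or OR.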
The expression at regex101. Here I changed the character class to [^()\n] to avoid issues with newlines.