I'm trying to parse strings in the following format (EBNF, I hope this is right) in PHP:
<exp> ::= <base>[{<modifier>["!"]"("<exp>")"}]
<base> ::= <role>[{<modifier><role>}]
<modifier> ::= "&" | "|"
<role> ::= ["!"]<str>[","<str>]
Where <str> is any string that would pass [a-zA-Z0-9\-]+
The following are examples of patterns that would have to be parsed:
token1
token1&token2
token1|(token2&!token3)
(token1&token2)|(token3&(token4|(!token5,12&token6)))
!(token1&token2|(token3&!token4))|token5,12
I am trying to write a RegEx pattern that would always give me four groups:
- The left-most <expression>. From the above examples this would be:
    token1
    token1
    token1
    token1&token2
    token1&token2|(token3&!token4)
- Whether ["!"] was present, i.e.:
    null
    null
    null
    null
    !
- The <modifier> for the next <expression> (if any). This would be:
    null
    &
    |
    |
    |
- The remainder of the pattern:
    null
    token2
    token2&!token3
    token3&(token4|(!token5,12&token6))
    token5,12
I can parse this provided that the first expression doesn't contain any <modifier>s.
^\(?(!?)([a-zA-Z0-9\-]+)\)?([&|]?)(.*)$
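To illustrate that limitation, here is a minimal sketch that runs the pattern above through preg_match. The "/" delimiters, the variable names and the two sample inputs are assumptions made for this example only.

```php
<?php
// The pattern from above, wrapped in "/" delimiters for preg_match.
$pattern = '/^\(?(!?)([a-zA-Z0-9\-]+)\)?([&|]?)(.*)$/';

$inputs = [
    'token1&token2',                   // first expression has no modifier
    '(token1&token2)|(token3&token4)', // first expression contains "&"
];

$results = [];
foreach ($inputs as $input) {
    preg_match($pattern, $input, $m);
    // $m[1] = "!" flag, $m[2] = first token, $m[3] = modifier, $m[4] = rest
    $results[$input] = array_slice($m, 1);
    echo $input, ' => ', json_encode($results[$input]), PHP_EOL;
}
```

The second input is split at the "&" inside the first parenthesised group rather than at the top-level "|", so the remainder comes back as token2)|(token3&token4) with an unbalanced bracket.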
I am stuck at this point. I have tried using lookarounds, however I can't figure out how to ensure that the group is captured when all brackets are balanced. Is this achievable with RegEx or do I need to write code using loops etc. to do this?
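For comparison, the loop-based alternative could be sketched by tracking bracket depth and splitting at the first "&" or "|" found at depth zero. This is only a sketch under stated assumptions, not a definitive implementation: the function names splitExpression and unwrap are hypothetical, a leading "!" is always treated as the negation flag, one pair of enclosing brackets is stripped from each side to match the expected groups listed above, and PHP 8 is assumed.

```php
<?php
// Hypothetical helper: strip one pair of brackets, but only when the
// opening "(" at position 0 is closed by the very last character.
function unwrap(string $expr): string
{
    $length = strlen($expr);
    if ($length < 2 || $expr[0] !== '(' || $expr[$length - 1] !== ')') {
        return $expr;
    }
    $depth = 0;
    for ($i = 0; $i < $length - 1; $i++) {
        if ($expr[$i] === '(') {
            $depth++;
        } elseif ($expr[$i] === ')') {
            $depth--;
        }
        if ($depth === 0) {
            return $expr; // first "(" closed early, e.g. "(a)|(b)"
        }
    }
    return substr($expr, 1, -1);
}

// Hypothetical helper: returns [expression, negation, modifier, rest],
// using null for groups that are absent.
function splitExpression(string $input): array
{
    $negation = null;
    if ($input !== '' && $input[0] === '!') {
        $negation = '!'; // assumption: a leading "!" is the negation flag
        $input = substr($input, 1);
    }

    $length = strlen($input);
    $splitAt = $length; // index of the first top-level modifier, if any
    $depth = 0;
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            if (--$depth < 0) {
                throw new InvalidArgumentException('Unbalanced ")"');
            }
        } elseif ($depth === 0 && ($char === '&' || $char === '|')) {
            $splitAt = $i;
            break;
        }
    }
    if ($depth !== 0) {
        throw new InvalidArgumentException('Unbalanced "("');
    }

    if ($splitAt === $length) {
        return [unwrap($input), $negation, null, null];
    }
    return [
        unwrap(substr($input, 0, $splitAt)),
        $negation,
        $input[$splitAt],
        unwrap(substr($input, $splitAt + 1)),
    ];
}
```

Calling splitExpression repeatedly on the first and last elements of the result would walk down the nested expression one level at a time.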