TL;DR: Is there a way to specify a conditional so that an opening element MUST match its paired closing element?
An example is available on regex101.com.
=====
Balancing elements in regex is typically handled through recursion, which means that nested structures such as {...{...{...}...}...} can be located.
Also, PCRE allows the (?(DEFINE)...) construct, which lets you define named subpatterns for later use without matching anything at the point of definition.
In the regular expression
# Define the opening and closing elements before the recursion occurs
(?(DEFINE)
(?<open_curly>\{)
(?<close_curly>\})
# ... other definitions here ...
(?<open>\g'open_curly')
(?<close>\g'close_curly')
)
# Match the opening element
(\g'open'
(?>
# For recursion, don't match either the opening or closing element
(?!\g'open'|\g'close')(?s:.)
|
# Recurse this captured pattern
(?-1)
)*
# Match the closing element
\g'close')
the elements are the { and } characters, and the pattern can match strings such as
{{{}}}
{ test1 { test2 { test3 { test4 } } } }
I want to include other open/close elements, such as [ and ], or --[ and --], so I added those to the (?(DEFINE)) block:
(?<open_square>\[)
(?<close_square>\])
(?P<open_pascal>(?i:\bbegin\b))
(?P<close_pascal>(?i:\bend\b))
(?P<open_lua>--\[)
(?P<close_lua>--\])
(?<open>\g'open_curly'|\g'open_square'|\g'open_pascal'|\g'open_lua')
(?<close>\g'close_curly'|\g'close_square'|\g'close_pascal'|\g'close_lua')
What this DOESN'T do is pair each opening element with its own closing element: any opener can be closed by any closer. For example, --[ can group with }, so a string like --[ test } is matched as a balanced group, which is not desirable.
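To pin down the behaviour I am after, here is a rough reference sketch in Python. It is not a regex solution and not what I want to end up with; it is a plain stack-based check, and the names PAIRS, TOKEN and is_correctly_paired are made up for this illustration. It also validates a whole string rather than searching for a balanced substring, so it only approximates what the pattern above does.

```python
import re

# Hypothetical pair table mirroring the open/close definitions above.
PAIRS = {"{": "}", "[": "]", "begin": "end", "--[": "--]"}

# Tokenizer for every opening and closing element (begin/end are case-insensitive).
TOKEN = re.compile(r"--\[|--\]|\{|\}|\[|\]|\bbegin\b|\bend\b", re.IGNORECASE)

def is_correctly_paired(text):
    """Return True if every opener is closed by its own closer, properly nested."""
    expected = []  # stack of closers we are still waiting for
    for match in TOKEN.finditer(text):
        token = match.group().lower()
        if token in PAIRS:
            # Opening element: remember which closer must come next.
            expected.append(PAIRS[token])
        elif not expected or expected.pop() != token:
            # Closing element with no opener, or the wrong kind of closer.
            return False
    return not expected  # leftover openers mean the text is unbalanced
```

With this check, {{{}}} and { test1 { test2 { test3 { test4 } } } } are accepted, while --[ test } is rejected. That rejection is what I would like the regex to reproduce.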
Is there a way to create open/close pairs in a regex like this?