Some modern regex flavors support recursion: Perl 5.10, PCRE 4.0, Ruby 2.0, and all later versions of these three. Ruby 1.9 supports capturing group recursion (the whole regex can be recursed if it is wrapped in a capturing group). .NET does not support recursion, but it supports balancing groups that can be used instead of recursion to match balanced constructs.
From regular-expressions.info:
Perl 5.10, PCRE 4.0, Ruby 2.0, and all later versions of these three, support regular expression recursion. Perl uses the syntax (?R) with (?0) as a synonym. Ruby 2.0 uses \g<0>. PCRE supports all three as of version 7.7. Earlier versions supported only the Perl syntax (which Perl actually copied from PCRE). Recent versions of Delphi, PHP, and R also support all three, as their regex functions are based on PCRE. JGsoft V2 also supports all variations of regex recursion.
While Ruby 1.9 does not have any syntax for regex recursion, it does support capturing group recursion. So you could recurse the whole regex in Ruby 1.9 if you wrap the whole regex in a capturing group. .NET does not support recursion, but it supports balancing groups that can be used instead of recursion to match balanced constructs.
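As a minimal sketch of the capturing-group workaround described above, the whole pattern can be wrapped in group 1 and recursed with \g<1>. The method name match_with_group_recursion is a hypothetical helper introduced only for this illustration, and the pattern is the same a...z example used further down.

```ruby
# Hypothetical helper: wraps the whole regex in capturing group 1 and
# recurses into that group with \g<1> (Ruby's subexpression call syntax).
# Returns the matched text, or nil when nothing matches.
def match_with_group_recursion(text)
  m = /(a\g<1>?z)/.match(text)
  m && m[0]
end

puts match_with_group_recursion("aaazzz")
```

Because the group spans the entire pattern, recursing into group 1 should behave like recursing into the whole regex.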
As we'll see later, there are differences in how Perl, PCRE, and Ruby deal with backreferences and backtracking during recursion. While they copied each other's syntax, they did not copy each other's behavior. JGsoft V2, however, copied their syntax and their behavior. So JGsoft V2 has three different ways of doing regex recursion, which you choose by using a different syntax. But these differences do not come into play in the basic example on this page.
Boost 1.42 copied the syntax from Perl but its implementation is marred by bugs, which are still not all fixed in version 1.62. Most significantly, quantifiers other than * or {0,} cause recursion to misbehave. This is partially fixed in Boost 1.60, which correctly handles ? and {0,1} too.
The regexes a(?R)?z, a(?0)?z, and a\g<0>?z all match one or more letters a followed by exactly the same number of letters z. Since these regexes are functionally identical, we'll use the syntax with R for recursion to see how this regex matches the string aaazzz.
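To try this out, here is a small sketch in Ruby (2.0 or later), using the \g<0> form of the regex. The helper name balanced_match is an assumption made for this example. The regex is not anchored, so on unbalanced input it returns the first balanced run it can find rather than failing.

```ruby
# Hypothetical helper: returns the first run of n letters a followed by
# n letters z, or nil if the text contains no such run.
def balanced_match(text)
  m = /a\g<0>?z/.match(text)   # \g<0> recurses into the whole regex
  m && m[0]
end

["aaazzz", "az", "aaaz", "aazzz", "zzz"].each do |sample|
  puts "#{sample.inspect} => #{balanced_match(sample).inspect}"
end
```

To require that the entire string is balanced, one option is to combine the anchors \A and \z with capturing group recursion, since recursing into the whole regex would also repeat the anchors.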
First, a matches the first a in the string. Then the regex engine reaches (?R). This tells the engine to attempt the whole regex again at the present position in the string. Now, a matches the second a in the string. The engine reaches (?R) again. On the second recursion, a matches the third a. On the third recursion, a fails to match the first z in the string. This causes (?R) to fail. But the regex uses a quantifier to make (?R) optional. So the engine continues with z which matches the first z in the string.
Now, the regex engine has reached the end of the regex. But since it's two levels deep in recursion, it hasn't found an overall match yet. It has only found a match for (?R). Exiting the recursion after a successful match, the engine also reaches z. It now matches the second z in the string. The engine is still one level deep in recursion, from which it exits with a successful match. Finally, z matches the third z in the string. The engine is again at the end of the regex. This time, it's not inside any recursion. Thus, it returns aaazzz as the overall regex match.
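The walkthrough above can be mirrored with a hand-written recursive function that records each step. This is only an illustrative stand-in for a(?R)?z, not how any real regex engine is implemented; the name match_recursive, its parameters, and the trace messages are all assumptions made for this sketch.

```ruby
# Illustrative stand-in for a(?R)?z. Returns the position just past the
# match that starts at pos, or nil if there is no match at that position.
def match_recursive(text, pos = 0, depth = 0, trace = [])
  unless text[pos] == "a"
    trace << "depth #{depth}: a fails at #{pos}"
    return nil
  end
  trace << "depth #{depth}: a matches at #{pos}"

  # Greedy (?R)?: try the recursion first, then fall back to skipping it.
  inner_end = match_recursive(text, pos + 1, depth + 1, trace)
  [inner_end, pos + 1].compact.uniq.each do |z_pos|
    if text[z_pos] == "z"
      trace << "depth #{depth}: z matches at #{z_pos}"
      return z_pos + 1
    end
  end
  trace << "depth #{depth}: z fails"
  nil
end

steps = []
end_pos = match_recursive("aaazzz", 0, 0, steps)
puts steps
puts "match ends at #{end_pos.inspect}"
```

In the trace, depth 0 corresponds to the outermost attempt and each higher depth to one more level of recursion, so the failed a at depth 3 is the third recursion that the optional quantifier lets the engine skip.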