The point is that `[\s\S]*` is a `*`-quantified subpattern, so a regex engine can normally backtrack into it once the subsequent subpatterns fail to match. Recursion calls in PCRE, however, are atomic: once `(?P>test)` has grabbed any 0+ chars, the engine has no way to backtrack into it, and that is why the pattern fails to match.
In short, the `@123(?:(?:(?P<test>[\s\S]*)456(?P<test1>(?P>test))789))@` pattern can be re-written as

    @123(?:(?:(?P<test>[\s\S]*)456(?P<test1>[\s\S]*+)789))@
                                                  ^^

and since the possessive `[\s\S]*+` has already consumed `789` along with the rest of the string, the engine cannot backtrack to match the trailing `789` part of the pattern.
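The effect can be reproduced outside PCRE. The sketch below uses Python's `re`, which has no recursion, so the atomic behaviour is emulated with a lookahead plus a backreference (a lookahead is never re-entered once it has matched). The subject string and the `grab` group name are assumptions made up for illustration, not part of the original question.

```python
import re

subject = "123abc456def789"  # hypothetical sample input

# Ordinary greedy quantifier: the engine can backtrack into the second
# [\s\S]* and give "789" back to the rest of the pattern.
backtracking = re.compile(r"123(?P<test>[\s\S]*)456(?P<test1>[\s\S]*)789")

# Atomic emulation: the lookahead captures the greedy match into "grab",
# then the backreference consumes it. Nothing can be given back afterwards,
# which mimics how an atomic recursion call behaves.
atomic = re.compile(
    r"123(?P<test>[\s\S]*)456(?P<test1>(?=(?P<grab>[\s\S]*))(?P=grab))789"
)

backtracking_match = backtracking.search(subject)
atomic_match = atomic.search(subject)

print(backtracking_match.group("test", "test1"))  # ('abc', 'def')
print(atomic_match)                               # None
```

The second pattern fails for the same reason as the original one: the atomic part swallows `def789`, and the literal `789` has nothing left to match.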
See PCRE docs:
In PCRE (like Python, but unlike Perl), a recursive subpattern call is always treated as an atomic group. That is, once it has matched some of the subject string, it is never re-entered, even if it contains untried alternatives and there is a subsequent matching failure.
No idea why they mention Python here, since `re` does not support recursion (unless they meant the PyPI `regex` module).
If you are looking for a solution, you might use a `(?:(?!789)[\s\S])*` tempered greedy token instead of `[\s\S]*`. It matches a char only if that char does not start a `789` char sequence, so there is no need to backtrack to accommodate `789`:

    123(?:(?:(?P<test>(?:(?!789)[\s\S])*)456(?P<test1>(?P>test))789))
                      ^^^^^^^^^^^^^^^^^^

See this regex demo.
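As a quick check of the tempered greedy token, here is a minimal Python sketch. Since `re` does not support `(?P>test)`, the token is repeated by hand in the second group, which is an assumption standing in for the recursion call. The sample strings and the `TEMPERED` name are hypothetical.

```python
import re

# Tempered greedy token: consume a char only if it does not start "789".
TEMPERED = r"(?:(?!789)[\s\S])*"

# (?P>test) is not available in re, so the same token is written out twice.
pattern = re.compile(rf"123(?P<test>{TEMPERED})456(?P<test1>{TEMPERED})789")

single_line = pattern.search("123abc456def789")
multi_line = pattern.search("123a\nb456c\nd789")

print(single_line.group("test", "test1"))  # ('abc', 'def')
print(multi_line.group("test", "test1"))   # ('a\nb', 'c\nd')
```

Because the second token stops by itself right before `789`, the match succeeds even if that part is treated atomically.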