The question and title were changed; see the edit below.
Original title was: Once-only subpattern with simple lookahead crashes PHP5 (but not 7)
Here is a minimal test case of a more complicated regex which I “optimized” to use once-only (atomic) subpatterns. In PHP7 it works as advertised, but in PHP5 it has the opposite effect: beyond a certain input size it crashes PHP.
The purpose of the regex is to consume all characters until a multi-character(!) closing tag (like ?>) is found.
// http://php.net/manual/en/regexp.reference.onlyonce.php
/* $pattern = '/(?>(?:(?!>).)*)/'; // short version */
$pattern = '/
(?> # Once-only subpattern (
(?: # non-capturing group for repetition (
(?! \?> ) # if next characters are not the closing tag …
. # … then match next char
) * # ) match it as long as possible
) # ) simple and fast, isn’t it?
/x';
// $length = 3078; // in my case the crashing length was 3078
$length = 30000; // in PHP7 length is unlimited
$s = str_repeat('o', $length);
preg_match( $pattern, $s ); // crash here
echo 'finished';
Live example: http://sandbox.onlinephpfunctions.com/code/385df44341da7bc34f0fa9f31fcfd25ec05714c6
Edit #1
After some simplification, it seems that the bug has nothing to do with atomic groups or lookaheads; it is much simpler.
New live example:
http://sandbox.onlinephpfunctions.com/code/82e14bf3259a3106d0dfd9acb7c2a72dc7a42f98
$pattern = '/
(?: # non-capturing group for repetition (
. # match anything
) * # ) … then repeat
/x';
Questions:
- Is this a known bug in PCRE?
- How can it be avoided?
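To help narrow this down, here is a minimal diagnostic sketch. It rests on an assumption that is not confirmed anywhere above: that the crash is PCRE exhausting the stack while recursing once per repetition of the group. If that is the case, lowering the existing `pcre.recursion_limit` ini setting might make `preg_match()` return `false` instead of crashing, and `preg_last_error()` would then report why. The value 1000 is an arbitrary choice for illustration, and the behaviour may differ between PHP versions and JIT settings.

```php
<?php
// Assumption: the PHP5 crash is a stack overflow inside PCRE's recursive
// matcher. A low recursion limit should turn it into a reportable failure.
// 1000 is an arbitrary illustrative value, not a recommended setting.
ini_set('pcre.recursion_limit', '1000');

$pattern = '/(?:.)*/';              // simplified pattern from Edit #1
$length  = 30000;
$s       = str_repeat('o', $length);

$result = preg_match($pattern, $s, $m);

if ($result === false) {
    // preg_last_error() returns one of the PREG_*_ERROR constants,
    // e.g. PREG_RECURSION_LIMIT_ERROR when the limit above was hit.
    echo 'preg_match failed, error code: ', preg_last_error(), "\n";
} else {
    echo 'matched ', strlen($m[0]), " characters\n";
}
```

If the script prints an error code instead of crashing on PHP5, that would point at recursion depth rather than at the atomic group or the lookahead.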