
The question and title were changed; see Edit #1 below.

Original title was: Once-only subpattern with simple lookahead crashes PHP5 (but not 7)

Here is a minimal test case of a more complicated regex that I “optimized” to use once-only (atomic) subpatterns. In PHP7 it works as advertised, but in PHP5 it has the opposite effect: beyond a certain input size it crashes PHP.

The purpose of the regex is to consume all characters until a multi-character(!) closing tag (like ?>) is found.

// http://php.net/manual/en/regexp.reference.onlyonce.php

/* $pattern = '/(?>(?:(?!>).)*)/'; // short version */

$pattern = '/
    (?>                # Once-only subpattern (
        (?:            #   non-capturing group for repetition (
            (?!  \?> ) #     if next characters are not the closing tag …
            .          #     … then match next char
        ) *            #   ) match it as long as possible
    )                  # ) simple and fast, isn’t it?
/x';

// $length = 3078; // in my case the crashing length was 3078
$length = 30000;   // in PHP7 length is unlimited
$s = str_repeat('o', $length); // use $length so the commented-out value above can be swapped in

preg_match( $pattern, $s ); // crash here

echo 'finished';

Live example: http://sandbox.onlinephpfunctions.com/code/385df44341da7bc34f0fa9f31fcfd25ec05714c6

Edit #1

After some simplification, it seems that the bug has nothing to do with atomic or lookahead groups; it is much simpler.

New live example:

http://sandbox.onlinephpfunctions.com/code/82e14bf3259a3106d0dfd9acb7c2a72dc7a42f98

$pattern = '/
    (?:      #   non-capturing group for repetition (
        .    #     match anything
    ) *      #   ) … then repeat
/x';
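A possible reading of why this minimal pattern is enough, offered as a sketch rather than a diagnosis: PCRE's stack-usage documentation (pcrestack) describes older, recursion-based builds as using a stack frame for each iteration of a repeated group. Assuming that applies to the PCRE bundled with PHP 5 (not verified here), dropping the group and using a possessive quantifier should match the same text without the per-character recursion. The variable names below are illustrative, and the grouped form is only run on a short input so the comparison is safe on any version.

```php
<?php
// Illustrative comparison; variable names are not from the original post.
$grouped   = '/(?:.)*/';  // the simplified pattern from the question
$ungrouped = '/.*+/';     // same text matched, no group, possessive quantifier

// Short input: safe to run both forms and compare what they match.
$short = str_repeat('o', 100);
preg_match($grouped, $short, $a);
preg_match($ungrouped, $short, $b);
var_dump($a[0] === $b[0]);

// Long input: only the ungrouped form is run here.
$long = str_repeat('o', 30000);
$ok = preg_match($ungrouped, $long, $m);
var_dump($ok, strlen($m[0]));
```

This only helps when the group body is a single item; the lookahead version from the original test case needs a different rewrite.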

Questions:

  • Is this a known bug in PCRE?
  • How can it be avoided?
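On the second question, one mitigation reported in the comments on this question is to lower `pcre.recursion_limit`, so that a too-deep match fails instead of crashing the process. A minimal sketch follows, assuming the limit is honoured by the PCRE build in use (on PHP 7+ with the JIT enabled it may not apply). The helper name `match_with_recursion_limit` and the default limit of 1000 are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper (not from the original post): run preg_match() with a
// temporarily lowered pcre.recursion_limit and report any PCRE error code.
function match_with_recursion_limit($pattern, $subject, $limit = 1000)
{
    // ini_set() returns the previous value as a string, or false on failure.
    $old = ini_set('pcre.recursion_limit', (string) $limit);

    $result = preg_match($pattern, $subject, $matches);
    $error  = preg_last_error(); // PREG_NO_ERROR when nothing went wrong

    // Restore the previous limit so other code is unaffected.
    if ($old !== false) {
        ini_set('pcre.recursion_limit', $old);
    }

    return array($result, $error, $matches);
}

list($result, $error) = match_with_recursion_limit('/(?:.)*/', str_repeat('o', 30000));

if ($result === false) {
    // On builds that enforce the limit, expect PREG_RECURSION_LIMIT_ERROR here.
    echo 'preg_match failed, error code ', $error, "\n";
} else {
    echo "finished\n";
}
```

This turns the crash into a detectable failure but does not make the pattern match long inputs; that still requires rewriting the regex.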
biziclop
  • First question: What is this thing supposed to detect? – tadman Dec 04 '17 at 17:54
  • Hi, its purpose was to match opening and closing tags (non-recursively) from my hobby/experimental templating system. – biziclop Dec 04 '17 at 17:56
  • You might want to try in Python, Perl, or Ruby to see if it's a problem common to other implementations before declaring it a bug in PHP. If it is a bug in PHP it's worth reporting. – tadman Dec 04 '17 at 17:57
  • If you click the live example link, you can change the PHP version to anything. Every PHP5 version I tried failed, and every PHP7 version worked. Same on my own machine. – biziclop Dec 04 '17 at 17:58
    Your regex isn't optimized. You can use `[^>]*` instead of `(?:(?!>).)*` – ctwheels Dec 04 '17 at 17:59
  • @ctwheels: in my "live" regex, the closing tag was multi-character. – biziclop Dec 04 '17 at 18:00
  • @biziclop if you post the actual regex we might better be able to help you. Otherwise the correct answer is to optimize your regex with what I've just posted ^ – ctwheels Dec 04 '17 at 18:01
  • Odd: change 30000 to 15841 and it will work :/ – Lawrence Cherone Dec 04 '17 at 18:03
  • @ctwheels: but then it wouldn't be a minimal test case. :) Description of regex added to question. – biziclop Dec 04 '17 at 18:03
  • `(?>)` is an atomic group in regex. Do you have an actual string to test against and can you present a few more details about the opening/closing tag and what exactly you're trying to do against the string? – ctwheels Dec 04 '17 at 18:03
  • I'm trying to match various tags, like `[[…]]`, `<<…>>` and ``. The problem happened when I wanted to make php's closing tag optional. What I presented above is (was) the cleanest example of the bug I could produce. – biziclop Dec 04 '17 at 18:07
    @biziclop try [this](https://regex101.com/r/hP68BA/1) – ctwheels Dec 04 '17 at 18:20
  • @ctwheels: thanks, I'm looking at it (and my own thing, comparing them). – biziclop Dec 04 '17 at 18:37
    I feel like the problem has to do with [those](http://php.net/manual/en/pcre.configuration.php). Another guy had a similar [problem](https://stackoverflow.com/questions/6382330/does-this-php-code-crash-apache-for-anyone-else) with long matches... – Mateus Dec 04 '17 at 18:46
  • @Mateus: setting `pcre.recursion_limit` to a low number indeed prevents the crash. The bug is also triggered by this: `(?: . ) *`. I fail to see the recursion in this. PCRE doesn't. :) – biziclop Dec 04 '17 at 19:00
  • Ok, I marked my own question as duplicate, but someone could still point out where the hell is that explosive recursion in my regex … – biziclop Dec 04 '17 at 19:16
    You just need to use `'~[^?]*(?:\?(?!>)[^?]*)*~'`. The complexity falls drastically compared to the tempered greedy token solution. As for TGT performance, you may refer to [this answer of mine](https://stackoverflow.com/a/37343088/3832970). – Wiktor Stribiżew Dec 04 '17 at 19:41
    @WiktorStribiżew: Thank you very much. My head is starting to explode. :) – biziclop Dec 04 '17 at 20:14
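The unrolled pattern suggested in the comments can be tried out as follows. This is a small sketch on a made-up subject string (the test data and variable names are assumptions, not from the thread): it consumes everything up to, but not including, the first `?>`, while allowing a lone `?` in the text.

```php
<?php
// Pattern quoted from the comment thread: runs of non-'?' characters,
// interleaved with any '?' that is not followed by '>'.
$pattern = '~[^?]*(?:\?(?!>)[^?]*)*~';

// Made-up subject: a long run, a lone '?', then the closing tag and a tail.
$body    = str_repeat('o', 30000) . ' a ? b ';
$subject = $body . '?> tail';

$ok = preg_match($pattern, $subject, $m);

// The match should stop right before the closing tag.
var_dump($ok === 1 && $m[0] === $body);
```

With this shape the group iterates once per `?` in the input rather than once per character, which is the reduction in work the comment refers to.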

0 Answers