I have some problems with my regex:

if (preg_match_all('/{[a-z]+:ce_img:(single|pair)(\s.*)*}/', $files, $matches))
{
   echo "ok";
}

For some reason it crashes my site. Of course I already tried to google it and found something about "catastrophic backtracking", although I'm not sure if this is my problem.

The regex should give me everything from {eggs:ce_img:single (or pair) up to the closing }.

When I change or remove (single|pair), it runs normally. So it should have something to do with that, right?

I'm quite sure that $files isn't the problem.

Does someone know how to solve this?

Regards, Olcan

EDIT: Here is an example of how this regex should work: [image]

Olcan Teke
  • Can you give us an example string you're trying to parse? – Jay Blanchard Oct 03 '16 at 12:32
  • Check the edit I provided. Thanks! – Olcan Teke Oct 03 '16 at 12:41
  • what about `([^}]*)` instead of `(\s.*)*` – Marek Janoud Oct 03 '16 at 12:42
  • The screenshot proves that there is no catastrophic backtracking; the regex works as expected, and the question is unclear. Please provide the exact sample input text to reproduce the issue and the exact expected output. – Wiktor Stribiżew Oct 03 '16 at 12:43
  • Wiktor, I just want everything between the {blabla:ce_img:single (or pair) and the ending } returned (even when there are new lines) to check if a certain attribute is set. – Olcan Teke Oct 03 '16 at 12:49
  • Great, then [your regex works](https://regex101.com/r/U0GC7v/1). What attribute are you talking about? There is nothing about attributes in your question. Check the [MCVE (minimal complete verifiable example)](http://stackoverflow.com/help/mcve). – Wiktor Stribiżew Oct 03 '16 at 12:51
  • If you check the image I provided: I want to check if "fallback_src" is set. I was planning to do that with strpos(), but I need everything inside the tag returned for that. – Olcan Teke Oct 03 '16 at 12:53
  • But that's not the question. The question is: why is my site crashing? It clearly is the regex. – Olcan Teke Oct 03 '16 at 12:59
  • That is a cool image, however, no one can test any pattern against an *image*. Please provide exact *text* you are using, state your issue clearly, state the expected behavior. I cannot repro any crash for the time being. – Wiktor Stribiżew Oct 03 '16 at 13:15
  • Catastrophic backtracking in this case happens if you have multiple spaces at the end of the input string. Check for it. – revo Oct 03 '16 at 13:16

2 Answers

Your RegEx crashes your site (due to catastrophic backtracking) because your input file contains at least one of these:

  1. Multiple space characters after the target block
  2. A sequence of characters separated by spaces after the target block, i.e. text that \s.+ would match
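To check whether backtracking is really what stops the match, one option is to look at the return value of preg_match_all() and at preg_last_error(). The following is only a sketch: the input string is made up for illustration (one complete tag followed by 40 trailing spaces), and it assumes the default pcre.backtrack_limit.

```php
<?php
// Hypothetical input: a complete tag followed by a run of trailing spaces.
$input = '{eggs:ce_img:single src="a"}' . str_repeat(' ', 40);

// The pattern from the question. While backtracking, (\s.*)* can divide the
// trailing spaces between \s and .* in exponentially many ways.
$pattern = '/{[a-z]+:ce_img:(single|pair)(\s.*)*}/';

$result = preg_match_all($pattern, $input, $matches);

if ($result === false) {
    // preg_last_error() reports why PCRE gave up on the match.
    $code = preg_last_error();
    if ($code === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "backtrack limit exhausted\n";
    } else {
        echo "PCRE error code: $code\n";
    }
} else {
    echo "matches: $result\n";
}
```

With default settings I would expect preg_match_all() to return false here instead of a match count. The exact error code may depend on the PHP version and on whether the PCRE JIT is enabled.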

Solution:

{[a-z]+:ce_img:(?:single|pair)(?:\s+[\w-]+="[^"]*")*\s*}

This matches your tag format specifically. Explanation of the final part, which is what differs from your pattern:

(?:                    # Start of non-capturing group (a)
    \s+[\w-]+="[^"]*"  # Match whitespace followed by an `attr="value"` pair
)*                     # Zero or more times; end of non-capturing group (a)
\s*                    # Match any whitespace before the closing brace `}`
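
As a rough sketch of how this could be used for the fallback_src check mentioned in the comments: the snippet below runs the pattern with preg_match_all() and then uses strpos() on each matched tag. The sample text is a shortened version of the tag from this thread, the braces are escaped only for readability, and the variable names are illustrative.

```php
<?php
// Shortened sample tag based on the example discussed in this thread.
$files = <<<'TXT'
{eggs:ce_img:single
   src="{src}"
   fallback_src="/assets/a-b-c.jpg"
   width="250"
}
TXT;

// Same pattern as above, with the literal braces escaped.
$pattern = '/\{[a-z]+:ce_img:(?:single|pair)(?:\s+[\w-]+="[^"]*")*\s*\}/';

$count = preg_match_all($pattern, $files, $matches);

if ($count === false) {
    // The match was aborted; report the PCRE error code instead of failing silently.
    echo 'PCRE error code: ' . preg_last_error() . "\n";
} else {
    foreach ($matches[0] as $tag) {
        // strpos() returns false when the attribute name is absent from the tag.
        $hasFallback = strpos($tag, 'fallback_src=') !== false;
        echo ($hasFallback ? 'fallback_src set' : 'fallback_src missing') . "\n";
    }
}
```

Because each attribute is matched as a single name="value" unit, there is only one way to consume the whitespace between attributes, which is what keeps the backtracking bounded.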
revo
  • Please share your expertise here: [Extract all words between two phrases using regex](https://stackoverflow.com/q/51146300/2943403) – mickmackusa Jul 03 '18 at 06:23

This should work:

{[a-z]+:ce_img:(?:single|pair)?([\w\W\s]+)*}

For the text presented in your image:

{eggs:ce_img:single 
   src="{src}"
   fallback_src="/assets/a-b-c.jpg"
   width="250"
   height="250"
   add_dims="no"
   crop="yes"
   title="{title}"
   alt="{title}"
   allow_scale_larger="yes"
}

You'll get:

Group 1     src="{src}"
            fallback_src="/assets/a-b-c.jpg"
            width="250"
            height="250"
            add_dims="no"
            crop="yes"
            title="{title}"
            alt="{title}"
            allow_scale_larger="yes"

See demo here: https://regex101.com/r/uo9Kqi/1

  • Are you sure this regex answers your question? @OlcanTeke – revo Oct 03 '16 at 13:28
  • Actually, after some testing, it doesn't. Instead of returning what's inside the { }, it returns everything from the first { to the end of the input string. – Olcan Teke Oct 03 '16 at 13:33