My PHP code:

    $exp = 'zzz<pre>sssss<pre>fff</pre>ff</pre>zzz';
    \preg_match_all("#<pre>((?>[^(?:<pre>)(?:</pre>)]|(?R))*)</pre>#si", $exp, $matches);

    $i = 0;
    foreach ($matches as $item) {
        foreach ($item as $elem) {
            echo "$i  ", \htmlentities($elem), "<br>";
        }
        $i++;
    }

Output:

0 <pre>sssss<pre>fff</pre>ff</pre>

1 sssss<pre>fff</pre>ff

That is good: the regex works and finds the nested <pre> tags. But I have one problem with this part:

[^(?:<pre>)(?:</pre>)]

A character class only lets me exclude the individual characters < / p r e >, but I need to exclude the strings <pre> and </pre>. As a result, if the original text contains even a single p or r, the regex no longer works as it should.

Example: $exp = 'zzz<pre>ssspss<pre>fff</pre>ff</pre>zzz'; // p inside ssspss

Output:

0 <pre>fff</pre>

1 fff

How can I build the regular expression so that it excludes whole strings rather than individual characters?

RussCoder

1 Answer

Probably you want to use a negative lookahead instead of the negated character class:

~<pre>((?>(?!</?pre).|(?R))*)</pre>~si

See test at regex101.com
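
A minimal sketch of that pattern applied to the failing input from the question. The variable names are only illustrative, and note that (?!</?pre) as written also refuses any tag whose name merely begins with "pre"; append > inside the lookahead if that matters for your input.

```php
<?php
// Input from the question that broke the character-class version ("p" inside "ssspss").
$exp = 'zzz<pre>ssspss<pre>fff</pre>ff</pre>zzz';

// (?!</?pre). consumes one character only when no "<pre" or "</pre" starts here;
// (?R) recurses into the whole pattern to consume a nested <pre>...</pre> block.
$pattern = '~<pre>((?>(?!</?pre).|(?R))*)</pre>~si';

preg_match_all($pattern, $exp, $matches);

// Same reporting loop as in the question: group index, then the escaped match.
foreach ($matches as $i => $group) {
    foreach ($group as $elem) {
        echo $i, ' ', htmlentities($elem), "<br>\n";
    }
}
```

With this input, group 0 should hold the whole outer block <pre>ssspss<pre>fff</pre>ff</pre> and group 1 its content ssspss<pre>fff</pre>ff.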

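Your regex didn't work as expected because [^(?:<pre>)(?:</pre>)] is a negated character class. Grouping syntax has no special meaning inside the brackets, so it matches any single character that is not one of: < / p r e ) ( ? : >. In your second example the p in ssspss can be consumed neither by the class nor by (?R), so the outer <pre> fails to match and only the inner <pre>fff</pre> is found.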
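
To see this in isolation, here is a small sketch that tests the bracket expression from the question against single characters; the variable name $class is just for illustration.

```php
<?php
// The bracket expression from the question, used on its own.
$class = '~[^(?:<pre>)(?:</pre>)]~';

// Every one of these characters is listed inside the brackets,
// so the negated class rejects each of them.
foreach (str_split('<>/pre()?:') as $ch) {
    var_dump(preg_match($class, $ch)); // int(0)
}

// A character that is not listed, such as "s", is accepted.
var_dump(preg_match($class, 's')); // int(1)
```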


Sidenote: Regex is not appropriate for parsing arbitrary nested HTML. Consider using a parser.

Jonny 5