I need to insert <p> tags around the content of each list element in an HTML fragment. This must not create nested paragraphs, which is why I want to use lookahead/lookbehind assertions to detect whether the content is already enclosed in a paragraph tag.
So far, I've come up with the following code.
This example uses a negative lookbehind assertion to match each </li> closing tag that is not preceded by a </p> closing tag and arbitrary whitespace:
$html = <<<EOF
<ul>
<li>foo</li>
<li><p>fooooo</p></li>
<li class="bar"><p class="xy">fooooo</p></li>
<li> <p> fooooo </p> </li>
</ul>
EOF;
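// Insert <p> after every <li ...> opening tag that is not already followed by <p (negative lookahead).
$html = preg_replace('@(<li[^>]*>)(?!\s*<p)@i', '\1<p>', $html);
// Intended: insert </p> before every </li> that is not already preceded by </p> and optional whitespace (negative lookbehind).
$html = preg_replace("@(?<!</p>)(\s*</li>)@i", '</p>\1', $html);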
echo $html, PHP_EOL;
which to my surprise results in the following output:
<ul>
<li><p>foo</p></li>
<li><p>fooooo</p></li>
<li class="bar"><p class="xy">fooooo</p></li>
<li> <p> fooooo </p> </p> </li>
</ul>
The insertion of the opening tag works as expected, but note the additional </p> tag inserted in the last list element!
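To narrow this down, here is a minimal sketch that runs only the closing-tag replacement on the one list element that goes wrong. The variable names $item, $result and $m are mine and are not part of the code above; the pattern itself is unchanged.

```php
<?php
// Hypothetical reduced input: only the list element that ends up with two </p> tags.
$item = '<li> <p> fooooo </p> </li>';

// Same closing-tag pattern as in the full example, applied on its own.
$pattern = "@(?<!</p>)(\s*</li>)@i";
$result = preg_replace($pattern, '</p>\1', $item);
echo $result, PHP_EOL;

// Inspect what the capture group actually matched.
preg_match($pattern, $item, $m);
var_export($m[1]);
echo PHP_EOL;
```

Printing the captured group should show whether \s* really consumed the whitespace before </li>, or whether the match started somewhere else.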
Can somebody explain why the whitespace (\s*) seems to be completely ignored in the regex when a negative lookbehind assertion is used?
And even more important: what else can I try to achieve this goal?