I have some HTML that contains (among other things) p tags and figure tags, where each figure contains one img tag.
For the sake of simplicity I'll define an example of what can be found in the HTML here in a PHP variable:

$content = '<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>';

I load $content into a DOMDocument and, in this example, change the src attribute of every img element inside a figure element:

$dom = new DOMDocument();
libxml_use_internal_errors(true);

// this needs to be encoded otherwise special characters get messed up.
$domPart = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$dom->loadHTML($domPart, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$domFigures = $dom->getElementsByTagName('figure');

foreach ($domFigures as $domFigure) {

    $img = $domFigure->getElementsByTagName('img')[0];
    if ($img) {
        $img->setAttribute('src', "https://placekitten.com/g/400/500");
    }

}

$result = $dom->saveHTML();

The result is:

<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/400/500">
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
</figure>

Somehow my p-element has moved into my figure-element. Why does this happen and what can I do to prevent it?

Dirk J. Faber

2 Answers

The rearrangement is caused by the LIBXML_HTML_NOIMPLIED option you're using. It does not appear to be reliable for a fragment like yours, which has more than one top-level element.

See these related questions: "loadHTML LIBXML_HTML_NOIMPLIED on an html fragment generates incorrect tags" and "How to saveHTML of DOMDocument without HTML wrapper?"

Note: as of PHP 5.4 and libxml 2.6, loadHTML() accepts an $options parameter that tells libxml how to parse the content.
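
One way to avoid LIBXML_HTML_NOIMPLIED altogether, in the spirit of the second linked question, is to let libxml add its implied html and body elements and then serialize only the children of body. The following is a minimal sketch, not a definitive implementation: the function name fragmentViaBody is made up for illustration, and the encoding step from the question is left out, so it assumes ASCII-only input.

```php
<?php
// Hypothetical helper (not from the question): parse a fragment WITHOUT
// LIBXML_HTML_NOIMPLIED and return only what ends up inside <body>.
function fragmentViaBody(string $fragment): string
{
    $dom = new DOMDocument();
    // Collect parser warnings (e.g. for HTML5 tags such as <figure>) silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($fragment, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> so the implied wrappers are dropped.
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

// Assumes ASCII-only input; encoding handling is intentionally omitted.
$content = '<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>';

echo fragmentViaBody($content);
```

Because the parser builds its own html and body elements here, the figure and the p should stay siblings inside body instead of one being forced to act as the document root.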

Mohamed Sa'ed

A DOMDocument has to have a single root element, so when the fragment is parsed with LIBXML_HTML_NOIMPLIED, the first top-level element becomes the root and all following siblings are moved inside it.

The easiest fix is to wrap your content in a container tag, e.g.:

$content = '<div><figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p></div>';
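
If the extra div should not appear in the final output, the wrapper can be added just before parsing and stripped again when serializing. This is a rough sketch under stated assumptions: the function name replaceFigureImageSrc is hypothetical, the wrapper is added inside the function rather than in $content, and the encoding conversion from the question is omitted (ASCII-only input assumed).

```php
<?php
// Hypothetical helper (not from the question): wrap the fragment in a
// temporary <div>, update each figure's first img, return the inner HTML.
function replaceFigureImageSrc(string $content, string $newSrc): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // The temporary <div> gives the parser the single root it needs.
    $dom->loadHTML(
        '<div>' . $content . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('figure') as $figure) {
        $img = $figure->getElementsByTagName('img')->item(0);
        if ($img instanceof DOMElement) {
            $img->setAttribute('src', $newSrc);
        }
    }

    $wrapper = $dom->documentElement; // the temporary <div>
    if ($wrapper === null) {
        return '';
    }

    // Serialize only the wrapper's children, leaving the <div> itself out.
    $html = '';
    foreach ($wrapper->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$content = '<figure class="image image-style-align-left">
<img src="https://placekitten.com/g/200/300"></figure>
<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>';

$result = replaceFigureImageSrc($content, 'https://placekitten.com/g/400/500');
echo $result;
```

Passing each child node to saveHTML() serializes just that node, so the container never reaches the result.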
Headbank