
I have a string containing various HTML tags, including some <img> elements. I am trying to wrap those <img> elements in a <figure> tag. This works so far with a preg_replace like this:

preg_replace( '/(<img.*?>)/s','<figure>$1</figure>',$content); 

However, if the <img> tag has a neighboring <figcaption> tag, the result is rather ugly and produces a stray end tag for the figure element:

<figure id="attachment_9615">
<img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />
<figcaption class="caption-text"></figure>Caption title here</figcaption>
</figure> 

I've tried a whole bunch of preg_replace regex variations to wrap both the img-tag and figcaption-tag inside figure, but can't seem to make it work.

My latest try:

preg_replace( '/(<img.*?>)(<figcaption .*>*.<\/figcaption>)?/s',
'<figure">$1$2</figure>',
$content); 
gen_Eric
Thomas L.G
  • [why, oh why with regex, it never stops...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – trincot May 25 '16 at 15:33
  • May I suggest *not* using a RegEx for this task? Have you considered a DOM parser? – gen_Eric May 25 '16 at 15:33
  • @RocketHazmat Well, sure. If you know another way to do this in Wordpress, with the purpose of cleaning up the RSS feed output for FB Instant Articles. I could probably remove some Wordpress content filters, and redo all of them, but wouldn't a regex be ...easier? – Thomas L.G May 25 '16 at 15:39
  • http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – AbraCadaver May 25 '16 at 15:41
  • @AbraCadaver Thanks, I'll do some reading! – Thomas L.G May 25 '16 at 15:48
  • I don't see how you're getting that result. `<img.*?>` shouldn't match the `figcaption`. – Barmar May 25 '16 at 15:51
  • @ThomasL.G It looks like you messed with `.*>*.` in the figcaption, how about [something like this](https://regex101.com/r/iM4oX6/3) and replace by `<figure>$0</figure>`. Please show an input sample that fails and the expected output. – bobble bubble May 25 '16 at 16:48

1 Answer


As others pointed out, it is better to use a parser such as DOMDocument instead. The following code wraps a <figure> tag around each <img> whose next sibling is a <figcaption>:

<?php

$html = <<<EOF
<html>
    <img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />
    <figcaption class="caption-text">Caption title here</figcaption>

    <img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />

    <img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />
    <figcaption class="caption-text">Caption title here</figcaption>
</html>
EOF;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

# get all images
$imgs = $xpath->query("//img");

foreach ($imgs as $img) {
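    # nextSibling may be null or a text node, so check the node type first
    if ($img->nextSibling instanceof DOMElement && $img->nextSibling->tagName === 'figcaption') {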

        # create a new figure tag and append the cloned elements
        $figure = $dom->createElement('figure');
        $figure->appendChild($img->cloneNode(true));
        $figure->appendChild($img->nextSibling->cloneNode(true));

        # insert the newly generated elements right before $img
        $img->parentNode->insertBefore($figure, $img);

        # and remove both the figcaption and the image from the DOM
        $img->nextSibling->parentNode->removeChild($img->nextSibling);
        $img->parentNode->removeChild($img);

    }
}
$dom->formatOutput=true;
echo $dom->saveHTML();

See a demo on ideone.com.
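One caveat with the sibling check above: it only matches when the <figcaption> is the immediate next node. If the parser keeps a whitespace text node between the two tags, or the <img> is the last child, the check will not match. A minimal sketch of a helper that skips such whitespace follows; the name next_element_sibling is hypothetical and not part of the original code or of PHP's DOM API.

```php
<?php
// Hypothetical helper: return the next sibling that is an element,
// skipping whitespace-only text nodes and comments.
function next_element_sibling(DOMNode $node): ?DOMElement
{
    for ($n = $node->nextSibling; $n !== null; $n = $n->nextSibling) {
        if ($n instanceof DOMElement) {
            return $n;
        }
        // Real text between the two tags means the caption does not
        // directly follow the image, so stop looking.
        if ($n instanceof DOMText && trim($n->data) !== '') {
            return null;
        }
    }
    return null; // no further siblings
}
```

With this helper, the condition could be written as `($cap = next_element_sibling($img)) && $cap->tagName === 'figcaption'`, and `$cap` would then be used in place of `$img->nextSibling` in the body of the loop.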

To have a <figure> tag around all your images, you might want to add an else branch:

} else {
    $figure = $dom->createElement('figure');
    $figure->appendChild($img->cloneNode(true));
    $img->parentNode->insertBefore($figure, $img);

    $img->parentNode->removeChild($img);
}
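Since the question is about filtering post content rather than a whole page, two practical details may matter. loadHTML() typically reports HTML5 tags such as <figure> and <figcaption> as parser errors, and saveHTML() without an argument serializes a full document, including a doctype and html/body wrappers. The sketch below shows one way to handle both. The function names load_fragment and save_fragment are hypothetical, and the leading XML declaration is a commonly used workaround that assumes the input is UTF-8.

```php
<?php
// Hypothetical helpers (illustrative names, not a PHP or WordPress API).

// Parse an HTML fragment while collecting libxml errors internally
// instead of emitting PHP warnings.
function load_fragment(string $html): DOMDocument
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Assumption: the input is UTF-8. The declaration hints the encoding
    // to libxml, which otherwise assumes ISO-8859-1.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// Serialize only the children of <body>, leaving out the doctype and
// the html/body wrappers that the parser added.
function save_fragment(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

The wrapping loop from above would then run between the two calls: `$dom = load_fragment($content);`, the `foreach` over the images, and finally `$content = save_fragment($dom);`.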
Jan