Strip out first IMG elements in an HTML block

Question

I have a PHP app that grabs HTML from third-party sources. The HTML may come with one or more IMG elements in it. I want to grab the first IMG instance in its entirety, but I'm not sure how to go about that.

Can anyone push me in the right direction?

Thanks.

Check out http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php — jeroen, Nov 01 '14 at 00:15

score 1 · Accepted Answer · edited May 23 '17 at 10:33

You could load the HTML into a DOMDocument and use XPath to pull out the data you want. It's a little more involved than string position checking, but has the advantage of being more robust should you decide that you want something more specific (the src and alt of the first img tag, for example).

First you load the HTML string into a DOMDocument, which is then passed to a DOMXPath object.

// Load html in to DOMDocument, set up XPath
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
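Third-party markup is often not well-formed, and loadHTML() reports each problem it finds as a PHP warning. Here is a minimal sketch, assuming you would rather collect those errors quietly than have them emitted. It uses libxml_use_internal_errors(); the helper name loadThirdPartyHtml and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper: parse possibly malformed HTML without emitting warnings.
function loadThirdPartyHtml($html)
{
    // Collect libxml errors internally instead of raising PHP warnings
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Made-up sample: unclosed <p> and a stray </div>
$doc = loadThirdPartyHtml('<p>Intro <img src="a.png" alt="A"></div>');
$xpath = new DOMXPath($doc);

// count() in XPath returns a number (a float in PHP)
$count = $xpath->evaluate('count(//img)');
```

If you want to log the problems rather than discard them, libxml_get_errors() can be called before libxml_clear_errors().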

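We want the first img that occurs on the page, so use the selector /descendant::img[1]. N.B.: this is not the same as //img[1], though the two often give similar results. //img[1] matches every img that is the first img child of its parent, so it can return several nodes, whereas /descendant::img[1] returns only the first img in document order. There's a good explanation of the difference at https://stackoverflow.com/a/453902/2287.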

$matches = $xpath->evaluate("/descendant::img[1]");
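To make the difference between the two selectors concrete, here is a small sketch on made-up markup where two separate parents each contain an img:

```php
<?php
// Made-up sample: two <p> elements, each with its own first <img> child
$html = '<html><body>'
    . '<p><img src="one.png" alt="One" /></p>'
    . '<p><img src="two.png" alt="Two" /></p>'
    . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// First img in document order: at most one node
$first = $xpath->evaluate('/descendant::img[1]');

// Every img that is the first img child of its parent: possibly many nodes
$firstPerParent = $xpath->evaluate('//img[1]');

echo $first->length, "\n";          // nodes matched by /descendant::img[1]
echo $firstPerParent->length, "\n"; // nodes matched by //img[1]
```

With this input the first selector matches one node and the second matches two, one per paragraph.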

A downside of using XPath is that it doesn't make it easy to say "give me back the full string that was matched for that img tag", so we can put together a simple function that'll iterate over the matched node's attributes and re-build an img tag.

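// Take the single matched node from the result list
$node = $matches->item(0);
$tag = "<img ";
$vals = array();
foreach ($node->attributes as $attr) {
    // Escape the value so quotes and ampersands survive the rebuild
    $vals[] = $attr->name . '="' . htmlspecialchars($attr->value) . '"';
}
$tag .= implode(" ", $vals) . " />";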
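As an alternative to rebuilding the tag by hand, DOMDocument::saveHTML() accepts a node argument (PHP 5.3.6 and later) and serializes just that node. A minimal sketch on made-up input follows. Note that the result is libxml's rendering of the element, so it may differ from the original source in quoting, whitespace, or the trailing slash.

```php
<?php
// Made-up sample input
$html = '<html><body><img src="/images/my-image.png" alt="My image" />'
    . 'Some text <img src="second.jpg" alt="Second" /></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$matches = $xpath->evaluate('/descendant::img[1]');
$node = $matches->item(0); // null when the document contains no img

// Serialize only this node rather than the whole document
$tag = ($node !== null) ? $doc->saveHTML($node) : '';

echo $tag, "\n";
```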

Putting it all together we get something like:

```php
<?php
// Example html
$html = '<html><body>'
    . ' <img src="/images/my-image.png" alt="My image" width="100" height="100" />'
    . 'Some text here <img src="do-not-want-second.jpg" alt="No thanks" />';

// Load html in to DOMDocument, set up XPath
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Get the first img in the doc
// N.B. Not the same as "//img[1]" - see https://stackoverflow.com/a/453902/2287
$matches = $xpath->evaluate("/descendant::img[1]");
foreach ($matches as $match) {
    echo buildImgTag($match);
}

/**
 * Build an img tag given its matched node
 *
 * @param DOMElement $node Img node
 *
 * @return string Rebuilt img tag
 */
function buildImgTag($node) {
    $tag = "<img ";
    $vals = array();
    foreach ($node->attributes as $attr) {
        // Escape the value so quotes and ampersands survive the rebuild
        $vals[] = $attr->name . '="' . htmlspecialchars($attr->value) . '"';
    }
    $tag .= implode(" ", $vals) . " />";

    return $tag;
}

```

So overall it's a slightly more complex approach than doing a strpos or regex on the html, but should provide you with more flexibility should you decide to do anything with the img tag, like pulling out a specific attribute.
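For instance, pulling out just the src and alt of the first img could look like the sketch below. It reuses the example HTML from above (with the closing tags added) and DOMElement::getAttribute().

```php
<?php
$html = '<html><body>'
    . '<img src="/images/my-image.png" alt="My image" width="100" height="100" />'
    . 'Some text here <img src="do-not-want-second.jpg" alt="No thanks" />'
    . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// item(0) is null when the document contains no img
$img = $xpath->evaluate('/descendant::img[1]')->item(0);

$src = '';
$alt = '';
if ($img !== null) {
    // getAttribute() returns an empty string when the attribute is absent
    $src = $img->getAttribute('src');
    $alt = $img->getAttribute('alt');
}

echo $src, ' | ', $alt, "\n";
```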

BentCoder · Answer 2 · 2014-11-01T00:50:16.470

The example below works only if the HTML is valid, and with third-party HTML we cannot assume that. If you are 100% sure the HTML is valid, go ahead and use it; if not, I would suggest the BETTER WAY shown further down.

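$html = '<br />First<img src="path/abc.jpg" />Next<img src="path/cde.jpg" />';

// stripos() returns false when nothing matches, so check before slicing
$start = stripos($html, '<img');
if ($start !== false) {
    $extracted = substr($html, $start);
    $end = stripos($extracted, '>');
    if ($end !== false) {
        echo substr($html, $start, $end + 1);
    }
}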

This code will give you: <img src="path/abc.jpg" />

1. Find the first occurrence of <img with the case-insensitive function stripos.
2. Chop the data so that it starts at that first occurrence.
3. Find the first occurrence of > in the chopped data, again with stripos.
4. Extract what falls between the start and end points with substr.
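To see why the string approach is fragile, here is a sketch on made-up input where an attribute value contains a literal ">". The same four steps cut the tag short, while a DOM parser (PHP's built-in DOMDocument here, used purely for comparison) reads the attributes intact.

```php
<?php
// Made-up input: the alt text contains a literal ">"
$html = '<p>Intro</p><img alt="a > b" src="path/abc.jpg" />';

$start = stripos($html, '<img');
$extracted = substr($html, $start);
$end = stripos($extracted, '>');

// The first ">" found is the one inside alt="...", so the tag is truncated:
// $naive holds '<img alt="a >'
$naive = substr($html, $start, $end + 1);

// For comparison, a DOM parser keeps the attribute values whole
$doc = new DOMDocument();
$doc->loadHTML($html);
$img = $doc->getElementsByTagName('img')->item(0);
$alt = $img->getAttribute('alt');
$src = $img->getAttribute('src');
```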

BETTER WAY:

PHP Simple HTML DOM Parser Manual

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

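// Find all images and print each src.
// To get only the first image, find('img', 0) returns a single element,
// and its outertext property holds the full tag.
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}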

An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way!
Requires PHP 5+.
Supports invalid HTML.
Find tags on an HTML page with selectors, just like jQuery.
Extract contents from HTML in a single line.

score -3 · Answer 3 · answered Nov 01 '14 at 00:09

jQuery could do this for ya.

$('img')[0]

If it's in a smaller subsection of HTML within your page, then adjust the selector accordingly.

Scott

That's not very useful for a PHP app. – jeroen Nov 01 '14 at 00:17