php - need a regular expression to find and append website url to img src

Question

I need to search an HTML message for image tags and add the website URL in front of every image src found, using a regular expression.

E.g. if the image src in the HTML message is

/images/my_image.jpg

I need to add the URL in front so that it looks like this:

http://mywebsite.com/page/images/my_image.jpg

We get a lot of regular expression questions here on Stack Overflow, and many of them do not show that any effort has been expended on the problem. Thus, can you give this one a go? The PHP manual on regular expressions is very thorough. — halfer, Mar 21 '15 at 12:11
@halfer: note that several features are not described at all in the PHP manual and can only be found in the pcre documentation: http://www.pcre.org — Casimir et Hippolyte, Mar 21 '15 at 12:38
@Casimir, thanks - I've not seen that resource before. Useful to know! — halfer, Mar 21 '15 at 14:25

score 2 · Answer 1 · edited May 23 '17 at 11:43

You probably should use an HTML parsing solution instead of regex, to avoid surprises with badly formatted code. Something like this:

// Some example source
$source = <<<EOS
<html><body>
    Images that will have host appended:
    <img src="foo.png" />
    and
    <img src="images/en/87a%20-zzQ.png" />

    Image that will be left as is:
    <img src="https://www.gravatar.com/avatar/1b1f8ad9a64564a9096056e33a4805bf?s=32&amp;d=identicon&amp;r=PG" />
</body></html>
EOS;

// Create a DOM document and read the HTML into it
$dom = new DOMDocument();
$dom->loadHTML($source);

// Use an XPath query to find all 'img' tags 
$xPath = new DOMXPath($dom);
$images = $xPath->query('//img');

// Loop through the tags
foreach ($images as $image) {
    // Grab the 'src' attribute
    $src = $image->getAttribute('src');

    // If the attribute does not already contain a scheme (e.g. http(s)),
    // put the scheme and host in front of it; ltrim() avoids a double slash
    // for root-relative paths such as /images/my_image.jpg
    if ($src && (!parse_url($src, PHP_URL_SCHEME))) {
        $image->setAttribute('src', "http://mywebsite.com/page/" . ltrim($src, '/'));
    }
}

// Write output
$dom->formatOutput = true;
echo $dom->saveHTML();

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
        Images that will have host appended:
        <img src="http://mywebsite.com/page/foo.png">
        and
        <img src="http://mywebsite.com/page/images/en/87a%20-zzQ.png">

        Image that will be left as is:
        <img src="https://www.gravatar.com/avatar/1b1f8ad9a64564a9096056e33a4805bf?s=32&amp;d=identicon&amp;r=PG">
</body></html>
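
If the message can contain root-relative paths such as /images/my_image.jpg (the form used in the question), protocol-relative URLs, or sloppy markup, a few extra guards probably help. Below is a minimal sketch of the same DOM approach wrapped in a function. The name `prefixImageSources` and the `$baseUrl` parameter are my own inventions for illustration, not something from the answer.

```php
<?php
// Hypothetical helper (name and parameters are assumptions): puts $baseUrl
// in front of relative <img src> values and leaves absolute ones untouched.
function prefixImageSources(string $html, string $baseUrl): string
{
    // Collect libxml parse warnings instead of emitting them for sloppy HTML
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');

        // Skip empty values, URLs that already carry a scheme (https://...)
        // and protocol-relative URLs (//cdn.example.com/a.png)
        if ($src === '' || parse_url($src, PHP_URL_SCHEME) || strncmp($src, '//', 2) === 0) {
            continue;
        }

        // Join base and path with exactly one slash
        $image->setAttribute('src', rtrim($baseUrl, '/') . '/' . ltrim($src, '/'));
    }

    return $dom->saveHTML();
}

echo prefixImageSources('<img src="/images/my_image.jpg">', 'http://mywebsite.com/page');
```

As in the output above, saveHTML() wraps a fragment in a doctype plus html and body tags, so the result is a full document rather than the bare fragment that went in.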

Using DOM parser is the way to go. – Toto Mar 22 '15 at 11:12

score 1 · Answer 2 · answered Mar 21 '15 at 10:56

You can use the following pattern:

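<?php

    // Single-quoted so the backslashes reach the regex engine unchanged
    // (inside double quotes PHP would read "\1" as an octal escape)
    $pattern = '/\/images\/[\w\d_]+\.jpg/i';
    $string = "bla bla bla /images/my_image.jpg," .
       "bla bla lba /images/mfsafas.jpg bla bla bla /images/my_fsa.jpg";

    preg_match_all($pattern, $string, $matches);

    // Put the site URL in front of every match
    $urls = [];
    foreach ($matches[0] as $match) {
       $urls[] = "http://mywebsite.com/page" . $match;
    }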
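
The snippet above collects the prefixed URLs in `$urls` but leaves the message itself unchanged. If the goal is to rewrite the HTML in place, one option is preg_replace_callback. This is only a rough sketch, assuming quoted src attributes and root-relative paths. The function name `prefixImgSrcWithRegex` and its parameters are hypothetical, and a regex like this stays more brittle than a DOM parser.

```php
<?php
// Hypothetical helper (name and parameters are assumptions): puts $baseUrl
// in front of root-relative src values inside <img> tags.
function prefixImgSrcWithRegex(string $html, string $baseUrl): string
{
    // Group 1 keeps everything from "<img" up to the opening quote of src.
    // It must be followed by a single slash; (?!/) rejects "//host/..." URLs.
    $pattern = '~(<img\b[^>]*?\ssrc\s*=\s*["\'])/(?!/)~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($baseUrl): string {
            // Re-emit the tag prefix, then the base URL with one slash
            return $m[1] . rtrim($baseUrl, '/') . '/';
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to input
    return $result ?? $html;
}

echo prefixImgSrcWithRegex('<img src="/images/my_image.jpg">', 'http://mywebsite.com/page');
```

A callback is used rather than a plain replacement string so that characters such as `$` or a backslash in the base URL are not interpreted as backreferences.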

answered Mar 21 '15 at 10:56

Alexandru Olaru


The `\w` character class already contains `\d` and `_`. – Casimir et Hippolyte Mar 21 '15 at 12:45
Why do you limit filename to word characters? Isn't `a-b.jpg` valid? – Toto Mar 22 '15 at 11:11
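
Both comments are easy to check. A small sketch follows, with a hypothetical helper `classResults` that tests one file name against the class from the answer, against `\w` alone, and against `[\w-]`, which is one possible way to also allow hyphens.

```php
<?php
// Hypothetical helper: tests one file name against three character classes.
function classResults(string $name): array
{
    return [
        'original' => preg_match('/^[\w\d_]+\.jpg$/', $name), // class used in the answer
        'w_only'   => preg_match('/^\w+\.jpg$/', $name),      // \w alone
        'hyphen'   => preg_match('/^[\w-]+\.jpg$/', $name),   // \w plus "-"
    ];
}

foreach (['my_image.jpg', 'img2.jpg', 'a-b.jpg'] as $name) {
    echo $name, ' ', json_encode(classResults($name)), "\n";
}
```

The first two names match all three patterns, which shows that `\d` and `_` add nothing to `\w`. The name a-b.jpg matches only the last one.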
