-2

I need to search an HTML message for image tags and prepend the website URL to the src of every image tag found, using a regular expression.

E.g., if the image src in the HTML message is

/images/my_image.jpg

I need to prepend the URL so that it looks like this:

http://mywebsite.com/page/images/my_image.jpg
Ous
  • Do you only need .jpg, or also .jpeg, .png, .gif and so on? – jvitasek Mar 21 '15 at 11:07
  • We get a lot of regular expression questions here on Stack Overflow, and many of them do not show that any effort has been expended on the problem. Thus, can you give this one a go? The PHP manual on regular expressions is very thorough. – halfer Mar 21 '15 at 12:11
  • @halfer: note that several features are not described at all in the PHP manual and can only be found in the pcre documentation: http://www.pcre.org – Casimir et Hippolyte Mar 21 '15 at 12:38
  • @Casimir, thanks - I've not seen that resource before. Useful to know! – halfer Mar 21 '15 at 14:25

2 Answers

2

You probably should use an HTML parsing solution instead of regex, to avoid surprises with badly formatted code. Something like this:

// Some example source
$source = <<<EOS
<html><body>
    Images that will have host appended:
    <img src="foo.png" />
    and
    <img src="images/en/87a%20-zzQ.png" />

    Image that will be left as is:
    <img src="https://www.gravatar.com/avatar/1b1f8ad9a64564a9096056e33a4805bf?s=32&amp;d=identicon&amp;r=PG" />
</body></html>
EOS;

// Create a DOM document and read the HTML into it
$dom = new DOMDocument();
$dom->loadHTML($source);

// Use an XPath query to find all 'img' tags 
$xPath = new DOMXPath($dom);
$images = $xPath->query('//img');

// Loop through the tags
foreach ($images as $image) {
    // Grab the 'src' attribute
    $src = $image->getAttribute('src');

    // If the attribute does not already contain a scheme (e.g. http(s)),
    // prepend the URL with scheme and host; ltrim() avoids a double slash
    // when the path starts with "/" (as in the question's example)
    if ($src && (!parse_url($src, PHP_URL_SCHEME))) {
        $image->setAttribute('src', "http://mywebsite.com/page/" . ltrim($src, '/'));
    }
}

// Write output
$dom->formatOutput = true;
echo $dom->saveHTML();

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
        Images that will have host appended:
        <img src="http://mywebsite.com/page/foo.png">
        and
        <img src="http://mywebsite.com/page/images/en/87a%20-zzQ.png">

        Image that will be left as is:
        <img src="https://www.gravatar.com/avatar/1b1f8ad9a64564a9096056e33a4805bf?s=32&amp;d=identicon&amp;r=PG">
</body></html>
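
If the message HTML is not well-formed, loadHTML() may emit warnings, and protocol-relative sources such as //cdn.example.com/a.png have no scheme, so they would be prefixed as well. A minimal sketch of the same idea wrapped in a function follows. The name absolutizeImageSources and its parameters are hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper (an assumption, not from the answer): prefixes
// relative <img> sources with $baseUrl and returns the rewritten HTML.
function absolutizeImageSources(string $html, string $baseUrl): string
{
    // Collect parser warnings internally instead of emitting them,
    // since e-mail HTML is often not well-formed
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');

        // Skip empty values, URLs with a scheme, and protocol-relative URLs
        if ($src === '' || parse_url($src, PHP_URL_SCHEME) || strpos($src, '//') === 0) {
            continue;
        }

        // Join base and path with exactly one slash
        $image->setAttribute('src', rtrim($baseUrl, '/') . '/' . ltrim($src, '/'));
    }

    return $dom->saveHTML();
}

$html = '<p><img src="/images/my_image.jpg"><img src="https://example.com/a.png"></p>';
echo absolutizeImageSources($html, 'http://mywebsite.com/page');
```

Trimming the slash on both sides means the base URL can be passed with or without a trailing slash.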
mhall
1

You can use the following pattern:

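<?php

    // Match site-relative image paths such as /images/my_image.jpg.
    // Single quotes keep PHP from interpreting backslash sequences
    // (in double quotes, "\1" is an octal escape, not a backreference).
    $pattern = '~/images/\w+\.jpg~i';
    $string = "bla bla bla /images/my_image.jpg," . 
       "bla bla lba /images/mfsafas.jpg bla bla bla /images/my_fsa.jpg";

    preg_match_all($pattern, $string, $matches);

    // Prepend the site URL to every match
    $urls = [];
    foreach ($matches[0] as $match) {
       $urls[] = "http://mywebsite.com/page" . $match;
    }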
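
The snippet above only collects the rewritten URLs. To rewrite the src attributes inside the HTML itself, something like preg_replace_callback() could be used. This is a rough sketch, assuming every src value is quoted and that only root-relative paths (starting with a single "/") should be prefixed. The function name prefixImageSourcesWithRegex is hypothetical, and a regex like this can still be fooled by unusual markup.

```php
<?php
// Hypothetical helper (an assumption): rewrites root-relative src values
// inside <img> tags and leaves every other attribute untouched.
function prefixImageSourcesWithRegex(string $html, string $baseUrl): string
{
    // Group 1: the tag text up to and including "src=",
    // group 2: the opening quote,
    // group 3: a path starting with exactly one "/";
    // \2 requires the matching closing quote.
    $pattern = '~(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(/(?!/)[^"\']*)\2~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($baseUrl): string {
            return $m[1] . $m[2] . rtrim($baseUrl, '/') . $m[3] . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input
    return $result ?? $html;
}

$html = '<img src="/images/my_image.jpg" alt="x"> <img src="https://example.com/a.png">';
echo prefixImageSourcesWithRegex($html, 'http://mywebsite.com/page');
```

Absolute and protocol-relative URLs are skipped because the captured path must begin with one slash that is not followed by another.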
Alexandru Olaru