3

I need to look inside a string of HTML and change every <img> tag whose src attribute is a relative address so that it uses an absolute URL instead. So this:

<img src="puppies.jpg">

needs to become:

<img src="http://sitename.com/path/puppies.jpg">

while ignoring <img> tags whose src attribute is already absolute.

I'm using PHP and assume that I'll need to run this through preg_replace(). Help! And Thanks!

Sam

2 Answers

8

This is not a job for a regular expression. It's a job for an XML/DOM parser.

I'd give DOMDocument a shot.

$DOM = new DOMDocument;
$DOM->loadHTML($html);

$imgs = $DOM->getElementsByTagName('img');
foreach ($imgs as $img) {
    $src = $img->getAttribute('src');
    // Skip any src that is already absolute: it has a scheme (http:, https:, ...)
    // or is protocol-relative (starts with //).
    if (!parse_url($src, PHP_URL_SCHEME) && strpos($src, '//') !== 0) {
        $img->setAttribute('src', "http://sitename.com/path/$src");
    }
}

$html = $DOM->saveHTML();
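For anyone who wants only the original fragment back, without the DOCTYPE, <html>, and <body> wrappers that saveHTML() adds, here is a minimal sketch of the same DOMDocument approach. The function name absolutize_img_src, the wrapper <div> trick, and the example.org URL are illustrative assumptions, not part of the answer above. The LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags are available since PHP 5.4. The sketch also treats root-relative paths such as /a.jpg as relative to the base, which may not be what you want.

```php
<?php
// Hypothetical helper: name and signature are illustrative only.
function absolutize_img_src(string $html, string $base): string
{
    $dom = new DOMDocument();
    // Collect parser warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one root <div>; the flags suppress the implied
    // <html>/<body> elements and the DOCTYPE.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Leave empty values, anything with a scheme, and protocol-relative URLs alone.
        if ($src === '' || parse_url($src, PHP_URL_SCHEME) || strpos($src, '//') === 0) {
            continue;
        }
        $img->setAttribute('src', rtrim($base, '/') . '/' . ltrim($src, '/'));
    }

    // Serialize only the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$fragment = '<p>Hi</p><img src="puppies.jpg"><img src="http://example.org/kittens.jpg">';
echo absolutize_img_src($fragment, 'http://sitename.com/path/'), "\n";
```

Note that loadHTML() assumes ISO-8859-1 unless the markup declares a charset, so non-ASCII input may need extra handling that this sketch leaves out.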
gen_Eric
  • I upvoted, but it also needs a check for `src` attributes which are already absolute, per the OP. – Evan Davis Apr 30 '12 at 19:12
  • @Mathletics: Ah yes, didn't notice that, I can add that :-P – gen_Eric Apr 30 '12 at 19:12
  • @Jack: Good idea, changed :-P – gen_Eric Apr 30 '12 at 19:16
  • Yay! That does it! Question: the HTML that gets returned automatically gets `<!DOCTYPE>`, `<html>`, `<body>`, etc. tags. Is there any way to turn that off? All I want is what I gave it to start with, just with the find-and-replace part done. Does that make sense? – Sam May 01 '12 at 01:56
0

"This is not a job for a regular expression. It's a job for an XML/DOM parser."

Nope, it's not. If you just want to add a prefix to each src attribute, it's best to use simple string functions and not even think about XML, regex, or DOM parsing…

$str = str_replace('<img src="', '<img src="http://prefix', $str);

You can clean up the wrong links (the ones that were already absolute) afterwards:

$str = str_replace('<img src="http://prefixhttp://', '<img src="http://', $str);

Do not blow up your code with regex/DOM if you can avoid it.
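The two-step replacement above can be sketched as a single function. The name prefix_img_src and the example.org URL are made up for illustration, and cleaning up https:// links as well is an addition that the answer does not cover.

```php
<?php
// Hypothetical helper: $prefix plays the role of "http://prefix" in the answer.
function prefix_img_src(string $html, string $prefix): string
{
    // Step 1: prepend the prefix to every src that directly follows "<img ".
    $html = str_replace('<img src="', '<img src="' . $prefix, $html);

    // Step 2: undo the prefix wherever the original src was already absolute.
    // Handling https:// here is an assumption beyond the original answer.
    foreach (['http://', 'https://'] as $scheme) {
        $html = str_replace('<img src="' . $prefix . $scheme, '<img src="' . $scheme, $html);
    }
    return $html;
}

$in = '<img src="puppies.jpg"><img src="http://example.org/kittens.jpg"><img alt="x" src="fish.jpg">';
echo prefix_img_src($in, 'http://sitename.com/path/'), "\n";
```

In this example the first tag gets the prefix and the second is restored to its original URL. The third is left untouched because src is not the first attribute, which is the trade-off of matching on the literal string `<img src="`.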

sbstjn
  • What would happen if my HTML was ``? – gen_Eric Apr 30 '12 at 19:15
  • @Rocket sorry, but he said his HTML is `` – sbstjn May 01 '12 at 00:35
  • Wrong links can easily be fixed: `str_replace('prefixprefix', 'prefix', $str)` or `str_replace('http://prefix/http://', 'http://', $str)`. Don't blow up your code with regex/DOM if you do not have to… – sbstjn May 01 '12 at 00:38
  • @semu, I'm totally with you on this one. For *this* instance (and mine) there's no reason to add all kinds of unnecessary overhead when a simple solution exists... albeit not the most graceful, it definitely gets the job done. – Sam Jul 07 '12 at 01:29