I am using the following code to grab HTML from another page and place it into my PHP page:

$doc = new DOMDocument;

// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHTML(file_get_contents('{URL IS HERE}'));
$content = $doc->getElementById('form2');

echo $doc->saveHTML($content);

I want to change every <a href="/somepath/file.htm"> inside that content so that the actual domain is prepended to the path. How can I do this?

For example, each such link should become <a href="http://mydomain.com/somepath/file.htm"> instead.

Solomon Closson
  • If I were you, I would avoid `DomDocument` and use a regex directly to find the links and edit them. – Raptor Mar 18 '13 at 03:23
  • How come? Everywhere I go on Stack Overflow, they say you should use `DomDocument` for this. Can you give me an example of how to do this with a regex? – Solomon Closson Mar 18 '13 at 03:25
  • You create extra objects for what is just a find-and-replace task, which costs extra parsing time and memory. Try: http://stackoverflow.com/questions/4001328/php-regex-to-get-string-inside-href-tag – Raptor Mar 18 '13 at 03:29
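
For reference, the regex approach suggested in the comments could look roughly like the sketch below. This is not code from the thread: the function name `prependDomainWithRegex` is hypothetical, and the pattern assumes quoted `href` values on `<a>` tags that start with a single `/`. Regexes are generally more fragile on real-world HTML than a DOM parser, so treat it as an illustration rather than a recommendation.

```php
<?php
// Hypothetical helper (not from the thread): prefix root-relative hrefs
// on <a> tags with a domain, using a regex instead of a DOM parser.
function prependDomainWithRegex(string $html, string $domain): string
{
    // Capture everything up to and including the opening quote of href,
    // then require a single leading "/". The (?!/) lookahead skips
    // protocol-relative URLs such as //cdn.example.com/x.
    $pattern = '~(<a\b[^>]*?\bhref\s*=\s*["\'])/(?!/)~i';
    $replacement = '${1}' . rtrim($domain, '/') . '/';

    // preg_replace returns null on failure; fall back to the input then.
    return preg_replace($pattern, $replacement, $html) ?? $html;
}

$html = '<a href="/somepath/file.htm">x</a> <a href="http://other.com/y">y</a>';
echo prependDomainWithRegex($html, 'http://mydomain.com');
```

Absolute links are left alone because the pattern only matches when the first character after the quote is `/`.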

1 Answer

Try something like:

$xml = new DOMDocument();
$xml->loadHTMLFile($url); // $url: address of the page to fetch
foreach ($xml->getElementsByTagName('a') as $link) {
    $oldLink = $link->getAttribute('href');
    // $oldLink already starts with "/", so the domain has no trailing slash
    $link->setAttribute('href', 'http://mydomain.com' . $oldLink);
}
echo $xml->saveHTML();
Sudhir Bastakoti
  • But the `href` is different for each link, so I would just need to prepend the domain to it. Would it just be: `$link->setAttribute('href', "http://mydomain.com" . $link->getAttribute('href'));`? – Solomon Closson Mar 18 '13 at 03:36
  • Ok, great, but I gotta get the `$content` not the whole document. Anyways, I figured it out from your answer. So, you got it. Thanks :) – Solomon Closson Mar 18 '13 at 03:47
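
The last comment does not show how the rewrite was limited to `$content`, so here is one way it might be done; this is a sketch, not the asker's actual solution. It assumes the `form2` id and `http://mydomain.com` from the question, uses an inline `$html` string as a stand-in for the fetched page, and only touches hrefs that start with a single `/`. Note that PHP concatenates strings with `.`, not `+`.

```php
<?php
// Stand-in markup; in the thread this comes from a remote URL instead.
$html = '<html><body><a href="/outside.htm">o</a>'
      . '<form id="form2"><a href="/somepath/file.htm">in</a>'
      . '<a href="http://other.com/x">abs</a></form></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$content = $doc->getElementById('form2');
$base = 'http://mydomain.com';      // assumed domain, as in the question
$result = '';

if ($content !== null) {
    // Search only inside the element, not the whole document.
    foreach ($content->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // Rewrite root-relative paths only: "/x" but not "//host/x".
        if (strncmp($href, '/', 1) === 0 && strncmp($href, '//', 2) !== 0) {
            $link->setAttribute('href', $base . $href);
        }
    }
    // Serialize just the selected element.
    $result = $doc->saveHTML($content);
    echo $result;
}
```

Calling `getElementsByTagName` on the element rather than on the document keeps links outside `form2` untouched, and passing `$content` to `saveHTML` outputs only that fragment.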