4

I'm trying to find all href links on a webpage and replace the link with my own proxy link.

For example

<a href="http://www.google.com">Google</a>

Needs to be

<a href="http://www.example.com/?loadpage=http://www.google.com">Google</a>
Glenn Dayton

3 Answers

9

Use PHP's DOMDocument to parse the page:

$doc = new DOMDocument();

// load the string into the DOM (this is your page's HTML), see below for more info
$doc->loadHTML('<a href="http://www.google.com">Google</a>');

//Loop through each <a> tag in the dom and change the href property
foreach($doc->getElementsByTagName('a') as $anchor) {
    $link = $anchor->getAttribute('href');
    $link = 'http://www.example.com/?loadpage='.urlencode($link);
    $anchor->setAttribute('href', $link);
}
echo $doc->saveHTML();

Check it out here: http://codepad.org/9enqx3Rv

If you don't have the HTML as a string, you can use cURL to fetch it, or you can load it directly with DOMDocument's loadHTMLFile method.

Chris Baker
0

Another option, if you would like to have the links replaced client-side with jQuery, is the following:

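$(document).find('a').each(function (index, element) {
    // each() passes a raw DOM element, so wrap it to get jQuery's .attr()
    var $link = $(element);
    var curValue = $link.attr('href');
    // Skip anchors without an href; encode the URL so its own ? and & survive
    if (curValue) {
        $link.attr('href', 'http://www.example.com?loadpage=' + encodeURIComponent(curValue));
    }
});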

However, a more secure way is to do it in PHP, of course.
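
When the rewriting is done in JavaScript, the original URL should be percent-encoded before it is appended, so that its own ? and & are not read as part of the proxy URL's query string. Below is a minimal sketch of that step as a plain function that runs outside the browser (for example in Node.js). The names PROXY_BASE and buildProxyUrl are hypothetical, and the proxy address is the one from the question.

```javascript
// Hypothetical proxy endpoint, taken from the example in the question.
const PROXY_BASE = 'http://www.example.com/?loadpage=';

// Build the proxied link. encodeURIComponent escapes ':', '/', '?', '&' and '=',
// so the whole target URL travels as a single query-string value.
function buildProxyUrl(href) {
  return PROXY_BASE + encodeURIComponent(href);
}

const target = 'http://www.google.com/search?q=php&hl=en';
const proxied = buildProxyUrl(target);

// Parse the proxied link the way the receiving side would and recover the target.
const recovered = new URL(proxied).searchParams.get('loadpage');

console.log(proxied);
console.log(recovered === target); // true: the target survives the round trip
```

The same function could be called from inside the jQuery loop instead of concatenating the strings inline.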

John In't Hout
-1

The simplest way I can think of to do this:

$loader = "http://www.example.com?loadpage=";
$page_contents = str_ireplace(array('href="', "href='"), array('href="'.$loader, "href='".$loader), $page_contents);

But that might cause problems with URLs that contain ? or &, because they are not encoded. It will also rewrite any occurrence of href=" that appears in the visible text of the document rather than in its markup.
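
Both caveats can be reproduced in a few lines. The sketch below is a JavaScript analogue of the string replacement, not the PHP code itself. The target URL and the sample HTML are made up for illustration, and the regular expression only covers the double-quoted case.

```javascript
const loader = 'http://www.example.com?loadpage=';

// Caveat 1: an unencoded target URL leaks its own parameters into the proxy URL.
const rawProxied = loader + 'http://site.test/search?q=a&page=2';
const params = new URL(rawProxied).searchParams;
const loadpage = params.get('loadpage'); // 'http://site.test/search?q=a' (truncated at the '&')
const leaked = params.get('page');       // '2' (now a parameter of the proxy URL)
console.log(loadpage, leaked);

// Caveat 2: a blind replacement also changes href=" inside visible text.
const html = '<p>Type href="..." to make a link.</p>' +
             '<a href="http://www.google.com">Google</a>';
const rewritten = html.replace(/href="/gi, 'href="' + loader);

// Count how many times the loader prefix was inserted.
const insertions = rewritten.split(loader).length - 1;
console.log(insertions); // 2: once in the paragraph text, once in the real link
```

Parsing the markup, as in the DOMDocument answer, avoids the second problem because only real attributes are touched.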

ben