1

I need to find the links in part of some HTML code and replace each one with an absolute URL: one of two base URLs followed by the link as it appears on the page.

I have found a lot of ideas and tried a lot of different solutions, but luck isn't on my side on this one. Please help me out! Thank you!

This is my code:

<?php
$url = "http://www.oxfordreference.com/views/SEARCH_RESULTS.html?&q=android";
$raw = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));

$start = strpos($content,'<table class="short_results_summary_table">');
$end = strpos($content,'</table>',$start) + 8;
$table = substr($content,$start,$end-$start);

echo "{$table}";

$dom = new DOMDocument();
$dom->loadHTML($table);

$dom->strictErrorChecking = FALSE;

// Get all the links
$links = $dom->getElementsByTagName("a");
foreach($links as $link) {
  $href = $link->getAttribute("href");
  echo "{$href}";

  if (strpos("http://oxfordreference.com", $href) == -1) {
  if (strpos("/views/", $href) == -1) {
     $ref = "http://oxfordreference.com/views/"+$href;
  }
  else 
      $ref = "http://oxfordreference.com"+$href;
    $link->setAttribute("href", $ref);
    echo "{$link->getAttribute("href")}";
  }
}
$table12 = $dom->saveHTML;

preg_match_all("|<tr(.*)</tr>|U",$table12,$rows);

echo "{$rows[0]}";

foreach ($rows[0] as $row){

    if ((strpos($row,'<th')===false)){

        preg_match_all("|<td(.*)</td>|U",$row,$cells);       
        echo "{$cells}";
    }

}
?>

When I run this code, I get an "htmlParseEntityRef: expecting ';'" warning on the line where I load the HTML.

scrappedcola
gUgU
  • Give us some sample HTML, and tell us how you want it to become. Show us your coding efforts! Do you want to do it in PHP or JavaScript? – Shef Jul 26 '11 at 19:16
  • When you say "Luck aint on my side on this one", does that mean you have found x and attempted y and haven't got it working? If so, please show your attempt and we can go from there. – Andreas Wong Jul 26 '11 at 19:17
  • removed the javascript tag as you are doing this server side. – scrappedcola Jul 26 '11 at 19:44
  • The strpos call in your script is wrong. The signature is strpos(haystack, needle[, offset]). – Micromega May 27 '12 at 19:14

3 Answers

5

var links = document.getElementsByTagName("a"); will get you all the links. And this will loop through them:

for (var i = 0; i < links.length; i++) {
    links[i].href = "newURLHERE";
}
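
To turn relative links into absolute ones rather than assigning a fixed string, each href can be resolved against a base URL. The following is only a minimal sketch: `BASE` and `toAbsolute` are hypothetical names, and the base URL is an assumption taken from the site in the question. It relies on the standard `URL` constructor, which is available in browsers and in Node.

```javascript
// Assumed base URL (taken from the question); adjust to the real site.
const BASE = "http://www.oxfordreference.com/views/";

// Resolve a possibly relative href against a base URL.
// Already-absolute hrefs come back unchanged.
function toAbsolute(href, base = BASE) {
  try {
    return new URL(href, base).href;
  } catch (e) {
    return href; // leave values the URL parser rejects untouched
  }
}

// In a browser, the loop above could then become:
// var links = document.getElementsByTagName("a");
// for (var i = 0; i < links.length; i++) {
//   links[i].href = toAbsolute(links[i].getAttribute("href"));
// }
```

With this approach a page-relative link such as `ENTRY.html` ends up under `/views/`, while a root-relative link such as `/views/ENTRY.html` is resolved against the host only, so the two cases do not need separate branches.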
scrappedcola
2

You should use jQuery; it is excellent for link replacement. Rather than explaining it here, please look at this answer:

How to change the href for a hyperlink using jQuery

Boz
2

I recommend scrappedcola's answer, but if you don't want to do it on the client side, you can use a regex to replace the links:

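ob_start();
// your HTML

// end of the page
$body = ob_get_clean();
// Keep everything up to href= (group 1) and swap only the quoted URL.
// preg_replace() returns the new string; it does not modify $body in place.
$body = preg_replace('/(<a\b[^>]*\bhref=)"[^"]*"/i', '$1"NewURL"', $body);
echo $body;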

You can use backreferences ($1) or the callback version (preg_replace_callback) to modify the output as you like.
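
The callback idea can be sketched as follows. This is an illustration in JavaScript (for example, run under Node), where `String.prototype.replace` with a function plays the role that `preg_replace_callback` plays in PHP. `BASE` and `rewriteLinks` are hypothetical names, the base URL is an assumption taken from the question, and the pattern only handles double-quoted href values, so treat it as a rough sketch and not as a robust HTML parser.

```javascript
// Assumed base URL (taken from the question); adjust to the real site.
const BASE = "http://www.oxfordreference.com/views/";

// Rewrite every double-quoted href inside an <a> tag,
// resolving it against the given base URL.
function rewriteLinks(html, base = BASE) {
  const pattern = /(<a\b[^>]*?\bhref=)"([^"]*)"/gi;
  return html.replace(pattern, (match, prefix, href) => {
    try {
      // prefix is everything up to and including href=
      return `${prefix}"${new URL(href, base).href}"`;
    } catch (e) {
      return match; // keep the original tag if the URL cannot be parsed
    }
  });
}

const sample = '<a class="x" href="ENTRY.html?a=1">entry</a>';
console.log(rewriteLinks(sample));
```

Because the callback receives each matched href, it can decide per link which base to apply, which a plain replacement string cannot do.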

Cem Kalyoncu