I want to add rel="nofollow" to every link on my website that points to another website.

For example,

$str = "<a href='www.linktoothersite.com'>I swear this isn't spam!</a><br><a href='www.mywebsite.com'>Hello World</a>";

The output should be

$str = "<a href='www.linktoothersite.com' rel='nofollow'>I swear this isn't spam!</a><br><a href='www.mywebsite.com'>Hello World</a>";

I would really prefer a regular expression over DOMDocument, because whenever I use DOMDocument I get the error "Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity".

  • possible duplicate of [How do I programmatically add rel="external" to external links in a string of HTML?](http://stackoverflow.com/questions/5608874/how-do-i-programmatically-add-rel-external-to-external-links-in-a-string-of-htm) – mario Jun 24 '11 at 20:07
  • don't parse html with regex. use DOMDocument instead. – dqhendricks Jun 24 '11 at 20:16

1 Answer

Use a DOM parser and loop over all the links, checking their href attribute for other sites. This is untested and might require some tweaking.

// assuming your html is in $HTMLstring
$dom = new DOMDocument();

// May need to disable error checking if the HTML isn't fully valid.
// Both settings have to be in place before loadHTML() runs.
$dom->strictErrorChecking = FALSE;
libxml_use_internal_errors(true); // buffer parser warnings instead of printing them
$dom->loadHTML($HTMLstring);
libxml_clear_errors();

// Get all the links
$links = $dom->getElementsByTagName("a");
foreach ($links as $link) {
  $href = $link->getAttribute("href");

  // Find out if the link points to a domain other than yours
  // If your internal links are relative, you'll have to do something fancier to check
  // their destinations than this simple strpos()
  // strpos() takes the haystack first and returns false (not -1) when nothing is found
  if (strpos($href, "yourdomain.example.com") === false) {
     // Add the attribute
     $link->setAttribute("rel", "nofollow");
  }
}

// Save the html
$output = $dom->saveHTML();
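
For the "something fancier" case with relative internal links, one possible approach is to compare hosts with parse_url() instead of searching the whole href. The following is only a sketch under assumptions: the function name addNofollowToExternalLinks and the $ownHost parameter are hypothetical and not part of the answer above. Any href without a host is treated as internal, which includes scheme-less values such as 'www.linktoothersite.com' from the question, since parse_url() (like a browser) reads those as relative paths.

```php
<?php
// Hypothetical helper, not from the original answer: adds rel="nofollow" to
// every <a> whose href carries a host different from $ownHost.
function addNofollowToExternalLinks(string $html, string $ownHost): string
{
    // Buffer libxml parse warnings instead of emitting them, then restore the old setting
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');

        // parse_url() yields a host only for URLs with a scheme or a leading "//";
        // relative links give null, badly malformed URLs give false
        $host = parse_url($href, PHP_URL_HOST);

        if (is_string($host) && strcasecmp($host, $ownHost) !== 0) {
            $link->setAttribute('rel', 'nofollow');
        }
    }

    // Note: saveHTML() returns a full document, including doctype and <html>/<body> wrappers
    return $dom->saveHTML();
}
```

It could be called as addNofollowToExternalLinks($HTMLstring, 'www.mywebsite.com'). The host comparison is exact, so 'mywebsite.com' and 'www.mywebsite.com' count as different hosts unless you normalize them first.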
Michael Berkowski
  • I always get "Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity" when I use $dom->loadHTML. Any suggestions? –  Jun 24 '11 at 20:32
  • Sounds like you're handing it invalid HTML, missing the semicolon on an entity like `&amp` somewhere. Either make sure the HTML is valid, or also try setting `$dom->strictErrorChecking = FALSE` so it overlooks more of those problems. – Michael Berkowski Jun 24 '11 at 20:59
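
To find out which part of the markup the parser objects to, it may help to buffer the libxml errors and print them rather than letting PHP raise warnings. This is a rough sketch, not something from the thread: the $html value is a made-up input with a bare "&" in a query string, a common trigger for this message, and whether an error is reported at all depends on the libxml2 version PHP was built against.

```php
<?php
// Made-up input: the unescaped "&" should have been written as "&amp;"
$html = "<a href='http://www.example.com/?a=1&b=2'>link</a>";

// Buffer parse errors internally instead of raising PHP warnings
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Each entry is a LibXMLError object carrying the message and its line number
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}

// Empty the buffer so old errors do not show up on the next parse
libxml_clear_errors();
```

Even when errors are reported, the parser still builds a usable DOM, so the link loop from the answer can run on the result.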