I need to add rel="nofollow" to all external links (links that do not point to my site or its subdomains).

I currently do this in two steps. First, I add rel="nofollow" to all links (including internal ones) using the following regular expression:

<a href="http([s]?)://(.*?)"

Then, in the second step, I remove rel="nofollow" from internal links (my site and its subdomains) using the following regular expression:

<a href="http([s]?)://(www\.|forum\.|blog\.)mysite.com(.*?)" rel="nofollow"

How can I do this in a single step? Is it possible?

Palec
Ahmad
  • How about using an html parser? – Antony Jul 13 '13 at 12:04
  • Better yet, how about using the search function? Possible duplicate of [RegEx expression to find a href links and add NoFollow to them](http://stackoverflow.com/q/2450985) or [How to add rel="nofollow" to links with preg\_replace()](http://stackoverflow.com/q/5037592) – mario Jul 13 '13 at 12:09

1 Answer

The DOM way:

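$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of silencing them with @
$doc->loadHTMLFile($url); // $url: location of the HTML file (must be writable for saveHTMLFile below)
libxml_clear_errors();
$links = $doc->getElementsByTagName('a');

// Matches absolute http(s) URLs whose host is neither mysite.com nor one of its subdomains.
// The negative lookahead rejects "mysite.com" and "<anything>.mysite.com" as the host.
$external = '~^https?://(?!(?:[^/?#:]+\.)?mysite\.com(?:[/?#:]|$))~i';

foreach ($links as $link) {
    $href = $link->getAttribute('href');
    if (preg_match($external, $href)) {
        $link->setAttribute('rel', 'nofollow');
    }
}

$doc->saveHTMLFile($url);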
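
As an alternative to testing the host with a regular expression, the same check can be sketched with parse_url(), which may be easier to read and adjust. This is only a sketch under a few assumptions: the function names isExternalLink and addNofollowToExternalLinks are hypothetical, 'mysite.com' is the placeholder domain from the question, the HTML is passed in as a non-empty string rather than a file, and links without a host (relative URLs, mailto:, fragments) are treated as internal.

```php
<?php
// Hypothetical helper: returns true when $href points to a host other than
// $siteDomain or one of its subdomains.
function isExternalLink(string $href, string $siteDomain = 'mysite.com'): bool
{
    $host = parse_url($href, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        // Relative URLs, fragments, mailto: links etc. carry no host.
        return false;
    }

    $host = strtolower($host);
    $siteDomain = strtolower($siteDomain);
    if ($host === $siteDomain) {
        return false;
    }

    // A subdomain ends with ".mysite.com"; "notmysite.com" does not.
    $suffix = '.' . $siteDomain;
    return substr($host, -strlen($suffix)) !== $suffix;
}

// Hypothetical wrapper: parses an HTML string, sets rel="nofollow" on every
// external <a>, and returns the serialized document.
function addNofollowToExternalLinks(string $html, string $siteDomain = 'mysite.com'): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        if (isExternalLink($link->getAttribute('href'), $siteDomain)) {
            // Note: this overwrites any existing rel value on the link.
            $link->setAttribute('rel', 'nofollow');
        }
    }

    return $doc->saveHTML();
}
```

Comparing the parsed host rather than the raw string means a URL such as http://mysite.com.example.org/ is still treated as external. Note that loadHTML() wraps fragments in html and body elements, so the returned markup is a full document.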
Casimir et Hippolyte