If I were to use preg_replace, what regex would detect whether a string contains one or more <a> tags and then add rel="nofollow" to each of them?

So it would take this:

Hi! What's up? <a href="http://test.com">Click here</a> to check out
<a href="http://apple.com">my</a> website. This is <b>also</b> a test.

And turn it into this:

Hi! What's up? <a href="http://test.com" rel="nofollow">Click here</a>
to check out <a href="http://apple.com" rel="nofollow">my</a> website. This is
<b>also</b> a test.
user2898075
    Regex isn't the best option. http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php – JCOC611 Dec 08 '13 at 19:25

2 Answers

Using the DOM is a better approach than a regular expression here.

$html = <<<DATA
Hi! What's up? <a href="http://test.com">Click here</a> to check out
<a href="http://apple.com">my</a> website. This is <b>also</b> a test.
DATA;

$dom = new DOMDocument;
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$links = $xpath->query('//a');

foreach($links as $link) { 
   $link->setAttribute('rel', 'nofollow');
}

echo $dom->saveHTML();

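Output (the relevant part; saveHTML() also emits the doctype and the wrapper elements such as <html> and <body> that loadHTML() adds)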

Hi! What's up? <a href="http://test.com" rel="nofollow">Click here</a> 
to check out <a href="http://apple.com" rel="nofollow">my</a> website. This is 
<b>also</b> a test.
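
If only the fragment is wanted back, without the wrapper markup, one possible variation on the DOM approach is sketched below. The function name addNofollow and the <div> wrapper are assumptions made for illustration, not part of the answer above. The sketch also merges "nofollow" into an existing rel value rather than overwriting it, which setAttribute('rel', 'nofollow') alone would do.

```php
<?php
// Hypothetical helper (not from the answer above): adds "nofollow" to every
// <a> in an HTML fragment and returns the fragment without wrapper markup.
function addNofollow(string $fragment): string
{
    $dom = new DOMDocument();

    // Collect parser warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so its contents are easy to find afterwards.
    $dom->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $link) {
        // Keep any existing rel tokens and append "nofollow" only if missing.
        $tokens = preg_split('/\s+/', $link->getAttribute('rel'), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', array_map('strtolower', $tokens), true)) {
            $tokens[] = 'nofollow';
        }
        $link->setAttribute('rel', implode(' ', $tokens));
    }

    // The wrapper is the first <div> in document order; serialize only its children.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = 'Hi! <a href="http://test.com">Click here</a> or <a href="http://apple.com" rel="noopener">here</a>.';
$result = addNofollow($html);
echo $result, PHP_EOL;
```

Note that loadHTML() does not assume UTF-8 by default, so input containing non-ASCII characters would need an explicit encoding hint before this sketch could be relied on.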
hwnd

Here you go: just match the contents of the <a> tag and modify it.

$new_text = preg_replace('#<a\b((?![^>]*rel="nofollow")[^>]+)>#', '<a\1 rel="nofollow">', $your_starting_text);

The negative lookahead, (?![^>]*rel="nofollow"), prevents the rel attribute from being added twice: an <a> tag that already contains rel="nofollow" is not matched.

Demo (the links here already carry rel="nofollow", so they come back unchanged):

$your_starting_text = 'Hi! What\'s up? <a href="http://test.com" rel="nofollow">Click here</a>
    to check out <a href="http://apple.com" rel="nofollow">my</a> website. This is
    <b>also</b> a test.';
$new_text = preg_replace('#<a\b((?![^>]*rel="nofollow")[^>]+)>#', '<a\1 rel="nofollow">', $your_starting_text);
echo htmlentities($new_text);

This outputs:

Hi! What's up? <a href="http://test.com" rel="nofollow">Click here</a> to check out <a href="http://apple.com" rel="nofollow">my</a> website. This is <b>also</b> a test.
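
To see the lookahead at work on mixed input, here is a small sketch using the same pattern on one link without rel and one that already has it. The variable names are made up for illustration. The replacement is written as '<a\1 ...' with no extra space, since the captured group already begins with the whitespace that follows <a. The optional fifth argument of preg_replace reports how many tags were changed, which also covers the "does the string contain any such tags" part of the question.

```php
<?php
$pattern     = '#<a\b((?![^>]*rel="nofollow")[^>]+)>#';
$replacement = '<a\1 rel="nofollow">';

// One link lacks rel, the other already has rel="nofollow".
$mixed = '<a href="http://test.com">one</a> <a href="http://apple.com" rel="nofollow">two</a>';

// First pass: only the first link should be rewritten.
$once = preg_replace($pattern, $replacement, $mixed, -1, $firstCount);

// Second pass: nothing is left to match, so the text is unchanged.
$twice = preg_replace($pattern, $replacement, $once, -1, $secondCount);

echo $once, PHP_EOL;
echo "changed on first pass: $firstCount, on second pass: $secondCount", PHP_EOL;
```

One limitation worth knowing: the lookahead only recognizes the exact text rel="nofollow". A tag such as <a href="x" rel="noopener"> would still match and end up with two rel attributes, which is one reason the DOM approach is the safer choice.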
elixenide