1

I have the following HTML string with three links:

var html = '
   <a href="http://www.example.com/help">Go to help page</a>
   <a href="http://blog.example.com">Go to blog page</a>
   <a href="https://google.com">Go google</a>
';

My domain name is example.com. As you can see from the code above, there are two internal links and one external link.

I need to write a "magic" function that adds the rel="nofollow" attribute to all external links (but not to internal ones). So I need to get the following result:

var html = '
   <a href="http://www.example.com/help">Go to help page</a>
   <a href="http://blog.example.com">Go to blog page</a>
   <a href="https://google.com" rel="nofollow">Go google</a>
';

I'm trying to write that function, and this is what I have so far:

function addNoFollowsToExternal(html) {
  // List of allowed domains
  var whiteList = ['example.com', 'blog.example.com'];

  // Regular expression
  var str = '(<a\s*(?!.*\brel=)[^>]*)(href="/https?://)((?!(?:(?:www\.)?' + whiteList.join(',') + '))[^"]+)"((?!.*\brel=)[^>]*)(?:[^>]*)>',

  // execute regexp and return result
  return html.replace(new RegExp(str, 'igm'), '$1$2$3"$4 rel="nofollow">');
}

Unfortunately, my regexp doesn't seem to work. After executing addNoFollowsToExternal(html), rel="nofollow" isn't added to the external link with href="https://google.com".

Please help me fix my regular expression.

Erik

2 Answers

6

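There were some minor mistakes in your regex: a stray `/` after `href="`, and the whitelist was joined with `,` instead of the alternation operator `|`. Here is a corrected version: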

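function addNoFollowsToExternal(html) {
    // Allowed hosts as regex fragments; the optional non-capturing prefix also matches any subdomain
    var whiteList = ['(?:[^/]+\\.)?example\\.com'];
    // Backslashes are doubled because the pattern is built from a string literal
    var str = '(<a\\s*(?!.*\\brel=)[^>]*)(href="https?://)((?!(?:' + whiteList.join('|') + '))[^"]+)"((?!.*\\brel=)[^>]*)(?:[^>]*)>';

    // Append rel="nofollow" to every matching external link
    return html.replace(new RegExp(str, 'igm'), '$1$2$3"$4 rel="nofollow">');
}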
Anubis
  • Thanks for the help. Is it possible to avoid listing all subdomains in the `whiteList` array and just use `*.example.com`, for example? – Erik Jun 16 '16 at 15:42
  • I modified the above function to your needs by removing the www. part from the regex and adding some tweaks to your whitelist – Anubis Jun 20 '16 at 08:18
  • Thanks. Is this a robust solution? Is it possible to hack it somehow? I'm asking because I'm planning to use it in production. – Erik Jun 20 '16 at 08:43
  • How can I modify your regexp to check the port as well, so that it is possible to check against a whitelist such as `['([^/]+\.)?example.com:8080']`? – Erik Jun 22 '16 at 08:09
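
On the robustness, wildcard, and port questions raised in the comments: a pattern that only checks how the href begins is easy to fool, because a URL such as `http://example.com.evil.net` starts with a whitelisted name. One alternative is to parse each href and compare the real hostname and port. The following is only a minimal sketch, assuming a runtime that provides the standard `URL` class (modern browsers, Node.js). The helper name `isWhitelisted`, the extra `whiteList` parameter, and the `*.example.com[:port]` entry format are hypothetical choices for illustration, not part of the answer above.

```javascript
// Hypothetical whitelist format (an assumption, not from the answer above):
//   "example.com"        matches that host exactly
//   "*.example.com"      matches example.com and any subdomain
//   "...:8080"           additionally requires that explicit port
function isWhitelisted(href, whiteList) {
  var url;
  try {
    url = new URL(href);
  } catch (e) {
    return true; // relative or unparsable href: treated as internal here
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return true; // mailto:, tel:, etc. are not treated as external links
  }
  var host = url.hostname; // the URL parser has already lowercased it
  return whiteList.some(function (entry) {
    var parts = entry.toLowerCase().split(':');
    var pattern = parts[0];
    var port = parts[1] || '';
    // url.port is '' when the URL uses the scheme's default port
    if (port && port !== url.port) return false;
    if (pattern.slice(0, 2) === '*.') {
      var base = pattern.slice(2);
      return host === base || host.endsWith('.' + base);
    }
    return host === pattern;
  });
}

function addNoFollowsToExternal(html, whiteList) {
  // Visit every opening <a ...> tag and decide tag by tag
  return html.replace(/<a\b[^>]*>/gi, function (tag) {
    if (/\srel\s*=/i.test(tag)) return tag; // keep an existing rel untouched
    var m = /\shref\s*=\s*"([^"]*)"/i.exec(tag);
    if (!m || isWhitelisted(m[1], whiteList)) return tag;
    return tag.slice(0, -1) + ' rel="nofollow">';
  });
}
```

With `['*.example.com']` as the whitelist, the three links from the question would come out as requested: the two example.com links unchanged and the google.com link with `rel="nofollow"` appended. The sketch still finds the tags with a regular expression, so it only handles double-quoted href values and ordinary (not self-closing) anchor tags; for untrusted markup a real HTML parser would be the safer choice.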
0

You can also use the PHP function below, which wraps bare URLs found in plain text in links that carry rel="nofollow":

private function _txt2link($text){

         $regex = '/'
          . '(?<!\S)'
          . '(((ftp|https?)?:?)\/\/|www\.)'
          . '(\S+?)'
          . '(?=$|\s|[,]|\.\W|\.$)'
          . '/m';

        return preg_replace_callback($regex, function($match)
        {
            return '<a'
              . ' target="_blank"'
              . ' rel="nofollow"'
              . ' href="' . $match[0] . '">'
              . $match[0]
              . '</a><br/>';
        }, $text);
    }
Lonare