Regex to make links clickable (in only 'a href' and not 'img src')

Question

I've been trying really hard to find a stable solution to a problem. I need to make all http/https links in a string clickable, but only those links that are in the 'href' attribute of an 'a' tag, disregarding everything else.

I've been using this simple function to linkify text:

  function linkify(text) {
    var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
    return text.replace(exp, "<a target='_blank' href='$1'>$1</a>");
  }

The problem is that it also turns URLs in the 'src' attribute of any 'img' tag into clickable links, which I don't want. The string I need to linkify can contain both 'a' and 'img' tags.
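To make the failure concrete, here is the question's function applied to a small hypothetical input (the example URLs are placeholders). Because the pattern has no notion of HTML context, it rewrites the `src` value in place and produces broken markup.

```javascript
// The question's linkify() function, reproduced to show the failure mode.
function linkify(text) {
  var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  return text.replace(exp, "<a target='_blank' href='$1'>$1</a>");
}

// Hypothetical input: one image plus one plain-text URL.
var input = '<img src="http://example.com/a.png"> see http://example.com';
console.log(linkify(input));
// The src value is rewritten too, nesting an <a> inside the attribute:
// <img src="<a target='_blank' href='http://example.com/a.png'>http://example.com/a.png</a>"> see <a target='_blank' href='http://example.com'>http://example.com</a>
```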

I also looked at the question "How to replace plain URLs with links?" and tried https://github.com/cowboy/javascript-linkify, but still had no luck.

Since I am using angular.js, I've also tried the built-in 'linky' filter (https://docs.angularjs.org/api/ngSanitize/filter/linky), but the problem remains.

All of the above mentioned solutions linkify text in both 'a' and 'img' tags.

Looking for some help! Thanks.

The value of the `href` attribute can't be clickable, because attribute nodes are not displayed. — Oriol, Mar 15 '15 at 00:28

score 2 · Accepted Answer · edited May 23 '17 at 10:31

At the time of this answer, JavaScript did not support negative lookbehinds in regular expressions (lookbehind assertions were added in ES2018). Here's a simple workaround:

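var content = '<a href="http://google.com">Google.com</a> and http://google.com';

// Group 1 captures a preceding href=" or src=" (single or double quotes).
// If it matched, the URL is an attribute value and is returned unchanged.
var re = /((?:href|src)=["'])?(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

content = content.replace(re, function (match, attr) {
    // attr is undefined (or '' in some old engines) when group 1 did not match
    if (attr) {
        return match;
    }
    return '<a target="_blank" href="' + match + '">' + match + '</a>';
});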
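Engines that implement ES2018 do support lookbehind assertions, so there the same idea can be written directly as a negative lookbehind instead of capturing and re-emitting the attribute prefix. This is a sketch assuming such an engine (current browsers, Node.js 10+); `linkifyWithLookbehind` is a hypothetical name. It is still a heuristic: for example, a second URL embedded in an attribute's query string (`src="http://x.com/?u=http://y.com"`) would still be linkified, because only the position right after `href="`/`src="` is excluded.

```javascript
// Sketch assuming an ES2018+ engine (lookbehind assertions are supported).
function linkifyWithLookbehind(text) {
  // (?<!...) rejects a URL that immediately follows href=" or src=" (or the
  // single-quoted forms), so those attribute values are left untouched.
  var re = /(?<!(?:href|src)=["'])\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;
  return text.replace(re, function (url) {
    return '<a target="_blank" href="' + url + '">' + url + '</a>';
  });
}

var sample = '<img src="http://example.com/a.png"> and http://google.com';
console.log(linkifyWithLookbehind(sample));
// <img src="http://example.com/a.png"> and <a target="_blank" href="http://google.com">http://google.com</a>
```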

But you should avoid parsing HTML with RegExp. Here's why.

score 0 · Answer 2 · answered Mar 15 '15 at 00:34

Your best bet would be to use an HTML/XML parser (Nokogiri for Ruby remains a stable favorite of mine, if applicable) to identify and extract the "innerHTML" contents of tags, and then run a regex like that on the extracted text. It's a maxim in programming that you shouldn't use a regex to parse XML.
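In a browser, following this advice would mean parsing the string with `DOMParser` and rewriting only text nodes. The sketch below approximates that idea without a DOM so it also runs in Node.js, and it is a swapped-in technique, not a real parser: it splits the string into tags and text with a simple tokenizer (assuming well-formed input with no `>` inside attribute values, comments, or scripts), linkifies only text that is outside existing `<a>` elements, and never touches attribute values. `linkifyTextOnly` is a hypothetical helper name.

```javascript
// Sketch: linkify URLs in text only, never inside tag markup, and skip text
// that is already inside an <a> element. Assumes well-formed HTML with no '>'
// inside attribute values, comments, or <script> contents.
function linkifyTextOnly(html) {
  var urlRe = /\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;

  // split() with a capturing group keeps the tags in the result, so even
  // indices are text runs and odd indices are tags: text, tag, text, ...
  var tokens = html.split(/(<[^>]*>)/);
  var anchorDepth = 0;

  return tokens.map(function (token, i) {
    if (i % 2 === 1) {
      // A tag: track entering/leaving <a> elements, but never modify it,
      // so href and src values stay exactly as they were.
      if (/^<a[\s>]/i.test(token)) {
        anchorDepth++;
      } else if (/^<\/a\s*>/i.test(token)) {
        anchorDepth = Math.max(0, anchorDepth - 1);
      }
      return token;
    }
    if (anchorDepth > 0) {
      return token; // already the visible text of a link
    }
    return token.replace(urlRe, function (url) {
      return '<a target="_blank" href="' + url + '">' + url + '</a>';
    });
  }).join('');
}

var html = '<a href="http://google.com">http://google.com</a> ' +
           '<img src="http://example.com/a.png"> visit http://example.org';
console.log(linkifyTextOnly(html));
// <a href="http://google.com">http://google.com</a> <img src="http://example.com/a.png"> visit <a target="_blank" href="http://example.org">http://example.org</a>
```

Unlike the regex-only workarounds, this also avoids nesting a new `<a>` inside an existing link whose visible text is itself a URL.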
