
I am trying to turn every URL in some text into a hyperlink, but I do not want to wrap a URL that is already part of a hyperlink.

For example:

<a href="http://twitter.com">Go To Twitter</a>
here is a url http://anotherurl.com

The following code:

function replaceURLWithHTMLLinks(text) {
  var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  return text.replace(exp, "<a href='$1'>$1</a>");
}

produces the following output, which also wraps the URL inside the existing href:

<a href="<a href='http://twitter.com'>http://twitter.com</a>">Go To Twitter</a>
here is a url <a href='http://anotherurl.com'>http://anotherurl.com</a>

How can I modify the regex so that it skips URLs that are already hyperlinked?

Thanks

Answer:

The new method is:

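function replaceURLWithHTMLLinks(text) {
  // (^|[^"']): start of string, or one character that is not a quote,
  // so URLs sitting inside href="..." or src='...' are skipped.
  // The prefix is captured as $1 and written back instead of being
  // replaced by a space, so the character before the URL is not lost.
  var exp = /(^|[^"'])((?:ftp|https?|file):\/\/\S+(?:\b|$))/gi;
  return text.replace(exp, "$1<a href='$2'>$2</a>");
}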

The above code works as required. I adapted the regex from one linked in the comments; that version had a bug where a full stop following the URL was included in the link, and this version excludes it.
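To check the full-stop behaviour, here is a minimal sketch using the regex above. Two assumptions: the helper name linkifyPlainUrls is hypothetical, and the character matched by (^|[^"']) is captured and written back as $1 rather than replaced by a space.

```javascript
// Hypothetical helper: the self-answer's regex with the leading character captured.
function linkifyPlainUrls(text) {
  var exp = /(^|[^"'])((?:ftp|https?|file):\/\/\S+(?:\b|$))/gi;
  return text.replace(exp, "$1<a href='$2'>$2</a>");
}

// Full stop followed by more text: \S+ backtracks to the last word boundary,
// so the "." stays outside the link.
console.log(linkifyPlainUrls("Visit http://example.com. Thanks"));
// Visit <a href='http://example.com'>http://example.com</a>. Thanks

// URL directly after a quote (inside href="..."): left unchanged.
console.log(linkifyPlainUrls('<a href="http://twitter.com">Go To Twitter</a>'));
// <a href="http://twitter.com">Go To Twitter</a>

// Full stop as the very last character: $ matches first, so the "." is kept.
console.log(linkifyPlainUrls("Visit http://example.com."));
// Visit <a href='http://example.com.'>http://example.com.</a>
```

As the last call shows, (\b|$) tries $ before backtracking, so a full stop that ends the whole input is still included in the link.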

Base33

  • You should not use a regex to parse HTML. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – John Sobolewski Aug 08 '12 at 11:52
  • Similar to [this question](http://stackoverflow.com/q/2177142/615754). And [this other question](http://stackoverflow.com/questions/8038910/regex-to-find-urls-not-in-tags?rq=1). Or [this one](http://stackoverflow.com/q/2641582/615754). – nnnnnn Aug 08 '12 at 11:56
  • Excellent! Thanks nnnnnn. I did search this morning but clearly my search phrases didn't match with anything useful. Thanks for sharing that! – Base33 Aug 08 '12 at 11:59
  • You're welcome. I found those by scanning down through the list of "Related" topics on the bottom right of this page... – nnnnnn Aug 08 '12 at 12:00
  • With this: `(?:^|[^"'])` I think you just need `[^"']` and remove the grouping. No need to detect if it's the start of a string as there won't be any other characters there, surely? Also, this would still detect someone putting an href in the text of anchor tags? – marksyzm Jan 08 '16 at 23:21
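Both questions in the last comment can be checked directly. A minimal sketch, assuming the self-answer's regex with its prefix captured as $1 (the names withStart, withoutStart and linkify are hypothetical): without ^, a URL at the very start of the input has no preceding character for [^"'] to consume, so it is skipped; and a URL used as link text is preceded by > rather than a quote, so it is wrapped again.

```javascript
// The self-answer's prefix (^|[^"']) versus the comment's suggested [^"'].
var withStart = /(^|[^"'])((?:ftp|https?|file):\/\/\S+(?:\b|$))/gi;
var withoutStart = /([^"'])((?:ftp|https?|file):\/\/\S+(?:\b|$))/gi;

// Hypothetical helper: writes the captured prefix back, then the link.
function linkify(text, exp) {
  return text.replace(exp, "$1<a href='$2'>$2</a>");
}

// URL at position 0: only the ^ alternative can match it.
console.log(linkify("http://example.com is down", withStart));
// <a href='http://example.com'>http://example.com</a> is down
console.log(linkify("http://example.com is down", withoutStart));
// http://example.com is down

// URL as link text: preceded by ">", so it is wrapped a second time.
console.log(linkify('<a href="http://example.com">http://example.com</a>', withStart));
// <a href="http://example.com"><a href='http://example.com</a>'>http://example.com</a></a>
```

In the last case \S+ runs to the end of the input, so the closing </a> also ends up inside the new href.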

1 Answer


At the time of writing, JavaScript did not support negative lookbehind, so you have to work around it with a replacement function: capture the href= prefix as well (you should probably also consider src), and return the match unchanged whenever that prefix is present:

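function repl(text) {
  // Group 1 is either an href=/src= prefix or empty; group 3 is the URL.
  var exp = /((href|src)=["']|)(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  return text.replace(exp, function (match, prefix, attr, url) {
    // The URL is already an attribute value: keep the match unchanged.
    if (prefix) {
      return match;
    }
    return "<a href=\"" + url + "\">" + url + "</a>";
  });
}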

See the demo
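Engines that implement ES2018 do support lookbehind assertions, so on current Node.js or browsers the replacement callback can be dropped. A sketch under that assumption (linkifyWithLookbehind is a hypothetical name; the URL pattern is the one from this answer):

```javascript
// Requires ES2018 lookbehind support (current Node.js and browsers).
function linkifyWithLookbehind(text) {
  // (?<!(?:href|src)=["']) rejects a URL that directly follows href=" or src=' etc.
  var exp = /(?<!(?:href|src)=["'])\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;
  return text.replace(exp, function (url) {
    return '<a href="' + url + '">' + url + "</a>";
  });
}

console.log(linkifyWithLookbehind('<a href="http://twitter.com">Go To Twitter</a>\nhere is a url http://anotherurl.com'));
```

One difference from the callback version: lookbehind only inspects the text right before each candidate, so a second URL inside an attribute value (for example in a ?next= query parameter) is still wrapped. The callback version avoids this because its match consumes the whole attribute URL.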

EDIT

A "better" version, which only replaces URLs found in actual text nodes:

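function repl(node) {
  // Non-global regex: each match() call returns the first URL and its index.
  var exp = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/i;
  // Copy the live NodeList so that inserting new nodes does not shift the loop.
  var nodes = Array.prototype.slice.call(node.childNodes);
  for (var i = 0; i < nodes.length; i++) {
    var n = nodes[i];
    if (n.nodeType === Node.TEXT_NODE) {
      var g = n.textContent.match(exp);
      while (g) {
        // Text before the URL becomes its own text node.
        var t = document.createTextNode(n.textContent.substring(0, g.index));
        var a = document.createElement("a");
        a.href = g[0];
        a.textContent = g[0];
        // Keep only the text after the URL in the original node, then search again.
        n.textContent = n.textContent.substring(g.index + g[0].length);
        n.parentNode.insertBefore(t, n);
        n.parentNode.insertBefore(a, n);
        g = n.textContent.match(exp);
      }
    }
    else if (n.nodeType === Node.ELEMENT_NODE && n.nodeName !== "A") {
      // Recurse into child elements, but never into existing links.
      repl(n);
    }
  }
}

repl(document.getElementById("t"));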

See the demo
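The text-node walk needs a real DOM, which makes it awkward to test. One option, sketched here with hypothetical helper names (splitByUrls, linkifyTextNode), is to separate the string work from the DOM work: split the text into plain-text and URL segments with this answer's regex, then build the new nodes in a DocumentFragment and swap it in for the original text node with a single replaceChild call.

```javascript
// Hypothetical helper: splits a string into text and URL segments
// using this answer's URL pattern (global flag added for exec looping).
function splitByUrls(text) {
  var exp = /\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;
  var segments = [];
  var last = 0;
  var m;
  // Every match is non-empty, so lastIndex always advances.
  while ((m = exp.exec(text)) !== null) {
    if (m.index > last) {
      segments.push({ type: "text", value: text.slice(last, m.index) });
    }
    segments.push({ type: "url", value: m[0] });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    segments.push({ type: "text", value: text.slice(last) });
  }
  return segments;
}

// Browser-only part: replace one text node with nodes built from its segments.
function linkifyTextNode(textNode) {
  var frag = document.createDocumentFragment();
  splitByUrls(textNode.textContent).forEach(function (seg) {
    if (seg.type === "url") {
      var a = document.createElement("a");
      a.href = seg.value;
      a.textContent = seg.value;
      frag.appendChild(a);
    } else {
      frag.appendChild(document.createTextNode(seg.value));
    }
  });
  textNode.parentNode.replaceChild(frag, textNode);
}
```

splitByUrls runs anywhere; linkifyTextNode needs a browser document and would be called on each text node that is not inside an existing <a>.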

Julien Ch.