
I have text that contains URL links. The links come in two forms:

  1. www.stackoverflow.com
  2. <a href="http://www.stackoverflow.com">Stack over flow</a>

I am trying to write a simple regex-based function that wraps all links of type 1 in an A HREF tag while leaving the ones that are already wrapped alone.

I have something like this, but it is not successful:

function replaceURLWithHTMLLinks(text) {
    var exp = /(<(\s*)a(\s)*href.*>.*<\/(\s)*a(\s*)>)/ig;
    var matches = exp.exec(text);
    for(var i=0; i < matches.length; i++) {
        var line = matches[i];
        if(!exp.test(line)) {
            var exp2 = /(\b(?:(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[-A-Z0-9+&@#\/%=~_|$])|”(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^"\r\n]+”?|’(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[^'\r\n]+’?)/ig;
            text = text.replace("http://","");
            text = text.replace(exp2, "<a href=http://$1>$1</a>");
        }
    }

    return text;
}

It's not working, but I'm hoping someone can fix it :)

EDIT

The solution that fixed it, with the help of @MikeM's answer:

function replaceLinksSO(text) {
    var rex = /(<a href=")?(?:https?:\/\/)?(?:(?:www)[-A-Za-z0-9+&@#\/%?=~_|$!:,.;]+\.)+[-A-Za-z0-9+&@#\/%?=~_|$!:,.;]+/ig;
    return text.replace(rex, function ( $0, $1 ) {
        if(/^https?:\/\/.+/i.test($0)) {
            return $1 ? $0: '<a href="'+$0+'">'+$0+'</a>';
        }
        else {
            return $1 ? $0: '<a href="http://'+$0+'">'+$0+'</a>';
        }
    });
}
george_h
  • possible duplicate of [How to replace plain URLs with links?](http://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links) – David Feb 21 '13 at 09:52
  • @Dve not really a duplicate. I am trying to replace plain URLs with links only on the condition that the plain URL is not wrapped with a href tag. Because I am doing this on an HTML document. The other regex actually completely fails my test case. – george_h Feb 22 '13 at 10:08

2 Answers


Without trying to analyze the complex regex and function above, here is an example implementation that uses a toy URL-matching pattern to illustrate one way of making such replacements:

var str = ' www.stackoverflow.com  <a href="http://www.somesite.com">somesite</a> www.othersite.org ',
    rex = /(<a href=")?(?:https?:\/\/)?(?:\w+\.)+\w+/g;

str = str.replace( rex, function ( $0, $1 ) {
    return $1 ? $0 : '<a href="' + $0 + '">' + $0 + '</a>';
});

You can alter the URL-matching pattern and insert, e.g., \s* where required.
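One limitation of the optional `<a href="` prefix is that it only protects the URL inside the attribute: link text that itself looks like a URL would still be wrapped again, and a URL without a scheme ends up as a relative link. A minimal sketch of a variation, assuming it is acceptable to match a whole existing anchor element first and skip it, might look like this. The function name `linkifyPlainUrls` is hypothetical, and the URL pattern is still the toy one, so it will also match things like `file.txt`.

```javascript
// Hypothetical helper; the URL part of the pattern is deliberately simplistic.
function linkifyPlainUrls(text) {
    // Alternative 1: a complete existing <a ...>...</a> element (no capture group).
    // Alternative 2: a bare URL with an optional scheme, captured as group 1.
    var rex = /<a\b[^>]*>[\s\S]*?<\/a>|((?:https?:\/\/)?(?:\w+\.)+\w+)/gi;

    return text.replace(rex, function (match, url) {
        if (!url) {
            return match; // an existing anchor element: leave it untouched
        }
        // Prepend a scheme when it is missing so the href is not treated as relative.
        var href = /^https?:\/\//i.test(url) ? url : 'http://' + url;
        return '<a href="' + href + '">' + url + '</a>';
    });
}

var sample = 'see www.stackoverflow.com and ' +
    '<a href="http://www.somesite.com">www.somesite.com</a>';
console.log(linkifyPlainUrls(sample));
```

Because the existing anchor is consumed as a single match, neither its `href` value nor its link text is visited by the URL alternative. This is still regex-based HTML handling, so nested or malformed markup may not behave as expected.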

MikeM
  • nice solution, this worked and replaced all my links (and ignored the already linked ones) on the string. I had to make it pre-pend http:// to the url though or the link would be broken. – george_h Feb 21 '13 at 13:30

Replacing matches of /(https?:\/\/)?((?:www|ftp)\.[-A-Za-z0-9+&@#\/%?=~_|$!:,.;]+?)[\r\n\s]+/ with <a href="$1$2">$1$2</a> would meet your requirement.

A better regex to match against would be ^(?!href="[^"\n\r\s]+?").*?(https?:\/\/)?((?:www|ftp)\.[-A-Za-z0-9+&@#\/%?=~_|$!:,.;]+)$
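As a rough sketch of how the first pattern could be applied in JavaScript: the version below is an adaptation, not the exact regex above. It swaps the trailing `[\r\n\s]+` for a lookahead so the whitespace after the URL is kept and a URL at the very end of the input is also matched, and it falls back to `http://` when no scheme was written. The function name `linkifyBeforeWhitespace` is hypothetical.

```javascript
// Hypothetical helper adapting the first pattern from this answer.
function linkifyBeforeWhitespace(text) {
    // (?=\s|$) requires whitespace or end of input after the URL without consuming it.
    // A URL inside href="..." is followed by a quote, so it does not match.
    var rex = /(https?:\/\/)?((?:www|ftp)\.[-A-Za-z0-9+&@#\/%?=~_|$!:,.;]+?)(?=\s|$)/g;

    return text.replace(rex, function (match, scheme, rest) {
        // scheme is undefined when the URL was written without http:// or https://
        return '<a href="' + (scheme || 'http://') + rest + '">' + match + '</a>';
    });
}

console.log(linkifyBeforeWhitespace('go to www.stackoverflow.com now'));
console.log(linkifyBeforeWhitespace('<a href="http://www.stackoverflow.com">Stack over flow</a>'));
```

This relies on plain URLs being followed by whitespace, so trailing punctuation such as a full stop ends up inside the link, and link text that is a URL followed by a space would still be wrapped.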

Naveed S