Regular expression for detecting hyperlinks

Question

I got this regex pattern from the WMD showdown.js file:

/<((https?|ftp|dict):[^'">\s]+)>/gi

and the code is:

text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"<a href=\"$1\">$1</a>");

But when I set text to http://www.google.com, it does not turn it into an anchor; it returns the original text value as is (http://www.google.com).

P.S: I've tested it with RegexPal and it does not match.

Take the <> out, it should work This one looks to be the best: `(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?` From http://regexlib.com/Search.aspx?k=URL&AspxAutoDetectCookieSupport=1 — Rob, Aug 22 '11 at 21:06
The last time someone answered a question about regex and HTML it drove them mad. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Chris J, Aug 22 '11 at 21:08
So you just want to take the whole url and put it in an anchor tag? In your example it should return `http://www.google.com`? — Ali, Aug 22 '11 at 21:12
There are many more protocols than the 3 listed, are those the only ones you want? And you are creating links, not anchors. — RobG, Aug 22 '11 at 23:48
@RobG, I have no problem with the protocols, my problem was with the format we write a regex pattern in javascript, and what was causing me confusion is the starting `/` and the ending `/gi` but now all is clear. — Ken D, Aug 23 '11 at 16:03

score 2 · Answer 1 · answered Aug 22 '11 at 21:09

Your code is searching for a URL wrapped in <>, like <http://www.google.com> (RegexPal demo).

Just change it to /((https?|ftp|dict):[^'">\s]+)/gi if you don't want it to search for the <> (RegexPal demo).
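
To make the difference concrete, here is a minimal sketch that runs both patterns against the same input. The helper name `linkify` and the variable names are assumptions for illustration; they are not part of showdown.js.

```javascript
// Original pattern from the question: only matches URLs wrapped in < and >.
const bracketed = /<((https?|ftp|dict):[^'">\s]+)>/gi;
// Pattern suggested in this answer: the same, with the < and > removed.
const bare = /((https?|ftp|dict):[^'">\s]+)/gi;

// Hypothetical helper: wraps every match of `pattern` in an <a> element.
function linkify(text, pattern) {
  return text.replace(pattern, '<a href="$1">$1</a>');
}

// No angle brackets in the input, so the original pattern leaves it unchanged.
console.log(linkify("http://www.google.com", bracketed));
// Angle brackets present, so the original pattern matches and links it.
console.log(linkify("<http://www.google.com>", bracketed));
// The bare pattern links the plain URL.
console.log(linkify("http://www.google.com", bare));
```

The leading `/` and trailing `/gi` are just the JavaScript regex literal delimiters and flags (`g` for all matches, `i` for case-insensitive); they are not part of the pattern itself.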

Paul

Ali · Accepted Answer · 2011-08-22T21:34:49.127

As long as you know your URLs start with http:// or https:// or similar, you can use:

/((https?|s?ftp|dict|www)(:\/\/)?[A-Za-z0-9.\-]+)/gi

The expression matches until it encounters a character that is not allowed in a domain name, i.e. anything outside A-Za-z0-9.\-. It will not, however, detect anything of the form google.com, nor anything that comes after the domain name, such as parameters or subdirectory paths. If you need those, you can use the same terminating condition as in your own regex, [^'">\s]+.

I know it seems pointless but it may be useful if you want the display name to be something abbreviated rather than the whole url in case of complex urls.
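
One way this could look in practice is sketched below: the full URL goes into `href` while only the host is shown as the link text. The function name `linkifyShort` and both patterns are assumptions for illustration (the full-URL pattern is borrowed from the other answer, and the host pattern is a simplified variant of the one above), so treat it as a sketch rather than a definitive implementation.

```javascript
// Matches the whole URL, up to the first quote, > or whitespace.
const fullUrl = /((https?|ftp|dict):[^'">\s]+)/gi;
// Captures only the host part that follows the scheme (no g flag, so it is stateless).
const hostPart = /^(?:https?|ftp|dict):\/\/([A-Za-z0-9.\-]+)/i;

// Hypothetical helper: href gets the full URL, the label gets the host only.
function linkifyShort(text) {
  return text.replace(fullUrl, (url) => {
    const m = hostPart.exec(url);
    const label = m ? m[1] : url; // fall back to the full URL if no host is found
    return `<a href="${url}">${label}</a>`;
  });
}

console.log(linkifyShort("see http://www.google.com/search?q=regex now"));
// see <a href="http://www.google.com/search?q=regex">www.google.com</a> now
```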

edited Aug 22 '11 at 21:34

answered Aug 22 '11 at 21:28

Ali


There are lots of other characters that are valid in a URL, pretty much anything other than a space is allowed. – RobG Aug 23 '11 at 00:37
Ignoring internationalized domain names... no, basically only `A-Za-z0-9\-` are allowed in domain names, and the - cannot be the first or the last character. LordCover (asker) is from Syria, so I guess it's really up to him to decide what works. Either way, this regex is only useful for extracting the domain name, which wasn't the requirement to start with. (Look at "Valid characters" in http://en.wikipedia.org/wiki/Domain_name) – Ali Aug 23 '11 at 21:04

score 0 · Answer 3 · answered Aug 23 '11 at 00:34

You could use:

var re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;

with:

 el.innerHTML = el.innerHTML.replace(re, '<a href=\'$1$2\'>$1$2<\/a>$3');

to also match URLs at the end of sentences.

But you need to be very careful with this technique: make sure the content of the element is more or less plain text and not complex markup. Regular expressions are not meant for, nor are they good at, processing or parsing HTML.
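
As a rough illustration of the sentence-ending behaviour, the sketch below applies the same pattern to a plain string instead of `innerHTML`, on the assumption that the input contains no markup. The function name `linkifyPlainText` is hypothetical.

```javascript
// Same pattern as above: scheme, the rest of the URL (lazy), then an
// optional trailing period followed by whitespace or the end of the string.
const re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;

// Hypothetical helper; assumes `text` is plain text, not HTML.
function linkifyPlainText(text) {
  // $3 puts the trailing period and whitespace back, outside the link.
  return text.replace(re, "<a href='$1$2'>$1$2</a>$3");
}

console.log(linkifyPlainText("Try http://www.google.com. It works."));
// Try <a href='http://www.google.com'>http://www.google.com</a>. It works.
console.log(linkifyPlainText("Visit ftp://example.org."));
// Visit <a href='ftp://example.org'>ftp://example.org</a>.
```

Because the third group captures the sentence-ending period separately, the period stays out of the `href`, which is the point of the extra group.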
