Regular Expression to find URLs in block of Text (Javascript)

Question

I need a JavaScript regular expression that scans a block of plain text and returns the text with the URLs turned into links.

This is what I have:

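findLinks: function(s) {
          // Match whitespace, then http:// or ftp://, then a run of non-delimiter characters
          var hlink = /\s(ht|f)tp:\/\/([^ \,\;\:\!\)\(\"\'\f\n\r\t\v])+/g;
          return (s.replace(hlink, function($0, $1, $2) {
              // Drop the leading whitespace character matched by \s
              var url = $0.substring(1, $0.length);
              // Strip trailing periods so sentence punctuation is not part of the link
              while (url.length > 0 && url.charAt(url.length - 1) == '.') url = url.substring(0, url.length - 1);

              // Wrap the URL in an anchor tag
              return ' <a href="' + url + '">' + url + '</a>';
          }));
      }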

The problem is that it only matches http://www.google.com and not google.com/adsense.

How could I accomplish both?

score 6 · Accepted Answer · edited Jan 15 '14 at 01:31

I use this as a reference all the time. This guy has 8 regexes you should know.

http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/

Here is what he uses to look for URLs:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

He also breaks down what each part does. Very useful for learning regexes, rather than just getting an answer that works for reasons you don't understand.
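
Note that the pattern above is anchored with ^ and $, so it tests whether a whole string looks like a URL rather than finding URLs inside a longer text. One way to apply it to a block of text, sketched here as an illustration and not taken from the linked tutorial, is to split the text on whitespace and test each token. The helper name findUrlTokens is hypothetical.

```javascript
// The pattern quoted in the answer, unchanged. There is no "i" flag,
// so it only accepts lowercase hosts.
const urlPattern = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/;

// Hypothetical helper: split on whitespace and keep the tokens the pattern accepts.
function findUrlTokens(text) {
  return text.split(/\s+/).filter(function (token) {
    return urlPattern.test(token);
  });
}

const found = findUrlTokens("see http://www.google.com and google.com/adsense now");
console.log(found); // [ 'http://www.google.com', 'google.com/adsense' ]
```

Both the protocol and the bare-domain form are accepted because the protocol group is optional. One caveat: the nested quantifier in ([\/\w \.-]*)* can backtrack heavily on long tokens that almost match, so it is probably worth simplifying before running it over untrusted input.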

edited Jan 15 '14 at 01:31

Mark S

answered Nov 18 '09 at 14:57

MWill

His email regex is missing valid characters like the + sign in the part before the @ sign – CaffGeek Nov 18 '09 at 15:00
Email validation with regex is no trivial matter. I think this is more for learning than for using in hardcore production environments. However, the URL pattern has worked well for me. Obviously it's going to need adjustments if your flavor of regex differs. – MWill Nov 18 '09 at 15:33
I love you! The link, although not 100% the answer, gave me a good alternative. – Theofanis Pantelides Nov 18 '09 at 17:04
The above link is dead; it is now available at: https://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149 – Harish ST Apr 13 '21 at 10:59

score 3 · Answer 2 · answered Nov 18 '09 at 15:58

This is a non-trivial task. To match any URI that is valid according to the relevant RFCs you need a monumentally complex regular expression, and even then it won't filter out URIs with invalid top-level domains (e.g. http://brussels.sprout/). So you have to compromise. Determine what's important to you. For example: are false positives or false negatives more acceptable? Do you want to limit top-level domains to only those that currently exist? Do you allow non-Latin characters in matched URIs? Decide what you need your regular expression to do and design it accordingly, rather than blindly copying and pasting an example from the web.
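
To make those trade-offs concrete, here is a minimal sketch of one possible compromise, not a definitive implementation. It assumes that explicit http, https and ftp URLs are always linked, that bare domains are linked only when they end in a TLD from a short allowlist, and that trailing sentence punctuation stays outside the link. The names KNOWN_TLDS and linkify and the contents of the allowlist are hypothetical.

```javascript
// Hypothetical allowlist: TLDs that make a bare domain count as a URL.
const KNOWN_TLDS = ["com", "org", "net", "edu", "gov", "io"];

// Alternative 1: an explicit protocol followed by non-space, non-quote characters.
// Alternative 2: a dotted host ending in an allowlisted TLD, with an optional path.
const urlPattern = new RegExp(
  "\\b(?:(?:https?|ftp)://[^\\s<>\"']+" +
    "|(?:[a-z0-9-]+\\.)+(?:" + KNOWN_TLDS.join("|") + ")\\b(?:/[^\\s<>\"']*)?)",
  "gi"
);

function linkify(text) {
  return text.replace(urlPattern, function (match) {
    // Keep trailing sentence punctuation outside the link.
    const trailing = (match.match(/[.,;:!?)]+$/) || [""])[0];
    const url = match.slice(0, match.length - trailing.length);
    // Bare domains get an assumed http:// prefix in the href only.
    const href = /^(?:https?|ftp):\/\//i.test(url) ? url : "http://" + url;
    return '<a href="' + href + '">' + url + "</a>" + trailing;
  });
}

const linked = linkify(
  "Visit http://www.google.com. Also google.com/adsense, not brussels.sprout"
);
console.log(linked);
```

With this allowlist, brussels.sprout is left as plain text, which trades some false negatives (valid TLDs missing from the list) for fewer false positives. It will still link the domain part of an email address, so further constraints may be needed depending on the input.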

score 2 · Answer 3 · answered Nov 18 '09 at 14:56

You could make the protocol part optional:

/\s((ht|f)tp:\/\/)?([^ \,\;\:\!\)\(\"\'\f\n\r\t\v])+/g
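
One thing to watch for, shown in the sketch below: once the protocol is optional, the remaining character class accepts any run of non-delimiter characters, so every word preceded by whitespace matches. The sample text and the extra dot check are assumptions for illustration, and the form-feed escape is written as \f here.

```javascript
// The pattern with the protocol group made optional.
const optionalProtocol = /\s((ht|f)tp:\/\/)?([^ \,\;\:\!\)\(\"\'\f\n\r\t\v])+/g;

const sample = " see http://www.google.com and google.com/adsense today";

// Every whitespace-preceded word matches, not just the URLs.
const matches = sample.match(optionalProtocol).map(function (m) {
  return m.trim();
});
console.log(matches);
// [ 'see', 'http://www.google.com', 'and', 'google.com/adsense', 'today' ]

// Assumed extra constraint: keep only matches that contain a dot followed by letters.
const withDot = matches.filter(function (m) {
  return /\.[a-z]{2,}/i.test(m);
});
console.log(withDot); // [ 'http://www.google.com', 'google.com/adsense' ]
```

So the optional protocol probably needs to be paired with some requirement on the host part, such as a dot followed by a TLD, before it is usable for linkifying.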

FrustratedWithFormsDesigner

score 0 · Answer 4 · answered Nov 18 '09 at 14:57

Try this (works with your sample text):

\S+\.\S+
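
As a quick check of what this pattern accepts, the sketch below runs it over an assumed sample sentence. It matches any run of non-whitespace characters that contains a dot, so it picks up decimal numbers and trailing punctuation along with the URL.

```javascript
// Any non-whitespace run containing a dot that has at least one character on each side.
const dotted = /\S+\.\S+/g;

const text = "Pi is 3.14, see google.com/adsense.";
const hits = text.match(dotted);
console.log(hits); // [ '3.14,', 'google.com/adsense.' ]
```

That may be acceptable if false positives are cheap, but the matches would likely need trimming and filtering before being turned into links.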

Rubens Farias
