26

How can I set up my regex to test whether a URL is contained in a block of text in JavaScript? I can't quite figure out the pattern to use to accomplish this.

 var urlpattern = new RegExp( "(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?" );

 var txtfield = $('#msg').val() /*this is a textarea*/

 if ( urlpattern.test(txtfield) ){
        //do something about it
 }

EDIT:

The pattern I have now works in regex testers for what I need it to do, but Chrome throws an error:

  "Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,@?^=%&:/~+#]*[w-@?^=%&/~+#])?/: Range out of order in character class"

for the following code:

var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?' );
guerda
BillPull
  • Why do you exclude FTPS? – PhiLho Dec 31 '14 at 11:30
  • I really only needed http/https, so in my case I could've left out ftp as well. – BillPull Jan 27 '15 at 00:56
  • This is essentially a duplicate of [How to replace plain URLs with links?](http://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links), which explains why regular expressions are a bad idea for this kind of task. – Dan Dascalescu Oct 11 '16 at 03:57

8 Answers

71

Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, one other method for taking away their special meaning is putting them at the beginning or the end of the class definition.

In addition, \+ and \@ in a character class are indeed interpreted as + and @ respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
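As a minimal sketch of the two points above (the variable names and sample characters are only illustrative), the three ways of writing a literal dash behave the same, and the escapes on + and @ make no difference:

```javascript
// Three equivalent ways to include a literal dash in a character class.
const escaped = /[\w\-_]/;  // dash escaped with a backslash
const leading = /[-\w_]/;   // dash placed first
const trailing = /[\w_-]/;  // dash placed last

const samples = ["a", "-", "_", "7", "+", " "];
const results = samples.map((ch) => [
  ch,
  escaped.test(ch),
  leading.test(ch),
  trailing.test(ch),
]);
// Each row holds the character followed by three identical booleans.
console.log(results);

// Escaping + and @ inside a class is harmless but redundant.
const withEscapes = /[\+\@]/;
const withoutEscapes = /[+@]/;
console.log(withEscapes.test("+") === withoutEscapes.test("+")); // true
```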

I would recommend the following regex for your purposes:

(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

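This can be specified in JavaScript either by passing it to the RegExp constructor, as in your example (note that every backslash must be doubled inside a string literal, otherwise \w reaches the regex engine as a plain w):

var urlPattern = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?")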

or by directly specifying a regex literal, using the // quoting method:

var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/

The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and might be more readable (as it is in this case). I am fairly certain that the // quoting method is more efficient, and is at certain times more readable. Both work.
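To illustrate the difference between the two forms, here is a small sketch using a shortened version of the pattern; the sample text and variable names are made up for the example:

```javascript
// The same pattern written both ways. In the string form every backslash
// is doubled so that the regex engine still receives \w and \.
const fromString = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+");
// In the literal form the forward slashes need escaping instead.
const fromLiteral = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+/;

const text = "see http://example.com/page for details";
console.log(fromString.test(text));  // true
console.log(fromLiteral.test(text)); // true

// With single backslashes, the string literal drops them before
// the RegExp constructor ever sees the pattern.
console.log("[\w-]+" === "[w-]+"); // true
```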

I tested your original and this modification using Chrome both on JSFiddle and on RegexLib.com, using the client-side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!

Edit

As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of url is desired (which it may or may not be), the following might be more appropriate:

(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

#------changed----here-------------^

<End Edit>
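A quick way to see the effect of that one-character change is to run both variants against a few hosts. This is only a sketch; the variable names and sample URLs are illustrative:

```javascript
// Variant that requires at least one dot in the host.
const requiresDot = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;
// Variant where the dotted part is optional (+ changed to *).
const dotOptional = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)*([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

const urls = [
  "http://example.com/index.html",       // matched by both
  "http://localhost/foo/bar.txt",        // matched only by dotOptional
  "https://sharepoint-test-server/docs", // matched only by dotOptional
];

for (const url of urls) {
  console.log(url, requiresDot.test(url), dotOptional.test(url));
}
```

The looser variant also accepts more junk (any single word after the scheme counts as a host), so which one is appropriate depends on the input you expect.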

Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!

Code Jockey
  • regular-expressions-info is broken. Put "dot" instead of a dash in href. – esengineer Oct 10 '12 at 07:25
  • one more thing: the correct syntax would be `... = new RegExp(...)` instead of `... = new Regexp(...)`. Thanks anyway for the great answer! – zaphod1984 Nov 24 '12 at 07:05
  • 1
    This breaks on URLs with no dots in the host. For example, `http://localhost/foo/bar.txt`. To fix it, change `(\.[\w-]+)+` to `(\.[\w-]+)*`. – paulmelnikow Aug 10 '14 at 20:40
  • @noa 'breaks' is relative... but it would certainly not match on `http://localhost/` unless you made a change equivalent to your suggestion. The original question was not really an "I need to match all URLs" request, but rather a "Why isn't this working - it's close, but it's not working..." query. While there are many other issues with my expression that would cause false negatives and false positives for a 'correct' URL test, the problem was addressed to the asker's satisfaction. I'm glad you have what is needed for your case, too! – Code Jockey Aug 11 '14 at 16:44
  • 1
    The question is general, and this answer has gotten a lot of recognition. Someone (not the OP) used this code, and it caused a real bug in some code I was debugging… so breaks isn't *entirely* relative. It's worth making the answer as canonical as possible. – paulmelnikow Aug 29 '14 at 04:49
  • 1
    I highly recommend this as a supplemental resource: https://mathiasbynens.be/demo/url-regex – eremzeit Feb 12 '16 at 09:04
  • @CodeJockey for the localhost case, the regular expression needs to change from `[\w-]+(\.[\w-]*)+(` to `[\w-]+(\.[\w-]+)*(`. Please test it. – Prafulla Kumar Sahu Jan 27 '17 at 14:04
  • @prafulla - driving at the moment, and looking on my phone, but I think you might be correct. I thought you were correcting my comment above (which is, I'm pretty sure, correct), but then realized you might be talking about the edit to my post... wow... that was silly and stuck for quite a while. Assuming I look later and still think you're correct in correcting me, I thank you for pointing that out -_-. I'll correct it if appropriate very soon! – Code Jockey Jan 27 '17 at 16:52
  • I tested it and thought to let you know. thank you for your answer, it is helpful at least in my condition, so upvoting it. – Prafulla Kumar Sahu Jan 27 '17 at 18:17
  • This validates as `true`: `"http://_// ..."` (My human eye validator does not approve) – vsync Sep 09 '20 at 10:37
4

Complete Multi URL Pattern.

UPDATED: Nov. 2020, April & June 2021 (Thanks commenters)

Matches every URI or URL in a string, and also extracts the protocol, domain, path, query and hash:

([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)

https://regex101.com/r/jO8bC4/56

Example JS code with output: every URL is turned into an array holding the full match followed by its five parts (protocol, host, path, query, and hash).

var re = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/mig;
var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
var m;

while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    console.log(m);
}

Will give you the following:

["https://www.facebook.com",
  "https://",
  "www.facebook.com",
  "",
  "",
  ""
]

["https://github.com/justsml?tab=activity#top",
  "https://",
  "github.com",
  "/justsml",
  "tab=activity",
  "top"
]
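If named fields are easier to work with than array positions, the same pattern can be wrapped in a small helper. This is a sketch only: extractUrls and its field names are hypothetical and not part of the answer above, and it assumes an environment that supports String.prototype.matchAll.

```javascript
// Same pattern as above, with the same flags.
const urlParts = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/gim;

// Hypothetical helper: returns one object per URL-like match in the text.
function extractUrls(text) {
  // matchAll requires the g flag and iterates over an internal copy of the regex.
  return Array.from(text.matchAll(urlParts), (m) => ({
    url: m[0],
    protocol: m[1],
    host: m[2],
    path: m[3],
    query: m[4],
    hash: m[5],
  }));
}

const found = extractUrls("docs: https://github.com/justsml?tab=activity#top (ignore this)");
console.log(found);
```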
Dan Levy
  • 1
    this is a super clever way to do it +1 – David Jan 08 '16 at 03:46
  • Your regex is not differentiating between a block of text and URL. Check [here](https://regex101.com/r/jO8bC4/2) –  Jan 21 '16 at 20:10
  • Updated my answer - includes @noob 's suggested string prepended to my example code (so it pulls all url-like strings very reliably - even if there is a colon-prefixed string. uses explicit matching on slashes to delineate the protocol). Also works with `smb:///winbox/dfs/` or `ipp://printer` https://regex101.com/r/jO8bC4/5 – Dan Levy Jan 28 '16 at 23:16
  • **BAM** `"a a:// . "` returns `true` with this Regex :/ – vsync Sep 09 '20 at 10:28
  • Hey @vsync - thanks, it now requires 1 or more chars for the domain! – Dan Levy Apr 20 '21 at 00:58
  • If you're looking to capture _any_ protocol/scheme (not just a finite list of them) - you'll want to consider making the first character class support digits, periods, and hyphens, RE: https://en.wikipedia.org/wiki/List_of_URI_schemes -- like `chrome-extension://`, `ms-help://`, `iris.beep://`, `s3://`, and `pkcs11://` – Code Jockey Jun 03 '21 at 13:42
  • Good spotting @CodeJockey! I didn't know that the `.` was allowed. Thanks for the info. – Dan Levy Jun 04 '21 at 01:17
3

You have to escape the backslash when you are using new RegExp.

You can also put the dash - at the end of the character class to avoid escaping it.

&amp; inside a character class means & or a or m or p or ; so you only need to put & and ; there: a, m and p are already matched by \w.

So, your regex becomes:

var urlexp = new RegExp( '(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&;:/~+#-]*[\\w@?^=%&;/~+#-])?' );
Toto
1

Try this general regex, which covers many URL formats:

/(([A-Za-z]{3,9}):\/\/)?([-;:&=\+\$,\w]+@{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((\/[-\+~%\/\.\w]+)?\/?([&?][-\+=&;%@\.\w]+)?(#[\w]+)?)?/g
Khadijah J Shtayat
1

Try (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

Vinit
1

I've cleaned up your regex:

var urlexp = new RegExp('(http|ftp|https)://[a-z0-9\\-_]+(\\.[a-z0-9\\-_]+)+([a-z0-9\\-\\.,@\\?^=%&;:/~\\+#]*[a-z0-9\\-@\\?^=%&;/~\\+#])?', 'i');

Tested and works just fine ;)

matthiasmullie
0

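Try this; it worked for me:

/^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/

It is simple and easy to understand. Because of the ^ and $ anchors it checks a whole string rather than searching inside a block of text, and it requires the www. prefix.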

Tolga İskender
0

The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes, using a backslash.
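The "range out of order" part of the error can be reproduced in isolation. In this sketch, compiles is a hypothetical helper written for the example; note that the backslashes are doubled because the patterns are passed as strings:

```javascript
// A range's start must not come after its end in character-code order.
console.log("w".charCodeAt(0), "_".charCodeAt(0)); // 119 95

// Hypothetical helper: reports whether a pattern string compiles.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    // An invalid pattern makes the constructor throw a SyntaxError.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(compiles("[w-_]"));     // false: "w" (119) sorts after "_" (95)
console.log(compiles("[\\w\\-_]")); // true: the escaped dash is a literal
console.log(compiles("[\\w_-]"));   // true: a trailing dash is a literal
```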

PotatoEngineer