What is a good regular expression to match a URL?

Question

Currently I have an input box which will detect the URL and parse the data.

So right now, I am using:

var urlR = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)
           (?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;
var url= content.match(urlR);

The problem is that when I enter a URL like www.google.com, it doesn't work. When I enter http://www.google.com, it works.

I am not very fluent in regular expressions. Can anyone help me?

John Gruber's [Liberal, Accurate Regex Pattern for Matching URLs](http://daringfireball.net/2010/07/improved_regex_for_matching_urls) is also good. See [this SO question](http://stackoverflow.com/questions/6927719/url-regex-does-not-work-in-javascript) for how to modify it to work in Javascript. — paleozogt, Jun 20 '12 at 19:03
"/(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/" — Mohammed Akdim, May 11 '17 at 16:52
This is marked for duplicate but this question is asking for js and the other question is not asking for a js solution — Huangism, May 25 '17 at 14:50
@jose920405 I love your extended regexp because it's simple. I've just tested it, and it needed a little adjustment to forbid the `"` in the URL, i.e.: `(www|http:|https:)+[^\s"]+[\w]` — SebMa, Aug 22 '19 at 21:25
I've made a little adjustment to the regex by @MukulJain to validate for a full URL with a TLD before returning true, as the other expression validated partial URLs: `^(https?:\/\/)\S*\.(\S){2,}` — MrLewk, Oct 05 '19 at 08:25
or you can refer this regex tutorial - https://www.youtube.com/watch?v=TiqXWDyywog — Krishnraj Rana, Jul 05 '20 at 20:43
`/(?:(?:https?)://)(?:localhost|(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))\\.?)(?::\\d{2,5})?(?:[/?#]\\S*)?$/i` <-can be used with re2, this lib https://github.com/neosiae/is-valid-http-url — Andersonfrfilho, Feb 04 '21 at 20:49
You're mixing protocols (https) and domain names (www). Besides, you don't allow the ftp protocol, for instance. Finally, many cases are not allowed (HTTP authentication, non-default ports, etc.). Here is a more generic proposal: https://regex101.com/r/gCXX9j/1 It allows subpart extraction and includes internal domain names. ^(?:(http?|s?ftp):\/\/|file:\/\/\/)?(([\P{Cc}]+):([\P{Cc}]+)@)?([a-zA-Z0-9][a-zA-Z0-9.-]*)(:[0-9]{1,5})?($|\/[\w\-\._~:\/?[\]@!\$&'\*\+,;=.]+(\#[\w]*)?$) — Damien C, Dec 27 '21 at 17:37

Daveo · Answer 1 · 2019-07-01T02:00:19.633

1018

Regex if you want to ensure URL starts with HTTP/HTTPS:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

If you do not require HTTP protocol:

[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)

To try this out see http://regexr.com?37i6s, or for a version which is less restrictive http://regexr.com/3e6m0.

Example JavaScript implementation:

var expression = /[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)?/gi;
var regex = new RegExp(expression);
var t = 'www.google.com';

if (t.match(regex)) {
  alert("Successful match");
} else {
  alert("No match");
}
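
Several comments below point out that this snippet reports a match whenever any substring looks like a URL (for example `www.google.com withspace`). If the goal is to validate a whole input value, one option is to anchor the pattern with `^` and `$`, as one commenter suggests. A minimal sketch, assuming the protocol-optional pattern above; the names `urlPattern`, `containsUrl`, and `isWholeUrl` are illustrative and not part of the original answer:

```javascript
// Protocol-optional pattern from the answer, kept as a string so it can be reused.
// Backslashes are doubled because this is a string, not a regex literal.
// The duplicated "/" in the last character class is written once; the set is the same.
var urlPattern = "[-a-zA-Z0-9@:%._\\+~#=]{1,256}\\.[a-zA-Z0-9()]{1,6}\\b([-a-zA-Z0-9()@:%_\\+.~#?&/=]*)";

// Unanchored: true if the text contains something URL-like anywhere.
// No "g" flag, so test() keeps no state between calls.
var containsUrl = new RegExp(urlPattern, "i");

// Anchored: true only if the entire string is URL-like.
var wholeUrl = new RegExp("^" + urlPattern + "$", "i");

function isWholeUrl(text) {
  return wholeUrl.test(text);
}

console.log(containsUrl.test("www.google.com withspace")); // true
console.log(isWholeUrl("www.google.com withspace"));       // false
console.log(isWholeUrl("www.google.com"));                 // true
```

Anchoring only rules out surrounding text. The pattern itself stays as permissive as before, so an input such as `www...google...com` is still accepted.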

edited Jul 01 '19 at 02:00

answered Sep 28 '10 at 03:15

Daveo


25

Forgot to mention: use this site http://gskinner.com/RegExr/ to test regexes and view common samples – Daveo Sep 28 '10 at 03:16
var urlRegex = /(https?://)?(www\.)?([a-zA-Z0-9_%]*)\b\.[a-z]{2,4}(\.[a-z]{2})?((/[a-zA-Z0-9_%]*)+)?(\.[a-z]*)?$/; Is it like this? not working either. – bigbob Sep 28 '10 at 03:23
2

Look here http://regexr.com?2s81g you can see in the sample text that matches the URL regex are highlighted in blue – Daveo Sep 28 '10 at 03:42
How do I use this in a JavaScript context? When I add it to my website, the JavaScript stops working; I think there is an error. – bigbob Sep 28 '10 at 03:56
I have updated my original answer to show a full JavaScript example of the regex. I also changed the regex slightly – Daveo Sep 28 '10 at 04:26
In your example the query string of http://RegExr.com?2rjl6 is not caught. – dmnc Jan 24 '12 at 14:42
@dmnc yes it is. I copied and pasted the code into Firebug and changed www.google.com to regexr.com?2rji6 and it alerted successful match – Daveo Jan 24 '12 at 22:17
@Daveo. Now I look back at that gskinner.com/RegExr link I used to test (you can see the results here http://regexr.com?2s715), it's built in Flash. Maybe it isn't using the javascript regex engine at all... – dmnc Jan 26 '12 at 00:26
ok this one will match `[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)?` – Daveo Jan 26 '12 at 11:02
10

This still matches URLs without a valid TLD, ie: "http://foo/file.html" – Jesse Fulton Apr 08 '12 at 17:43
12

regex.test('//.com') => true – Derek Prior Jul 05 '12 at 18:53
Don't forget about port numbers: "http://example.com:80/test" /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}(\:[0-9]+)?\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi – elundmark Mar 12 '13 at 10:11
This didn't work with query params, e.g. http://www.booking.com/hotel/id/champlung-sari-ubud.en-gb.html?aid=356992;label=gog235jc-hotel-en-id-okawati-nobrand-ch-com;sid=32c5d07ac3a021d4c4fd640290313345;dcid=1 – nomis May 15 '13 at 16:09
1

Doesn't work for url with non-English characters: `"http://正妹.香港/‎" false` – Derek 朕會功夫 Mar 21 '14 at 07:01
this does not match URL containing ``,`` such as ``http://res.cloudinary.com/hrscywv4p/image/upload/c_fill,g_faces:center,h_128,w_128/yflwk7vffgwyyenftkr7.png``. Minor edit to fix that ``[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=,]*)`` – Falcon Feb 04 '15 at 13:59
`http://www.c:ool.com.au` shouldn't be a valid url, there must be at most one `:` after the `//` and it should be followed by digits and then a `/` or `?` or the end of the URL – undefined Feb 23 '15 at 22:32
this also match http(s)?://www.blablabla.something ((https?:\/\/)|(www\.)|(https?:\/\/www\.))([a-zA-Z0-9-_]{2,256})\.([a-z]{2,4})\b([-a-zA-Z0-9@:%_\+.~#?&//=]*) – Andrea_86 Jul 23 '15 at 14:05
22

question - why the double slash inside the last character class? in this portion of the regex [-a-zA-Z0-9@:%_\+.~#?&//=] there is a double slash, which doesn't seem necessary to me? You are placing twice the same character within the character class, and if you intended to escape the normal slash, this will be futile since escaping is performed with backslash?... – Daniel Cairol Aug 11 '15 at 17:17
3

Here is the refined version of @Daveo's Regex that worked the best for me: https://regex101.com/r/hU9aV3/2 – Rahul Desai Oct 09 '15 at 00:48
1

This is not working for 'http://localhost:60001/#/tab/dash'. /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi.test('http://localhost:60001/#/tab/dash'); false – Jeff Tian Mar 31 '16 at 05:46
This is wrong. gtlds can be longer than 6 char and comprise non ascii characters. – Antzi Apr 05 '16 at 14:44
This doesn't match twitter links that are just t.co Need to change the {2,256} to {1,256} – mkaj Oct 08 '16 at 06:33
1

this returns invalid `https://en.wikipedia.org/wiki/Harry_Potter_(film_series)` – Amin Jafari Nov 26 '16 at 07:21
If you want the Regex to recognise Capital or Mixed case letters, you should change .[a-z]{2,4} to .[a-zA-Z]{2,4} You might also want to consider changing matching number range from {2,4} as domain names these days are getting longer and longer (e.g. .consulting, .marketing, .shopping etc) – Francis Feb 04 '17 at 10:32
8

doesn't work if url has spaces. t = 'www.google.com withspace' t.match(regex) // returns true – Imamudin Naseem Feb 13 '17 at 05:21
1

This did not find my website: `http://www.نبی.com/` :-D – Nabi K.A.Z. Apr 16 '17 at 13:22
3

This doesn't work for `http://12.23.12.23:8080/example` though it is valid – Sivasankar Jun 14 '17 at 06:57
Matches `...and` – DomeTune Jul 25 '17 at 09:17
1

This does not detect `localhost` as a URL. – technophyle Jan 12 '18 at 16:38
If you're using python, be sure to use raw strings or escape the "\b", or else this won't work: https://stackoverflow.com/q/3995034/3412775. – Tomty Mar 22 '18 at 05:35
It considers this invalid url `www.baidu.com:443.us/path/to/the/file` as valid – VeryLazyBoy Apr 30 '18 at 02:12
@AminJafari this should match your example: `https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/=()]*)` – Aurimas Stands with Ukraine Apr 30 '18 at 06:30
Remove the two chars '\/' after the '\b(' to fix the bug of not matching ?queries=hi. Also, adjust '{2,4}' to be '{1,4}' since there are some single letter domains. – jahooma May 21 '18 at 09:38
2

I'm confused about `//` in the last group. Should it be `\/`? – Vladimir Vlasov Dec 25 '18 at 05:46
It should not be used in production. It also says `true` for `\\\\\\||||@@@@https://www.google.com` – Stack Overflow Feb 14 '19 at 10:57
good answer but you missed the star character \\* so the full valid string is: https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=\\*]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=\\*]*) have a look here https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid – sed Mar 13 '19 at 13:21
this does work with `http://localhost:3000` – Sang Mar 29 '19 at 07:46
How can this be a valid URL `http://google.com......` ? – Saurabh Sharma May 24 '19 at 09:14
@Daveo Also, How can this be a valid URL `https://www.google_.com`? – Saurabh Sharma May 24 '19 at 11:49
1

This regex will match all url with more than one dots, like ```www...google...com```, you need to check for the \. to be equal to only one. Try this regex as it's working better with the dots. ```^https?:\/\/(www\.)?([-a-zA-Z0-9@:%_\+~#=]{2,256}(\.){1})+[a-z]{2,6}\b(\/[-a-zA-Z0-9@:%._?&\/+~#=]{2,256}|:[0-9]{2,})*```. someone know how to change this and avoid ```www.aa``` ?? – 1020rpz May 31 '19 at 15:49
I don't expect this regex to handle all extreme cases like pointed out above but here is a simple use case that I use all the time in real life `http://192.168.1.19` or `http://192.168.1.19:5000` that should be valid but fails. – Justin A Jun 29 '19 at 07:27
@JustinA updated to work - see here https://regexr.com/3e6m0 – Daveo Jul 01 '19 at 02:02
3

Diego Perini made a very good regex that covers almost all possible cases, you might want to check it out here: https://mathiasbynens.be/demo/url-regex – Eyad Mohammed Osama Jul 22 '19 at 00:00
https://themadhurgupta.blogspot.com/2019/09/regular-expression.html – MADHUR GUPTA Sep 12 '19 at 09:37
2

doesnt work for new urls like https://mysite.restaurant because of 6 letter ending limit. It should be 18 i think, because of e.,g. `.northwesternmutual`. Got the list from `https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains` – ya_dimon Jun 25 '20 at 11:59
but this matches *msa...com* – Serob_b Jul 15 '20 at 12:29
Doesn't work for utf8 characters http://食狮.com.cn – Shardj Aug 07 '20 at 11:25
1

`http://.google.com` is also matched with this one!! is that ok? – kakabali Aug 30 '20 at 04:20
Also matches on `http://www.google..com` – Erion S Oct 11 '20 at 21:26
Don't cover this type of urls: https://www.chartmill.com/analyze.php?utm_source=stocktwits&utm_medium=FA&utm_content=HEALTH&utm_campaign=social_tracking#/FSLR?r=fa&key=b431cf47-ec7e-4226-b864-511b6081b3be – Nikolay Nikiforchuk Dec 12 '20 at 20:53
1

The first regexp is invalid !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! – towry Dec 17 '20 at 10:49
1

this matches the emails as well – Reza May 20 '21 at 00:07
By your regex, `http://example.com:806666` is valid – huang May 28 '21 at 03:14
It doesn't match "http://www.example.com/hello/world.do?key=python", which is a valid url. – Jinwook Kim Aug 05 '21 at 05:21
It doesn't match URL with special characters : é à ï ... – Sami Oct 22 '21 at 13:15
It misses some special characters like asterisk (*). See https://stackoverflow.com/questions/4644092 – displayName Dec 06 '21 at 14:07
Please note that it matches prefixed and suffixed URLs, e.g. **prefix**https://example.com**suffix**. To avoid that, the expression should begin with `^` and end with `$`: `^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$`. – Kathandrax Jan 23 '22 at 19:00
31.12.2023 - is not valid url :D – Oleg Apr 11 '23 at 12:25
does not work for https://blabla.sharepoint.com/:w:/r/sites/bla-project-delivery/_layouts/15/Doc.aspx?sourcedoc=%7B167E7E9A-F55E-4385-BA20-FCD2BBF2BCC6%7D&file=TPGT-bla%20-%20Production%20MPN%20HLD%20Rev%200.52.docx&action=default&mobileredirect=true – Suresh Kumar Aug 11 '23 at 10:43

foufos · Answer 2 · 2023-03-27T15:22:55.010

397

(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})

Will match the following cases

http://www.foufos.gr
https://www.foufos.gr
http://foufos.gr
http://www.foufos.gr/kino
http://werer.gr
www.foufos.gr
www.mp3.com
www.t.co
http://t.co
http://www.t.co
https://www.t.co
www.aa.com
http://aa.com
http://www.aa.com
https://www.aa.com
badurlnotvalid://www.google.com - captured url www.google.com
htpp://www.google.com - captured url www.google.com

Will NOT match the following

www.foufos
www.foufos-.gr
www.-foufos.gr
foufos.gr
http://www.foufos
http://foufos
www.mp3#.com

var expression = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/gi;
var regex = new RegExp(expression);

var check = [
  'http://www.foufos.gr',
  'https://www.foufos.gr',
  'http://foufos.gr',
  'http://www.foufos.gr/kino',
  'http://werer.gr',
  'www.foufos.gr',
  'www.mp3.com',
  'www.t.co',
  'http://t.co',
  'http://www.t.co',
  'https://www.t.co',
  'www.aa.com',
  'http://aa.com',
  'http://www.aa.com',
  'https://www.aa.com',
  'badurlnotvalid://www.google.com',
  'htpp://www.google.com',
  'www.foufos',
  'www.foufos-.gr',
  'www.-foufos.gr',
  'foufos.gr',
  'http://www.foufos',
  'http://foufos',
  'www.mp3#.com'
];

check.forEach(function(entry) {
  let match = entry.match(regex);
  if (match) {
    $("#output").append( "<div style='float:left'>Success: " + entry + "</div><div style='float:right'>Captured url: " + match + "</div><br>" );
  } else {
    $("#output").append( "<div style='float:left'>Fail: " + entry + "</div><br>" );
  }
});

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div id="output"></div>
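
Because this expression is meant to find URLs inside text rather than validate a whole string, it can be wrapped in a small extraction helper. The sketch below is illustrative only: `extractUrls` and the trailing-punctuation trimming are assumptions of mine, not part of the answer. The trimming addresses the case raised in the comments where `(https://google.com)` is captured as `https://google.com)`.

```javascript
// Regex from the answer; the "g" flag makes String.prototype.match return every match.
var urlFinder = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/gi;

// Hypothetical helper: returns the URL-like substrings found in `text`.
function extractUrls(text) {
  var matches = text.match(urlFinder) || []; // match() returns null when nothing is found
  return matches.map(function (url) {
    // Heuristic (an assumption, not from the answer): strip trailing punctuation
    // that usually belongs to the surrounding sentence, not to the URL.
    return url.replace(/[)"'.,;!?]+$/, "");
  });
}

console.log(extractUrls("See (https://google.com) or www.t.co today."));
// [ 'https://google.com', 'www.t.co' ]
console.log(extractUrls("htpp://www.google.com"));
// [ 'www.google.com' ]
```

The trimming is a trade-off: it would also cut a URL that legitimately ends in `)`, so drop or narrow that character list if such links matter to you.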

Check it in rubular - latest version

~~Check it in rubular - old version~~

edited Mar 27 '23 at 15:22

answered Jul 21 '13 at 15:21

foufos


1

I changed your expression a bit so it will work in all cases i need, including uri with http:// or http://www "/([^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi" – Ismael Jan 14 '15 at 11:08
Sorry, in the comment h-t-t-p-:-/-/ was sanitized! – Ismael Jan 14 '15 at 11:08
17

This regex is no longer valid as new custom gTLDs can have URLs like https://calendar.google/ – Vinicius Tavares Aug 17 '15 at 15:04
what if we have to remove web page and make it only for websites . I mean if we remove `http://www.foufos.gr/kino` from matched cases. what would be the change in regex ? – Ajeet Lakhani Dec 28 '15 at 20:21
@ajeetlakhani you can just add the / to the last group of not allowed characters `(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s\/]{2,}|www\.[^\s]+\.[^\s\/]{2,})` – foufos Dec 29 '15 at 21:35
10

but it will match `http://www.foufos` and will not match `regex.com` – Qiang Mar 14 '16 at 19:28
26

Who cares about the special www subdomain anymore? Nobody! – Lothar Apr 17 '16 at 20:24
1

Another problem with this regular expression is `?!`, negative look-aheads are not supported by some libraries, particularly Go regex library because then O(n) time complexity is no longer guaranteed. – Marek Jul 10 '16 at 04:08
it is giving true for `www.mp3#.com` which is I think wrong – Mrugesh Tank Mar 25 '17 at 11:42
@mrugesh-tank I have edited the regex – foufos Mar 30 '17 at 22:12
This will unfortunately also catch "www.example.com.", which I think ending with a "." is also invalid. – Ruben Martinez Jr. May 01 '17 at 20:54
@ruben-martinez-jr you could add ".," or any other punctuation you like in the [^\s] part but you would miss a url like "www.example.com/test.html" – foufos May 03 '17 at 10:26
Does it match subdomains? Like `http://docs.google.com`? – Augustin Riedinger Nov 22 '17 at 10:25
3

@augustin-riedinger It will match if the http or https is prepended so `http://docs.google.com` will match but `docs.google.com` will not match – foufos Nov 23 '17 at 10:13
Hi and thanks for this regex! Question: I tried to add the possibility, to use intranet links such as e.g. `http://intranet/index.html` and also `mailto` e.g. `mailto:sample@sample.com`. Tried out this regex: `/^((http(s)?)|(ftp(s)?):\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})|(mailto:){1}([\w\.]+)\@{1}[\w]+\.[\w]{2,})\s$/gm` but it did not work for the two examples below. What's wrong? – webta.st.ic Jun 25 '19 at 11:42
@webta.st.ic You could add `|https?:\/\/[^\s]+|mailto:[^\s]+` just before the closing bracket and capture these two cases. What it does is that it states if you have a string starting with `http://` or `https://` or `mailto:` then capture it. This also captures the two cases not captured by the original expression (`http://www.foufos`, `http://foufos`) – foufos Jul 01 '19 at 07:12
Hi @foufos I made working a RegEx for my required cases: `/^(((http(s)?|(ftp(s)?)):\/\/)(www\.)?([a-zA-Z0-9][a-zA-Z0-9\.\/-]+[a-zA-Z0-9]\.[^\s]{2,})+(\:[0-9]{5})?|(mailto:){1}([\w\.]+)\@{1}[\w]+\.[\w]{2,})\s$/gm;` Here also the regex test: https://regex101.com/r/XFvQjr/4 Thanks! – webta.st.ic Jul 01 '19 at 07:25
1

Your regex doesn't match `https://wwwsomething.com` – Monsieur Pierre Doune Oct 26 '19 at 17:32
@undefined the regex is designed to match urls either starting with `www.` (dot included) or not starting with `www` therefore I am not changing the regex, but in order to achieve it you could change the `www\.` parts of the regex to `www\.?` – foufos Nov 01 '19 at 14:55
better than the above one for sure – kakabali Aug 30 '20 at 04:22
1

This matches on `www.google..com` – Erion S Oct 11 '20 at 21:25
@erion-s it is a trade-off since you might want to capture `www.google.com/test.html` – foufos Oct 12 '20 at 08:59
1

@foufos how come that's a trade-off? Surely it is possible to write an expression that accepts `www.google.com/test.html` but does not accept `www.google..com`? – Rudey Dec 02 '20 at 19:39
1

This matches " characters, such as `http://www.google.com/"asdf"`. I believe " is not a legal URL character – John Jan 12 '21 at 18:51
Note: This answer supports `https://localhost`, not found in the accepted answer – Spectric Jun 06 '21 at 23:59
1

This regex validates "htps://www.google.com " when it shouldn't – Augustine Calvino Oct 19 '21 at 05:34
@user2410203 it matches the `www.google.com` part of the string so I am not sure that it makes it wrong. You could add a whitespace character match in front of the `www` (`|\swww`) so that it will expect a space before the www – foufos Oct 20 '21 at 07:26
This is also matching https://www.aa.com" (Note the double quote in the end). The url should match but double quotes not be part of the match – Kaunteya Nov 28 '21 at 10:05
1

upgrade this regex: `/^(https?:\/\/)?(?:www\.|(?!www\.))(([a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]|[a-zA-Z0-9]+)\.)+(([a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]|[a-zA-Z0-9]){2,})\/?$/i` link to test: https://rubular.com/r/L5svOYSnxfweQp – dostapn Dec 02 '21 at 13:44
This is great, it even works with Google Maps links. But is it possible to ignore tailing characters? If for example I have a link in between parenthesis `(https://google.com)` it's matching `https://google.com)` which is not valid for me. – BrunoLM Jan 05 '22 at 13:02
This also matches invalid links such as htpp://www.google.com. – BrightIntelDusk Feb 27 '23 at 17:28
@BrightIntelDusk it actually matches the www.google.com and not the htpp:// – foufos Mar 01 '23 at 07:24
@foufos : Actually, I tested it and it matches on `Success: htpp://www.google.com`. I had unit tests running in NodeJS that caught this one. In addition, I ran it in a Chrome Snippet and verified that this regexp matches on htpp://www.google.com. Test your assumptions. – BrightIntelDusk Mar 24 '23 at 23:11
Here's the codepen to prove that this RegExp matches on htpp://www.google.com https://codepen.io/QLuminary/pen/qBMLJzj – BrightIntelDusk Mar 24 '23 at 23:20
It even matches the entire URL for `badurlnotvalid://www.google.com`. – BrightIntelDusk Mar 24 '23 at 23:25
1

@BrightIntelDusk I am sorry for the confusion, I just updated the code snippet to show the captured match. In your cases the actual url was captured, ignoring the invalid htpp part – foufos Mar 27 '23 at 15:25
g.cn still valid, but not for your regexp – Oleg Apr 11 '23 at 12:32
@RubenMartinezJr. [https://example.com./](https://example.com./) is a perfectly valid URL, using a fully-qualified domain name. (Nowadays, most systems don't really resolve partially-qualified domain names, so unqualified domain names are almost the same thing as fully-qualified ones.) – wizzwizz4 Apr 15 '23 at 17:45

score 65 · Answer 3 · edited Jul 07 '22 at 16:48

These are the droids you're looking for. This is taken from validator.js which is the library you should really use to do this. But if you want to roll your own, who am I to stop you? If you want pure regex then you can just take out the length check. I think it's a good idea to test the length of the URL though if you really want to determine compliance with the spec.

 function isURL(str) {
     var urlRegex = '^(?!mailto:)(?:(?:http|https|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[0-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))|localhost)(?::\\d{2,5})?(?:(/|\\?|#)[^\\s]*)?$';
     var url = new RegExp(urlRegex, 'i');
     return str.length < 2083 && url.test(str);
}

Test:

function isURL(str) {
         var urlRegex = '^(?!mailto:)(?:(?:http|https|ftp)://)(?:\\S+(?::\\S*)?@)?(?:(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[0-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))|localhost)(?::\\d{2,5})?(?:(/|\\?|#)[^\\s]*)?$';
         var url = new RegExp(urlRegex, 'i');
         return str.length < 2083 && url.test(str);
    }
var check = [
  'http://www.foufos.gr',
  'https://www.foufos.gr',
  'http://foufos.gr',
  'http://www.foufos.gr/kino',
  'http://werer.gr',
  'www.foufos.gr',
  'www.mp3.com',
  'www.t.co',
  'http://t.co',
  'http://www.t.co',
  'https://www.t.co',
  'www.aa.com',
  'http://aa.com',
  'http://www.aa.com',
  'https://www.aa.com',
  'www.foufos',
  'www.foufos-.gr',
  'www.-foufos.gr',
  'foufos.gr',
  'http://www.foufos',
  'http://foufos',
  'www.mp3#.com'
];

for (let index = 0; index < check.length; index++) {
  var url = check[index];
  if (isURL(url)) {
    console.log(`${url}         ✔`);
  } else {
    console.log(`${url}          ❌`);
  }
}


Worth mentioning **this can crash your browser**. See example: http://jsfiddle.net/Lrnambtt/9/ — Ruben Martinez Jr., May 01 '17 at 20:44
Just a bit more info on the comment by @RubenMartinezJr. - it *does* max out the CPU on Chrome and Firefox (Mac OS), but interestingly *does not* max out the CPU on Safari. — rinogo, Nov 09 '17 at 14:46
Great! But the function is returning false for a Wikipedia URL: `https://en.m.wikipedia.org/wiki/Euler–Lagrange_equation` — Awolad Hossain, Jun 16 '22 at 05:29
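
Given the CPU problem reported in the comments above, one alternative is to skip the regex and rely on the built-in `URL` constructor, which is available in modern browsers and Node.js. This is a different technique from the answer's, sketched under the assumption that only http, https, and ftp should be accepted; the name `isUrlByParser` is illustrative.

```javascript
// Alternative sketch: let the built-in WHATWG URL parser do the work.
function isUrlByParser(str) {
  if (str.length >= 2083) return false; // same length limit as the function above

  var parsed;
  try {
    parsed = new URL(str); // throws a TypeError when the string is not an absolute URL
  } catch (e) {
    return false;
  }

  // Mirror the regex's scheme list: only http, https and ftp are accepted.
  return ["http:", "https:", "ftp:"].indexOf(parsed.protocol) !== -1;
}

console.log(isUrlByParser("http://www.foufos.gr"));     // true
console.log(isUrlByParser("www.foufos.gr"));            // false (no scheme)
console.log(isUrlByParser("mailto:sample@sample.com")); // false (scheme not allowed)
```

The parser is more lenient than the regex in some respects. It accepts hosts without a TLD such as `http://foufos`, so add a host check if that matters for your use case.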

score 47 · Answer 4 · edited Jul 07 '22 at 16:41

Here is another possible solution; the solution above failed for me when parsing query string parameters.

var regex = new RegExp("^(http[s]?:\\/\\/(www\\.)?|ftp:\\/\\/(www\\.)?|www\\.){1}([0-9A-Za-z-\\.@:%_\+~#=]+)+((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");

if(regex.test("http://google.com")){
  alert("Successful match");
}else{
  alert("No match");
}

Feel free to modify the character class [0-9A-Za-z-\.@:%_\+~#=] to match the domain and subdomain names you expect. This solution also handles query string parameters.

If you use a regex literal instead of the RegExp constructor, replace each \\ in the expression with \ and escape the forward slashes as \/.
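
To make the escaping difference concrete, here is the same pattern written both ways. This is a small sketch; the variable names `fromConstructor` and `fromLiteral` are mine.

```javascript
// Constructor form: the pattern is a string, so every regex backslash is doubled.
// (The answer writes "\+" in the string, which JavaScript reduces to a plain "+";
// inside a character class that is equivalent to the "\\+" used here.)
var fromConstructor = new RegExp("^(http[s]?:\\/\\/(www\\.)?|ftp:\\/\\/(www\\.)?|www\\.){1}([0-9A-Za-z-\\.@:%_\\+~#=]+)+((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");

// Literal form: single backslashes, and "/" escaped as "\/" so it does not end the literal.
var fromLiteral = /^(http[s]?:\/\/(www\.)?|ftp:\/\/(www\.)?|www\.){1}([0-9A-Za-z-\.@:%_\+~#=]+)+((\.[a-zA-Z]{2,3})+)(\/(.)*)?(\?(.)*)?/;

// Both forms should give the same answer for every input.
["http://google.com", "www.foufos.gr", "foufos.gr"].forEach(function (input) {
  console.log(input, fromConstructor.test(input), fromLiteral.test(input));
});
```

The first two inputs pass with both forms. The third fails with both because it has no `http://`, `ftp://`, or `www.` prefix.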

Hope this helps.

Test:-

function IsUrl(url){
    var regex = new RegExp("^(http[s]?:\\/\\/(www\\.)?|ftp:\\/\\/(www\\.)?|www\\.){1}([0-9A-Za-z-\\.@:%_\+~#=]+)+((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");

    if (regex.test(url)) {
        console.log(`${url}         ✔`);
    } else {
        console.log(`${url}          ❌`);
    }
}
var check = [
  'http://www.foufos.gr',
  'https://www.foufos.gr',
  'http://foufos.gr',
  'http://www.foufos.gr/kino',
  'http://werer.gr',
  'www.foufos.gr',
  'www.mp3.com',
  'www.t.co',
  'http://t.co',
  'http://www.t.co',
  'https://www.t.co',
  'www.aa.com',
  'http://aa.com',
  'http://www.aa.com',
  'https://www.aa.com',
  'www.foufos',
  'www.foufos-.gr',
  'www.-foufos.gr',
  'foufos.gr',
  'http://www.foufos',
  'http://foufos',
  'www.mp3#.com'
];
for (let index = 0; index < check.length; index++) {
    IsUrl(check[index])
}


`var regex = /^(http[s]?:\/\/(www\.)?|ftp:\/\/(www\.)?|www\.){1}([0-9A-Za-z-\.@:%_\+~#=]+)+((\.[a-zA-Z]{2,3})+)(\/(.)*)?(\?(.)*)?/g;` works for me — Moreno, Feb 12 '13 at 18:57
nice solution but fails for http://foo.co.uk... must be set to this var regex = new RegExp("^(http[s]?:\\/\\/(www\\.)?|ftp:\\/\\/(www\\.)?|(www\\.)?){1}([0-9A-Za-z-\\.@:%_\+~#=]+)+((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?"); Thanks Amar. — Tony, Apr 23 '13 at 22:30
Fails for something like: `https://www.elh` or `http://www.elh`. Although @Tony solution passed this case, it fails with `www.elh` — Elharony, May 31 '20 at 06:01
If I test `Hi there, https://www.atrable.com/#motivation is the motivation of making my app` , this Regex includes `is the motivation of making my app` as a part of url too. To fix this, I modified it a little bit: `(http[s]?:\/\/(www\.)?|ftp:\/\/(www\.)?|www\.){1}([0-9A-Za-z-\.@:%_\+~#=]+)+((\.[a-zA-Z]{2,3})+)(/[^\s]*)?(\?[^\s]*)?` — Shawn, Dec 30 '22 at 03:34

score 4 · Answer 5 · answered Apr 05 '13 at 09:58

I was trying to put together some JavaScript to validate a domain name (e.g. google.com) and, if it validates, enable a submit button. I thought I would share my code for those looking to accomplish something similar. It expects a domain without any http:// or www. prefix. The script uses a stripped-down version of the regular expression from above for domain matching, which isn't strict about fake TLDs.

http://jsfiddle.net/nMVDS/1/

$(function () {
  $('#whitelist_add').keyup(function () {
    if ($(this).val() == '') { //Check to see if there is any text entered
        //If there is no text within the input, disable the button
        $('.whitelistCheck').attr('disabled', 'disabled');
    } else {
        // Domain name regular expression
        var regex = new RegExp("^([0-9A-Za-z-\\.@:%_\+~#=]+)+((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");
        if (regex.test($(this).val())) {
            // Domain looks OK
            //alert("Successful match");
            $('.whitelistCheck').removeAttr('disabled');
        } else {
            // Domain is NOT OK
            //alert("No match");
            $('.whitelistCheck').attr('disabled', 'disabled');
        }
    }
  });
});

HTML FORM:

<form action="domain_management.php" method="get">
    <input type="text" name="whitelist_add" id="whitelist_add" placeholder="domain.com">
    <button type="submit" class="btn btn-success whitelistCheck" disabled='disabled'>Add to Whitelist</button>
</form>
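
The check itself can also be pulled out of the keyup handler into a plain function, which makes it testable without jQuery or a page. This is a rough sketch with the same pattern as above; the function name `isDomainLike` is hypothetical.

```javascript
// Hypothetical pure function: same pattern as in the handler above,
// with no DOM access so it can run in a browser console or in Node.js.
function isDomainLike(value) {
  var regex = new RegExp("^([0-9A-Za-z-\\.@:%_\\+~#=]+)+((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");
  // An empty input never counts as a domain.
  return value !== "" && regex.test(value);
}

console.log(isDomainLike("google.com")); // true
console.log(isDomainLike("google"));     // false (no TLD-like suffix)
console.log(isDomainLike(""));           // false
```

Inside the handler, the result would decide whether to add or remove the `disabled` attribute. Note that the pattern has no `$` anchor, so trailing characters after a valid-looking domain are still accepted, and that the nested `+` quantifiers can become slow on long inputs that do not match.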
