2

I'm working on creating a regular expression in JavaScript to validate website URLs. I searched the Stack Overflow community a bit and did not find anything completely helpful.

My regex so far: /(https?:\/\/)?(www\.)?[a-zA-Z0-9]+\.[a-zA-Z]{2,}/g

But it does not work as intended: it accepts a URL with two w's, like ww.test.com, which I want to reject.

Should pass the regex test:

http://www.test.com
https://www.test.com
www.test.com
www.test.co.uk
www.t.com
test.com
test.fr
test.co.uk

Should not pass the regex test:

w.test.com
ww.test.com
www.test
test
ww.test.
.test
.test.com
.test.co.ul
.test.
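A quick way to see why ww.test.com gets through is to run the regex from the question as-is. The sketch below is only an illustration (the variable names are made up here): without ^ and $ the pattern is satisfied by the substring ww.test, and the g flag makes repeated test() calls on the same regex object stateful.

```javascript
// The regex from the question, unchanged.
const original = /(https?:\/\/)?(www\.)?[a-zA-Z0-9]+\.[a-zA-Z]{2,}/g;

// Unanchored: the pattern only needs to match somewhere inside the input.
// Here it matches the substring "ww.test" and ignores the trailing ".com".
const firstMatch = "ww.test.com".match(original)[0];

// With the g flag, test() remembers where the last match ended (lastIndex),
// so calling it twice on the same input gives different answers.
const first = original.test("test.com");   // matches, lastIndex moves to the end
const second = original.test("test.com");  // search starts at the end, no match

console.log(firstMatch, first, second);
```

For validating a whole form field, anchoring with ^ and $ and dropping the g flag avoids both effects.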

Any suggestions or thoughts?

Panagiotis
  • 1
    So `Mr.Bean` is also a valid URL? – anubhava Aug 30 '16 at 06:06
  • 3
    `"ww.test.com"` is a valid website address - why wouldn't it be? So is `www.test-site.com`, but your regex doesn't allow for hyphens. – nnnnnn Aug 30 '16 at 06:06
  • Well-formed URL and valid URL are two different things – Aaron Aug 30 '16 at 06:07
  • No, Mr.Bean should not be a valid URL. I'm missing something. But 'w.test.com' should also not be valid. – Panagiotis Aug 30 '16 at 06:08
  • 1
    @Panagiotis Why not? What if some sites were accessed in this fashion `someurl.user.com` You're confusing URL with a domain name – Aaron Aug 30 '16 at 06:10
  • Example: http://m.xkcd.com/ - if the "m" is valid, why not a "w"? (Also, if you want to "validate" URLs your regex should probably begin and end with `^` and `$`.) – nnnnnn Aug 30 '16 at 06:11
  • Are you looking for test that tries to resolve the URL? – chris85 Aug 30 '16 at 06:12
  • @Aaron well, yes you are right, I forgot this scenario. So to make it simple I will have the part until www as optional but then I have to validate if there is a .com or a .co.uk. But there are a lot. Is there any way to do it properly? – Panagiotis Aug 30 '16 at 06:14
  • @nnnnnn Yes as I mentioned above I forgot that scenario. Thanks for that. – Panagiotis Aug 30 '16 at 06:15
  • @chris85 I'm trying to find a regex for doing a validation check in an HTML form for the website field. – Panagiotis Aug 30 '16 at 06:15
  • Is it possible that [a-zA-Z0-9] includes a dot in Javascript by default? I don't see another way how ww.test.con can be accepted. But it gets more difficult when co.uk has to be accepted. Please note that ww.test.com is an unusual, but technically absolutely valid domain. So, changing the requirements might be worth considering. – mm759 Aug 30 '16 at 06:16
  • 2
    @Panagiotis What you're trying to do is impossible with regex. A URL could be valid but could not exist, so you need to resolve the URL on a valid URL. – Aaron Aug 30 '16 at 06:16
  • @mm759 No that should be [a-zA-Z0-9\.\-_]+ to accept also any other characters in the url or better [^\s\.]+ . Yes as you mentioned it will be good to review the requirements. – Panagiotis Aug 30 '16 at 06:19
  • Two `w`s are not an invalid subdomain so writing a regex to globally disallow them doesn't make sense. E.g. `w.google.com`, `ww.google.com`. `www.google.com` all could be valid requests if google set them up to resolve. – chris85 Aug 30 '16 at 06:19
  • @Panagiotis You can skip the regex check overall, and go straight to resolving the URL, if the domain name exists, then you already know that it's a valid URL. If it doesn't then it doesn't matter. You can send out a same `unable to resolve url` for both cases. – Aaron Aug 30 '16 at 06:20
  • @Aaron thanks for that. But to do this I would have to make, let's say, a jQuery AJAX call to the given URL, and I don't want to do that in my application. I cannot think of a different way to resolve a URL right now. – Panagiotis Aug 30 '16 at 06:24
  • @chris85 Thanks for your tips. – Panagiotis Aug 30 '16 at 06:25
  • @Panagiotis That's a separate question, someone will need to answer `Javascript: Website url validation with regex` – Aaron Aug 30 '16 at 06:25
  • `[a-zA-Z0-9]` is simpler as `\w`. – RobG Aug 30 '16 at 06:27
  • @Aaron True, I'm waiting to see if someone writes an answer to the question. Thanks anyway for your suggestions. – Panagiotis Aug 30 '16 at 06:29
  • @Panagiotis—you can do AJAX without jQuery, see [*you might not need jQuery: request*](http://youmightnotneedjquery.com/#request). – RobG Aug 30 '16 at 06:30
  • 2
    @Panagiotis Not sure if this is an option (Don't see why not), but you can set `` Have a look at this question if it's a possibility http://stackoverflow.com/questions/13820477/html5-input-tag-validation-for-url – Aaron Aug 30 '16 at 06:30
  • @RobG But if I do AJAX, we open a request, do something, and then send it to get a response. That's a different way of doing it. Or am I wrong? – Panagiotis Aug 30 '16 at 06:38
  • @Aaron Yes maybe HTML5 will be my savior :) I will test it. – Panagiotis Aug 30 '16 at 06:38
  • @Panagiotis—just saying you don't need jQuery for this. See the answer to [*using javascript to detect whether the url exists…*](http://stackoverflow.com/questions/10926880/using-javascript-to-detect-whether-the-url-exists-before-display-in-iframe/10926978#10926978). – RobG Aug 30 '16 at 23:39

3 Answers

4

This answer may be more than the problem needs, but it illustrates the point: even if it is possible to write a regexp that checks the URL, it is much simpler and more robust to parse the URL into a real object, on which the overall test can be decomposed into a number of smaller tests.

The built-in URL constructor of modern browsers may help you here (LINK 1, LINK 2).

One approach to testing your URL might look like this:

function testURL (urlstring) {
    var errors = [];
    try {
        var url = new URL(urlstring);

        // accept http: and https:, the two schemes the question allows
        if (!/^https?:$/.test(url.protocol)) {
           errors.push('wrong protocol');
        }

        //more tests here

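    } catch(err) {
      // new URL() throws a TypeError when the string cannot be parsed
      // (for example 'mr.bean', which has no scheme), so record the failure;
      // otherwise an unparseable string would come back with no errors
      errors.push('invalid url');
    }
    return errors;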
}


if (testURL('mr.bean').length == 0) { runSomething(); }
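The "more tests here" placeholder could be filled in along the following lines. This is only a sketch under assumptions that are not part of the answer: the function name checkWebsite, the error strings, and the rule of prepending http:// when no scheme was typed (needed because new URL('www.test.com') throws without a scheme) are all hypothetical choices. It checks structure only, so it makes no attempt to reject w.test.com or ww.test.com, which the comments point out are legitimate host names.

```javascript
// Hypothetical helper; the name, rules and messages are illustrative assumptions.
function checkWebsite(input) {
  const errors = [];

  // Assumption: treat scheme-less input such as "www.test.com" as http.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  const candidate = hasScheme ? input : "http://" + input;

  let url;
  try {
    url = new URL(candidate); // throws TypeError on unparseable input
  } catch (err) {
    return ["unparseable"];
  }

  if (url.protocol !== "http:" && url.protocol !== "https:") {
    errors.push("wrong protocol");
  }

  // Split the host into dot-separated labels and test each rule separately.
  const labels = url.hostname.split(".");
  if (labels.length < 2) {
    errors.push("missing top-level domain");
  }
  if (labels.some(function (label) { return label.length === 0; })) {
    errors.push("empty label");
  }
  if (!/^[a-z]{2,}$/i.test(labels[labels.length - 1])) {
    errors.push("bad top-level domain");
  }

  return errors;
}

console.log(checkWebsite("www.test.com"));   // no errors expected
console.log(checkWebsite("test"));           // reports a missing top-level domain
console.log(checkWebsite("ftp://test.com")); // reports the wrong protocol
```

Each rule is a small, separately readable check, which is the decomposition the answer argues for.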
philipp
  • Also for less modern browsers that have [*XMLHttpRequest*](http://stackoverflow.com/questions/10926880/using-javascript-to-detect-whether-the-url-exists-before-display-in-iframe/10926978#10926978). ;-) – RobG Aug 30 '16 at 23:41
  • Thanks for that @philipp. Since testing the whole URL with a regex is close to impossible, I will use this approach to test specific parts of the URL depending on my requirements. – Panagiotis Aug 31 '16 at 05:39
1

Here's an unofficial one that works for most things, with an explanation. It should be good enough for most situations.

(https?:\/\/)?[\w\-~]+(\.[\w\-~]+)+(\/[\w\-~]*)*(#[\w\-]*)?(\?.*)?

  1. (https?:\/\/)? - start with http:// or https:// or not
  2. [\w\-~]+(\.[\w\-~]+)+ - follow it with the domain name [\w\-~]+ and at least one extension (\.[\w\-~]+)+
    • [\w\-~] == [a-zA-Z0-9_\-~]
    • Multiple extensions would mean test.go.place.com
  3. (\/[\w\-~]*)* then as many sub directories as wished
    • In order to easily make test.com/ pass, the slash does not enforce following characters. This can be abused like so: test.com/la////la.
  4. (#[\w\-]*)? Followed maybe by an element id
  5. (\?.*)? Followed maybe by url params, which (for the sake of simplicity) can be pretty much whatever

There are plenty of edge cases where this will break, or where it should but it doesn't. But, for most cases where people aren't doing anything wacky, this should work.
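To see how this pattern behaves on the lists from the question, it can be run against both of them. The sketch below wraps the pattern in ^ and $ so that the whole input has to match; the anchors and the variable names are assumptions added here, not part of the answer as posted.

```javascript
// The pattern from this answer, anchored (assumption) to validate whole strings.
const pattern = /^(https?:\/\/)?[\w\-~]+(\.[\w\-~]+)+(\/[\w\-~]*)*(#[\w\-]*)?(\?.*)?$/;

// Lists copied from the question.
const shouldPass = [
  "http://www.test.com", "https://www.test.com", "www.test.com",
  "www.test.co.uk", "www.t.com", "test.com", "test.fr", "test.co.uk"
];
const shouldFail = [
  "w.test.com", "ww.test.com", "www.test", "test", "ww.test.",
  ".test", ".test.com", ".test.co.ul", ".test."
];

// Inputs the question wants accepted but the pattern rejects.
const wronglyRejected = shouldPass.filter(function (u) { return !pattern.test(u); });
// Inputs the question wants rejected but the pattern accepts.
const wronglyAccepted = shouldFail.filter(function (u) { return pattern.test(u); });

console.log(wronglyRejected); // []
console.log(wronglyAccepted); // [ 'w.test.com', 'ww.test.com', 'www.test' ]
```

The three entries it still accepts are exactly the ones the comments argue are well-formed host names, so a purely structural pattern has no basis for rejecting them.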

Seph Reed
  • I use `const url_pattern = new RegExp("(https?:\/\/)?[\w\-~]+(\.[\w\-~]+)+(\/[\w\-~]*)*(#[\w\-]*)?(\?.*)?");` and it gives me `Uncaught SyntaxError: Invalid regular expression`. Can you please tell me what I'm doing wrong? – mikasa Dec 26 '19 at 04:50
  • This regex is not formatted as a string. Try `/[abc123]/` notation. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp – Seph Reed Dec 30 '19 at 16:58
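The error in the comment above comes from string escaping: inside an ordinary string literal a sequence like "\?" collapses to "?" before the RegExp constructor ever sees it, so the last group arrives as (?.*)?, which is not a valid group. A minimal sketch of three ways around it (the variable names are illustrative only):

```javascript
// 1. Regex literal: backslashes reach the regex engine untouched.
const literal = /(https?:\/\/)?[\w\-~]+(\.[\w\-~]+)+(\/[\w\-~]*)*(#[\w\-]*)?(\?.*)?/;

// 2. String passed to RegExp: every backslash has to be doubled.
const fromString = new RegExp(
  "(https?:\\/\\/)?[\\w\\-~]+(\\.[\\w\\-~]+)+(\\/[\\w\\-~]*)*(#[\\w\\-]*)?(\\?.*)?"
);

// 3. String.raw template: backslashes are kept as written.
const fromRaw = new RegExp(
  String.raw`(https?:\/\/)?[\w\-~]+(\.[\w\-~]+)+(\/[\w\-~]*)*(#[\w\-]*)?(\?.*)?`
);

// Reproducing the reported failure with just the last group:
// "(\?.*)?" is the string (?.*)? by the time RegExp parses it.
let threw = false;
try {
  new RegExp("(\?.*)?");
} catch (e) {
  threw = e.name === "SyntaxError";
}

const url = "https://www.test.com";
console.log(literal.test(url), fromString.test(url), fromRaw.test(url), threw);
```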
-1
/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:@\-_=#])*/g
chris85
Ish