I need a URL validator regex with these criteria:

  • The protocol (HTTP, HTTPS) is optional, but if one is given it must be in the correct format, i.e. protocol:domain or protocol://domain.
  • www is optional.
  • A direct IP address can be used instead of a domain name.

So based on the criteria, these should pass:

These should not pass:

  • hello
  • hello/world
  • abc://def.ghi
  • ftp:google.com

The closest regex I've found is from here:

^((?:.|\n)*?)((http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)([-A-Z0-9.]+)(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?)

But unfortunately, google.com doesn't pass. It needs to have www. as a prefix. Can you improve this regex so www. becomes optional?

Chen Li Yong
  • Maybe something like `^(?:https?:\/\/(?:www\.)?|https:(?:\/\/)?)?\w+(?:[-.]\w+)+(?:\/[^\/\s]+)*$` will be enough? See [this demo](https://regex101.com/r/ibx3ED/2). – Wiktor Stribiżew Feb 08 '22 at 14:21
  • "*But unfortunately, google.com doesn't pass. It needs to have www. as a prefix.*" [it seems to work for "google.com"](https://regex101.com/r/lUkKoJ/1). I mean it's horrible but it does match it. I chose the Python flavour only because for PHP/JS the pattern is wrong (unescaped forward slashes) and I didn't want to change them. – VLAZ Feb 08 '22 at 14:21
  • Do you really need this to be a regex-based solution, or are you asking an XY problem? Typically I would opt to solve this with a URL parsing library in the language of your choice, not a regex. – Tom Lord Feb 08 '22 at 15:03
  • @WiktorStribiżew this works wonderfully! I'll use this. If you repost this as an answer, I'll accept it. – Chen Li Yong Feb 08 '22 at 15:22
  • @TomLord but have you considered that using a library would be easier to implement and easier to maintain in the future, as well? It's almost as if you are trying to keep things simple. – VLAZ Feb 08 '22 at 15:40

1 Answer

It looks like the following pattern matches your criteria:

^(?:https?:\/\/(?:www\.)?|https:(?:\/\/)?)?\w+(?:[-.]\w+)+(?:\/[^\/\s]+)*$

See the regex demo. Details:

  • ^ - start of the string
  • (?:https?:\/\/(?:www\.)?|https:(?:\/\/)?)? - an optional sequence of:
    • https?:\/\/(?:www\.)? - http or https, :// and then an optional www. substring
    • | - or
    • https:(?:\/\/)? - https: and then an optional // string
  • \w+ - one or more word chars
  • (?:[-.]\w+)+ - one or more sequences of a . or - followed by one or more word chars
  • (?:\/[^\/\s]+)* - zero or more sequences of a / and then one or more chars other than / and whitespace
  • $ - end of string.
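
To sanity-check the pattern outside regex101, a small Python script along these lines could be used. This is only a sketch: the names `URL_RE` and `is_valid_url` are made up for illustration, and because the question's "should pass" list is not shown above, the passing samples are my own assumptions based on the stated criteria. The failing samples are taken from the question.

```python
import re

# Pattern copied verbatim from the answer above.
URL_RE = re.compile(
    r"^(?:https?:\/\/(?:www\.)?|https:(?:\/\/)?)?\w+(?:[-.]\w+)+(?:\/[^\/\s]+)*$"
)


def is_valid_url(candidate: str) -> bool:
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    # fullmatch is used so that a trailing newline is not silently accepted,
    # which Python's `$` would otherwise allow.
    return URL_RE.fullmatch(candidate) is not None


# Assumed passing inputs (not from the original post, except google.com).
expected_pass = [
    "google.com",
    "www.google.com",
    "http://google.com",
    "https://www.google.com",
    "https:google.com",
    "192.168.1.1",
    "http://192.168.1.1/path",
]

# Failing inputs listed in the question.
expected_fail = ["hello", "hello/world", "abc://def.ghi", "ftp:google.com"]

for sample in expected_pass + expected_fail:
    print(f"{sample!r}: {is_valid_url(sample)}")
```

One thing the script makes visible: as written, the single-colon form is only accepted for https, so `http:google.com` is rejected. If that form should pass as well, the second alternative would need to become `https?:(?:\/\/)?`.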
Wiktor Stribiżew