0

I want to validate a URL using a regular expression. These are my conditions for a valid URL:

  1. Scheme is optional
  2. Subdomains should be allowed
  3. Port number should be allowed
  4. Path should be allowed.

I was trying the following pattern:

((http|https)://)?([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

But I am not getting the desired results. Even an invalid URL like '*.example.com' is getting matched.

What is wrong with it?

Games Brainiac
  • possible duplicate of [What is the best regular expression to check if a string is a valid URL?](http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url) – Shiplu Mokaddim Sep 21 '13 at 14:27
  • Try to add a word boundary at the beginning and the end, and note that you make the whole thing optional with `?` – Vitim.us Sep 21 '13 at 16:58
  • I tried the answers associated with a duplicate question, but did not get the desired output. – Suyog Joshi Sep 23 '13 at 05:13
  • Hi everyone, with a little tweaking and all your help and comments, I found a regular expression for my specific conditions: ^(http(s)?://)?[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:[0-9]+)?(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$ – Suyog Joshi Sep 23 '13 at 05:29

4 Answers

1

Are you matching the entire string? You don't say which language you are using, but in Python terms it looks like you may be using search instead of match.

One way to fix this is to start your regexp with ^ and end it with $.
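
To illustrate the difference, here is a minimal Python sketch (Python is an assumption; the question names no language). The pattern is the one from the question, except that the hyphen in the last character class is moved to the end, because Python's `re` rejects `\w- ` as a range. `re.search` accepts a match anywhere in the string, while `re.fullmatch` behaves much like wrapping the pattern in ^ and $. The helper names `found_anywhere` and `matches_whole` are made up for this example.

```python
import re

# Pattern from the question; the hyphen is moved to the end of the last
# character class because Python's re rejects "\w- " as a range.
PATTERN = r"((http|https)://)?([\w-]+\.)+[\w-]+(/[\w ./?%&=-]*)?"


def found_anywhere(text):
    """Unanchored check: succeeds if any substring matches."""
    return re.search(PATTERN, text) is not None


def matches_whole(text):
    """Anchored check: the entire string must match."""
    return re.fullmatch(PATTERN, text) is not None


# Show which part of the invalid input the unanchored search picks up.
print(re.search(PATTERN, "*.example.com").group(0))
print(found_anywhere("*.example.com"))
print(matches_whole("*.example.com"))
print(matches_whole("http://sub.example.com/path?x=1"))
```

The unanchored search skips the leading `*.` and matches only the `example.com` part, which is why the invalid input appears to pass.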

andrew cooke
0

While parsing URLs is best left to a library (since I know Perl best, I would suggest something like http://search.cpan.org/dist/URI/), if you want some help debugging that pattern, it might be best to try it in a regex debugger such as http://www.debuggex.com/.

I think one of the main reasons it matches is that you don't use start-of-string and end-of-string anchors. Without them, the regex does not have to match the entire input: it can simply match the 'example.com' part of your string.
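
As an illustration of the library route, here is a minimal Python sketch using the standard `urllib.parse` module (Python is an assumption here; the answer itself points to Perl's URI). The helper `split_url` is a hypothetical name. Because the scheme is optional in the question, the sketch prefixes `//` when no scheme is present, so that host and port are parsed as the network location.

```python
from urllib.parse import urlsplit


def split_url(text):
    """Return (scheme, host, port, path) for a URL whose scheme may be missing."""
    # Without "//", urlsplit would not treat "example.com:8080" as host and port.
    if "://" not in text:
        text = "//" + text
    parts = urlsplit(text)
    # parts.port raises ValueError if the port is not a valid number.
    return parts.scheme, parts.hostname, parts.port, parts.path


print(split_url("sub.example.com:8080/a/b"))
print(split_url("https://example.com/index.html"))
```

This only splits the input into components; deciding which hosts, ports, and paths count as valid is still up to the caller.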

Horus
0

I found a regular expression for my conditions with help from your input:

^(http(s)?://)?[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:[0-9]+)?(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$
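
A quick way to check such a pattern is to run it over a few inputs. The sketch below is in Python (an assumption; the post names no language) and writes the port group as `(:[0-9]+)?`, a digit class. The function name `is_valid_url` and the sample inputs are made up for illustration.

```python
import re

# Anchored pattern: optional scheme, host, optional numeric port, optional path.
URL_RE = re.compile(
    r"^(http(s)?://)?[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:[0-9]+)?(\/?)"
    r"([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$"
)


def is_valid_url(text):
    """Return True if the whole string fits the pattern."""
    return URL_RE.match(text) is not None


for sample in [
    "example.com",
    "http://sub.example.com:8080/path",
    "https://example.com/a/b",
    "*.example.com",
    "example.com:abc",
]:
    print(sample, is_valid_url(sample))
```

Note that the path character class does not include `=`, so query strings such as `?x=1` are rejected by this pattern.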
0

The following code works for me in C#:

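// Requires: using System.Text.RegularExpressions;
private static bool IsValidUrl(string url)
{
    // Optional http(s) scheme, dot-separated labels, at least two characters after a dot.
    var shape = new Regex(@"^(https?://)?([\w-]+\.)+[\w-]+[.\w]+([?%&=]*)?");
    // Reject input whose last character is not alphanumeric.
    return shape.IsMatch(url) && !Regex.IsMatch(url, @"[^a-zA-Z0-9]+$");
}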

It allows "something.anything" (at least 2 letters after the period), with or without http(s) and www.

Uttam Ughareja