1

I am writing a regular expression to validate a website URL. It should accept the following forms:

- www.example.com
- example.com
- www.example.com/something
- example.com/something

and reject every other URL.

It works for everything except one case, www.example. How can I handle this case?
"www.example" must not pass.

My regular expression:

^[a-zA-Z0-9][a-zA-Z0-9]+([.][a-zA-Z0-9]+)+(/.*)?$
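
The behaviour can be reproduced with a short sketch. Python is used here purely for illustration, and the matches helper is a hypothetical name rather than part of any library. The pattern is the one above, unchanged.

```python
import re

# The pattern from the question, copied as-is.
PATTERN = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9]+([.][a-zA-Z0-9]+)+(/.*)?$")


def matches(text):
    """Return True when the whole string is accepted by PATTERN."""
    return PATTERN.match(text) is not None


# The four forms that should pass, plus the problem case.
for candidate in [
    "www.example.com",
    "example.com",
    "www.example.com/something",
    "example.com/something",
    "www.example",
]:
    print(candidate, matches(candidate))
```

www.example is accepted because "www" is consumed as the first label and ".example" satisfies the ([.][a-zA-Z0-9]+)+ group, so to this pattern it has the same shape as example.com.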

Can anyone help, please?

Thanks.

Kemal Fadillah
mohammad
  • Is www.example a valid URL? – shazin Mar 27 '13 at 08:49
  • @shazin No, it's not a valid URL, so it must not pass. – mohammad Mar 27 '13 at 08:51
  • You can't actually validate that unless you provide a whitelist of allowed domain names to match against, because a regexp can't tell whether .example is a top-level domain. – dotslashlu Mar 27 '13 at 08:54
  • 2
    First you need to know all the [valid TLDs](http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains), maybe store them in an array, and update your code when they change. Then you have to allow anything before the TLD portion, because how do you know whether someone called their server www or wwx or something else? Anything that comes after a slash is pretty much valid. This is a crazy thing to do in JavaScript and much easier on the server with nslookup and so on. – James Mar 27 '13 at 08:54

5 Answers

1

Try this one:

^(www\.)?(?!www)[a-zA-Z0-9]+\.[a-zA-Z]{2,6}/?[a-zA-Z0-9]+$
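
A quick way to check this pattern against the question's cases is sketched below in Python. ORIGINAL is the pattern above, copied unchanged. VARIANT is only a suggested adjustment and an assumption about the intent: it makes everything after the TLD optional and requires a slash before it, because the pattern above needs at least one extra alphanumeric character after the TLD, which rejects two-letter TLDs and a bare trailing slash.

```python
import re

# The pattern from this answer, copied as-is.
ORIGINAL = re.compile(r"^(www\.)?(?!www)[a-zA-Z0-9]+\.[a-zA-Z]{2,6}/?[a-zA-Z0-9]+$")

# Hypothetical variant: the path is optional and must start with a slash.
VARIANT = re.compile(r"^(www\.)?(?!www\.)[a-zA-Z0-9]+\.[a-zA-Z]{2,6}(/.*)?$")


def accepts(pattern, text):
    """Return True when the compiled pattern matches the whole string."""
    return pattern.match(text) is not None


for candidate in [
    "www.example.com",
    "example.com",
    "www.example.com/something",
    "example.com/something",
    "www.example",
    "example.us",
    "example.com/",
]:
    print(candidate, accepts(ORIGINAL, candidate), accepts(VARIANT, candidate))
```

Both patterns reject www.example thanks to the negative lookahead. Neither knows whether the last label is a real TLD, so something like example.abcdef still passes.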
oleq
1

Here's the best I could get:

^www\.[a-zA-Z0-9]+\.\w+[/\w]*$

Result

www.example.com         - true
www.example.com/        - true
www.example.com/xyx     - true
www.example.com/xy/s/   - true
www.example.            - false
www.example             - false

Please note that this won't accept 'example.com'. Tested at http://gskinner.com/RegExr/

Sudhakar
0

This is the actual URL validating regex used in Django 1.5.1:

import re
regex = re.compile(
        r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
        r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

This handles both IPv4 and IPv6 addresses as well as query strings (GET parameters).

Found in the code here, Line 44.
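
Two things are worth checking before using this pattern for the question's inputs, sketched below. The pattern is the one quoted above. The is_valid helper and its default_scheme parameter are hypothetical names, and prefixing a scheme when none is present is an assumption about how scheme-less input should be treated, not something Django does for you here.

```python
import re

# Pattern copied from the answer above.
DJANGO_URL_RE = re.compile(
    r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # ...or ipv4
    r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # ...or ipv6
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_valid(url, default_scheme="http"):
    """Prefix a scheme when it is missing (an assumption), then match."""
    if "://" not in url:
        url = default_scheme + "://" + url
    return DJANGO_URL_RE.match(url) is not None


# Without a scheme the raw pattern does not match at all.
print(DJANGO_URL_RE.match("www.example.com") is not None)

for candidate in ["www.example.com", "example.com/something", "www.example"]:
    print(candidate, is_valid(candidate))
```

Note that www.example is still accepted once a scheme is added, because the final label may be any run of two or more letters, digits or hyphens. On its own this pattern therefore does not solve the www.example case from the question.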

Ewan
0

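Try this (it is written in PHP/PCRE notation: the leading and trailing _ are the pattern delimiters and iuS are modifiers, and it requires a scheme such as http://):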

_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS

I can't claim credit though; I yanked it from here:

http://mathiasbynens.be/demo/url-regex

They've got a reasonable chart with lots of expressions with pass/fail for each case against each expression.

-1

Not the best regex but works in many cases:

^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}(/.*)?$

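Edit, restricting the TLD to a fixed list (the pattern above rejects www.example only because "example" is longer than six letters):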

^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+(com|org|info|biz|us)/?([^/]*)$

To allow trailing slash:

^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+(com|org|info|biz|us)/?([^/]*)/?$
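
The whitelist idea can be checked against the question's cases with a short Python sketch. The TLD list is the one from the patterns above. The names ALLOWED_TLDS, LABEL and is_allowed_url are hypothetical, and the path part is written as (/.*)? instead of /?([^/]*), which is an assumption about the intent: with the optional slash, a string such as example.community would pass because "com" matches and "munity" is taken as the path.

```python
import re

# TLD whitelist taken from the patterns above; extend it as needed.
ALLOWED_TLDS = ("com", "org", "info", "biz", "us")

# One DNS label: 1 to 63 characters, no leading or trailing hyphen.
LABEL = r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"

URL_RE = re.compile(
    r"(?:" + LABEL + r"\.)+"               # one or more labels, each followed by a dot
    r"(?:" + "|".join(ALLOWED_TLDS) + r")"  # a TLD from the whitelist
    r"(?:/.*)?"                             # optional path, must start with a slash
)


def is_allowed_url(text):
    """Return True when the whole string matches URL_RE."""
    return URL_RE.fullmatch(text) is not None


for candidate in [
    "www.example.com",
    "example.com",
    "www.example.com/something",
    "example.com/something",
    "www.example",
    "example.community",
]:
    print(candidate, is_allowed_url(candidate))
```

The trade-off is the one raised in the comments: the list has to be kept up to date by hand, and any TLD missing from it is rejected.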
Ahmed KRAIEM