I've been learning some regex because I'm trying to create a field that validates a website address and does not allow whitespace.

I currently have:

^((http|https|ftp)\://)?(www\.[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s])$

"www.google.com" PASS

"www.goo gle.com" FAIL

" www.google.com" FAIL

However

" " PASS

I thought that requiring the input to begin with http|https|ftp would rule out the whitespace, and I have even tried prefixing the pattern with [^\s], but to no avail.

If it helps, I am using the ASP.NET WebForms RegularExpressionValidator control.

aspirant_sensei
  • Your regex is correct; it won't allow a whitespace character. See http://regex101.com/r/dZ1vT6/8 – Avinash Raj Oct 01 '14 at 15:11
  • But then do you know why the ASP control allows whitespace? I thought it might be that the regex check doesn't kick off if it only sees blank whitespace, but rejecting whitespace is a requirement I need to meet. EDIT: no worries – aspirant_sensei Oct 01 '14 at 15:15

2 Answers

EDIT: I have found out that the MS RegularExpressionValidator control treats pure white space as an empty string, so the validation doesn't kick in until a non-white-space character is entered. A bit annoying, but supposedly just adding a RequiredFieldValidator will do the job (for anyone interested, see: RegularExpressionValidator not firing on white-space entry).
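
The behaviour described above can be modelled roughly as follows. This is a Python sketch, not ASP.NET code: `validator_like` is a hypothetical stand-in for the control, built only on the assumption stated in this answer (whitespace-only input is treated as empty and skipped). The pattern is the one from the question.

```python
import re

# Pattern copied from the question.
URL_PATTERN = re.compile(
    r"^((http|https|ftp)\://)?(www\.[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}"
    r"(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*[^\.\,\)\(\s])$"
)


def regex_only(value: str) -> bool:
    """Apply the regex to the raw value, with no special case for blanks."""
    return URL_PATTERN.fullmatch(value) is not None


def validator_like(value: str) -> bool:
    """Hypothetical model of the control: blank input is never checked."""
    if value.strip() == "":
        # Assumption: empty or whitespace-only input counts as "nothing
        # to validate", so the regex is not consulted at all.
        return True
    return regex_only(value)


if __name__ == "__main__":
    for sample in ["www.google.com", "www.goo gle.com", " www.google.com", " "]:
        print(repr(sample), regex_only(sample), validator_like(sample))
```

In this model the regex on its own rejects " ", which agrees with the comment under the question; only the blank-input shortcut lets it through.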

EDIT2: For others in my situation: RequiredFieldValidator is no use here, because I do not want the field to be mandatory; I just want it not to contain white space. I have no idea why Microsoft wouldn't run the regex validation against white space. Instead I've had to go for the simple .Trim approach server-side.
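
One possible reading of the trim approach, sketched in Python rather than the actual ASP.NET code-behind: trim the submitted value, treat a blank result as "field left empty", and only then apply the pattern. The function name `clean_optional_field` and the stand-in pattern `\S+` in the usage below are assumptions made for illustration.

```python
import re
from typing import Optional


def clean_optional_field(raw: Optional[str], pattern: str) -> Optional[str]:
    """Trim an optional field and validate whatever is left.

    Returns None when the field is blank (allowed, since it is optional),
    the trimmed value when it matches `pattern`, and raises ValueError
    otherwise.
    """
    trimmed = (raw or "").strip()
    if not trimmed:
        # Whitespace-only input collapses to "no value supplied".
        return None
    if re.fullmatch(pattern, trimmed) is None:
        raise ValueError(f"invalid value: {trimmed!r}")
    return trimmed


if __name__ == "__main__":
    # r"\S+" is a placeholder pattern: one or more non-whitespace characters.
    print(clean_optional_field("   ", r"\S+"))
    print(clean_optional_field(" www.google.com ", r"\S+"))
```

Note that trimming first means leading or trailing spaces are silently removed rather than rejected, which may or may not be what you want.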

aspirant_sensei

Your regular expression will also fail for a website addressed by its bare domain name (without the www. prefix), for example http://stackoverflow.com

Also, it looks a bit overcomplicated to me. Since a valid host/domain name can contain only digits, letters and hyphens (the latter only if surrounded by digits or letters...), you could change your regex to:

^(?:[a-zA-Z0-9]+[-]*[a-zA-Z0-9]+)(?:[.][a-zA-Z0-9]+[-]*[a-zA-Z0-9]+)+$

I would not check the protocol, but if you need to, here is the complete regex:

^(?:(?:http|https|ftp)\://)?(?:[a-zA-Z0-9]+[-]*[a-zA-Z0-9]+)(?:[.][a-zA-Z0-9]+[-]*[a-zA-Z0-9]+)+$

http://regex101.com/r/pP0iG6/1
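
To see what the complete pattern accepts and rejects, it can be exercised against a few inputs. The snippet below is a quick Python check, not tied to ASP.NET; the helper name `is_valid_host` and the sample inputs are chosen here for illustration. `fullmatch` is used so that nothing may precede or follow the match.

```python
import re

# Complete pattern from this answer, copied verbatim and split for readability.
HOST_PATTERN = re.compile(
    r"^(?:(?:http|https|ftp)\://)?"
    r"(?:[a-zA-Z0-9]+[-]*[a-zA-Z0-9]+)"
    r"(?:[.][a-zA-Z0-9]+[-]*[a-zA-Z0-9]+)+$"
)


def is_valid_host(value: str) -> bool:
    """True when the whole string matches the pattern."""
    return HOST_PATTERN.fullmatch(value) is not None


SAMPLES = [
    "stackoverflow.com",
    "http://stackoverflow.com",
    "www.google.com",
    "my-site.org",
    "www.goo gle.com",
    " www.google.com",
    " ",
    "-bad.com",
    "a.com",
    "a-b-c.com",
    "http://stackoverflow.com/questions",
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(f"{sample!r:40} {is_valid_host(sample)}")
```

The first four samples match and the rest do not. Two rejections are worth knowing about: as written, each label needs at least two characters (so "a.com" fails) and may contain only one run of hyphens (so "a-b-c.com" fails). Anything with a path is rejected as well, since the pattern covers the host only.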

As for the path, because it can contain almost anything(*), in my opinion the only way to validate it is to issue a HEAD request and check the server's response code.
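
A minimal sketch of such a HEAD check, using Python's standard `urllib` instead of .NET. The function names, the 5-second timeout, and the choice to count any 2xx or 3xx status as "reachable" are assumptions made for this example.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_head_request(url: str) -> urllib.request.Request:
    """Create a request that asks for headers only (no response body)."""
    return urllib.request.Request(url, method="HEAD")


def head_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for a HEAD request, or None on failure."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry a code.
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def looks_reachable(status: Optional[int]) -> bool:
    """Assumed policy: treat any 2xx or 3xx answer as a live URL."""
    return status is not None and 200 <= status < 400


if __name__ == "__main__":
    status = head_status("http://stackoverflow.com")
    print(status, looks_reachable(status))
```

Bear in mind that some servers answer HEAD differently from GET (or refuse it), and that this makes validation depend on the network, so it is probably better suited to a server-side check than to form validation.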

spider
  • (*) That's not actually true; there is a valid character set: http://tools.ietf.org/html/rfc3986#section-2 However, you can have a look at this answer for a better regexp for the URI part: http://stackoverflow.com/a/1547940/384630 – spider Oct 01 '14 at 15:47