
I have a problem with the regular expression below, which I am using to validate URLs:

    ^http(s?):\/\/(\w+\.)?[\w%\-\.$,@?^=%&:\/~\+#]+\.[\w\.$,@?^=%&:\/~\+#]+|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}+\/$

The expression allows IP addresses as well as http/https URLs, but it also accepts URLs that contain spaces (http://example.com). How do I make the expression reject spaces?

sawa
prasad_g

2 Answers


Don't. This isn't a suitable use of Regular Expressions, and you will never get it right for all possible URLs. Use the URI module to actually parse the URL, and catch the exception it will raise if you feed it an invalid URL.

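    require 'uri'

    # A well-formed URL parses into a URI object
    URI("http://google.com") # => #<URI::HTTP:0x007fb08500d3a8 URL:http://google.com>
    # A malformed one (note the space) raises an exception
    URI("http://a b") # URI::InvalidURIError: bad URI(is not URI?): http://a b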
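
To turn the parse-and-rescue idea above into a yes/no check, something like the following might work. This is only a sketch: `valid_http_url?` is a hypothetical helper name, not part of the `URI` module, and the extra scheme and host checks are an assumption about what "valid" should mean here.

```ruby
require 'uri'

# Hypothetical helper (not part of the URI library): true only when the
# string parses as an http or https URI with a non-empty host.
def valid_http_url?(string)
  uri = URI.parse(string)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # Raised for input such as "http://a b", which contains a space.
  false
end

valid_http_url?("http://google.com")   # => true
valid_http_url?("http://a b")          # => false
valid_http_url?("ftp://example.com")   # => false
```

`URI.parse` only checks that the string is syntactically a URI, so the scheme and host tests are what restrict it to web addresses.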
user229044

Just add a $ (to require the end of the line) to the end of your first alternative:

    ^http(s?):\/\/(\w+\.)?[\w%\-\.$,@?^=%&:\/~\+#]+\.[\w\.$,@?^=%&:\/~\+#]+$|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}+\/$
    ------Add in $ here----------------------------------------------------^
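
A quick way to see what the added `$` changes is to run both versions against a string with a space in the path. The sketch below uses only the first alternative of the pattern (the IP-address branch is left out for brevity), and the constant names and test strings are made up for illustration. It also shows one Ruby-specific caveat: `^` and `$` match at line boundaries, so `\A` and `\z` are the anchors to reach for if the whole string has to match.

```ruby
# First alternative of the pattern from the question, with no anchors yet.
BODY = 'http(s?):\/\/(\w+\.)?[\w%\-\.$,@?^=%&:\/~\+#]+\.[\w\.$,@?^=%&:\/~\+#]+'

UNANCHORED   = Regexp.new('^' + BODY)          # as written in the question
LINE_END     = Regexp.new('^' + BODY + '$')    # with the $ added
WHOLE_STRING = Regexp.new('\A' + BODY + '\z')  # assumption: stricter Ruby anchors

WITH_SPACE = "http://example.com/some path"
TWO_LINES  = "http://example.com\nnot a url"

UNANCHORED.match?(WITH_SPACE)    # => true  (matches only the part before the space)
LINE_END.match?(WITH_SPACE)      # => false (the space stops the match before the line end)
LINE_END.match?(TWO_LINES)       # => true  ($ is satisfied at the end of the first line)
WHOLE_STRING.match?(TWO_LINES)   # => false (\z requires the end of the whole string)
```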


With that said, there are already built-in modules that are very good at parsing and validating URIs. I suggest using one of those.

Mike Christensen
  • Mike, can you please tell me which special characters are allowed in a domain? For example: www.ex$am%ple.com. – prasad_g Jul 23 '13 at 16:52
  • I want to restrict special characters in the domain part. – prasad_g Jul 23 '13 at 16:53
  • You can use letters (abc), numbers (123) and hyphens (-). Though domains can't begin or end with a hyphen. Now you see why people are suggesting using an *existing* module that already has all this knowledge built in. – Mike Christensen Jul 23 '13 at 17:00
  • @prasad_g I will say again, a final time, that you're doing this wrong, and your regex is broken. **Don't use a regex to do this**. The [actual regex to validate a URL is over 5000 characters long](http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url). You have *no hope* of doing this correctly, so don't do it. Use `URI` and get on with your life. – user229044 Jul 23 '13 at 17:59
  • I agree 100% with @meagar. – Mike Christensen Jul 23 '13 at 18:24
  • Hey meagar, I didn't have time to fix it yesterday, but I tried today and it works. Thanks for the suggestion and the help. Thanks to Mike as well. – prasad_g Jul 24 '13 at 10:21
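
For the domain question raised in the comments, the rule given there (letters, digits and hyphens, with no hyphen at the start or end) could be checked roughly as follows. This is a sketch under assumptions: `plausible_domain?` and `LABEL` are hypothetical names, the rule is applied to each dot-separated label, and label length limits and internationalized names are not handled.

```ruby
# One dot-separated label: letters, digits and hyphens, but no hyphen
# at the start or at the end (the rule described in the comments).
LABEL = /\A[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i

# Hypothetical helper: true when every label of the host follows that rule.
def plausible_domain?(host)
  labels = host.split('.', -1)   # -1 keeps empty labels, so "a..b" is rejected
  labels.length >= 2 && labels.all? { |label| LABEL.match?(label) }
end

plausible_domain?('www.example.com')     # => true
plausible_domain?('www.ex$am%ple.com')   # => false
plausible_domain?('-example.com')        # => false
```

A check like this is best run on the `host` of an already parsed `URI` object rather than on the raw string.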