
How can I check if a string is a valid URL?

For example:

http://hello.it => yes
http:||bra.ziz, => no

If it is a valid URL, how can I check whether it is relative to an image file?

asked by Luca Romagnoli, edited by the Tin Man
  • The URL you provided seems to be an absolute URL; what do you mean by "relative to an image file"? – johannes Nov 27 '09 at 11:38
  • I posted a [UriValidator with specs](http://stackoverflow.com/a/19423623/356895). – JJD Oct 17 '13 at 12:09

9 Answers


Notice:

As @CGuess pointed out in the comments, this approach has a known problem: it has been documented for over 9 years that validation is not the purpose of this regular expression (see https://bugs.ruby-lang.org/issues/6520).

Use the URI module distributed with Ruby:

```ruby
require 'uri'

if url =~ URI::regexp
  # Correct URL
end
```

As Alexander Günther said in the comments, this only checks whether a string *contains* a URL.

To check whether the whole string is a URL, anchor the pattern:

```ruby
url =~ /\A#{URI::regexp}\z/
```

If you only want to check for web URLs (http or https), use this:

```ruby
url =~ /\A#{URI::regexp(['http', 'https'])}\z/
```
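A minimal sketch of what the anchors change, assuming Ruby 2.2 or later. It builds the pattern from `URI::RFC2396_Parser` directly, because `URI.regexp` is flagged as obsolete in recent Rubies; on the versions this answer was written for, `URI::regexp(['http', 'https'])` should yield the same RFC 2396 pattern. The constant names are made up for the example.

```ruby
require 'uri'

# Assumption: URI::RFC2396_Parser (Ruby 2.2+) is the parser behind URI::regexp;
# calling it directly avoids the obsolete URI.regexp entry point.
HTTP_URL = URI::RFC2396_Parser.new.make_regexp(%w[http https])

# \A and \z tie the match to the whole string instead of any substring.
ANCHORED_HTTP_URL = /\A#{HTTP_URL}\z/

# Unanchored: matches at index 0 because the prefix "http://example" looks like a URL.
p("http://example com" =~ HTTP_URL)            # => 0
# Anchored: a space cannot be part of a URL, so the whole-string match fails.
p("http://example com" =~ ANCHORED_HTTP_URL)   # => nil
p("http://hello.it".match?(ANCHORED_HTTP_URL)) # => true
p("ftp://hello.it".match?(ANCHORED_HTTP_URL))  # => false
# Anchoring alone does not make it a validator: a bare scheme still passes.
p("http:".match?(ANCHORED_HTTP_URL))           # => true
```

The last line is the problem smathy describes in the comments: the pattern makes the authority and path optional, so `\A` and `\z` narrow what matches without turning the regexp into a full validator.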
answered by Mikael S, edited by Sebastián Palma
  • That doesn't seem to work: `'http://:5984/asdf' =~ URI::regexp` and `'http::5984/asdf' =~ URI::regexp` both return 0. I expected them to return nil because neither of them is a valid URI. – awendt Nov 08 '11 at 08:47
  • Isn’t :5984 port 5984 on localhost? – mxcl Aug 07 '12 at 02:28
  • It actually checks if a variable contains a valid URL. It will accept "http://example com" as a valid URL, because it contains one. But it is not helpful if you expect the whole thing to be the URL. – Alexander Günther Dec 12 '12 at 09:04
  • This doesn't work for `/images/myimage01.png`, but the latter is a valid value for a CSS background URL or an image tag's source URL. – gotqn Mar 22 '14 at 11:22
  • gotqn: That is not a valid URL according to RFC 1738 though. – Mikael S Mar 25 '14 at 08:53
  • With regards to Ruby's handling of `^` and `$` on multi-line strings, shouldn't this be `/\A#{URI::regexp}\z/` instead? – Seth Apr 29 '14 at 20:31
  • Do not use this, it's so bad that `"http:"` passes this regexp. – smathy Apr 13 '16 at 22:55
  • DO NOT USE: There's a bug with this issue and it's been documented for over 9 years now that validation is not the purpose of this regular expression: https://bugs.ruby-lang.org/issues/6520 – CGuess Sep 07 '21 at 22:32

Similar to the answers above, I find this regex slightly more accurate:

```ruby
URI::DEFAULT_PARSER.regexp[:ABS_URI]
```

It rejects URLs that contain spaces, whereas URI.regexp accepts them (URI.regexp is not anchored, so it matches the valid part before the space).

I have recently found a shortcut for the different URI regexps: you can access any of the URI::DEFAULT_PARSER.regexp.keys directly as URI::#{key}.

For example, the :ABS_URI regexp can be accessed as URI::ABS_URI.
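A short sketch of how `:ABS_URI` behaves, assuming Ruby 2.2 or later. It reads the pattern from a fresh `URI::RFC2396_Parser` instead of `URI::DEFAULT_PARSER`, because recent versions of the uri library changed which parser `DEFAULT_PARSER` points to; on older Rubies both should give the same regexp. The pattern is anchored, but it also tolerates surrounding whitespace (as aidan's comment notes) and accepts any scheme (as Synthead's comment notes).

```ruby
require 'uri'

# Assumption: Ruby 2.2+, where URI::RFC2396_Parser exists; on older Rubies
# URI::DEFAULT_PARSER.regexp[:ABS_URI] is the equivalent pattern.
RFC2396_ABS_URI = URI::RFC2396_Parser.new.regexp[:ABS_URI]

p RFC2396_ABS_URI.match?("http://hello.it")     # => true
p RFC2396_ABS_URI.match?("http://example com")  # => false (anchored, so the space fails)
p RFC2396_ABS_URI.match?("  http://hello.it  ") # => true  (surrounding whitespace allowed)
p RFC2396_ABS_URI.match?("test:80")             # => true  (any scheme is accepted)
```

So if surrounding whitespace should count as invalid, or only web URLs are wanted, a separate check on the string or the scheme is still needed.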

answered by jonuts, edited by the Tin Man
  • If you plan on using URI.parse at any point, this is definitely the way to go. URI::regexp matches certain URLs that will fail when later using URI.parse. Thanks for the tip. – markquezada Jun 19 '11 at 09:39
  • Sadly, this is only available on Ruby 1.9, not 1.8. – Steve Madsen Sep 20 '11 at 15:12
  • But, this works: `/^#{URI.regexp}$/`. The trouble is that `URI.regexp` doesn't anchor. A string with a space isn't validating the space as part of the URI, but everything leading up to the space. If that fragment looks like a valid URI, the match succeeds. – Steve Madsen Sep 20 '11 at 15:32
  • Applying awendt's comment to your proposals: `'http://:5984/asdf' =~ URI::DEFAULT_PARSER.regexp[:ABS_URI]` gives 0, not nil; `'http::5984/asdf'=~ URI::DEFAULT_PARSER.regexp[:ABS_URI]` gives 0; `'http://:5984/asdf' =~ /^#{URI.regexp}$/` gives 0; `'http::5984/asdf' =~ /^#{URI.regexp}$/` gives 0 as well. None of above regexps is fully correct, however they fail in very very odd situations only and this is not a big deal in most cases. – skalee Jan 09 '12 at 16:34
  • FYI, `URI::DEFAULT_PARSER.regexp[:ABS_URI]` is identical to `/\A\s*#{URI::regexp}\s*\z/` – aidan Nov 19 '18 at 05:45
  • If you're using `URI::DEFAULT_PARSER.regexp[:ABS_URI]` to validate that a URI has a scheme, be aware that matching it to `test:80` returns a truthy value. – Synthead Sep 09 '19 at 19:15

The problem with the current answers is that a URI is not a URL.

A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").

Since URLs are a subset of URIs, it is clear that matching specifically for URIs will successfully match undesired values. For example, URNs:

```ruby
"urn:isbn:0451450523" =~ URI::regexp
# => 0
```

That being said, as far as I know, Ruby doesn't have a built-in way to parse URLs specifically, so you'll most likely need a gem to do so. If you need to match URLs specifically in HTTP or HTTPS format, you could do something like this:

```ruby
require 'uri'

uri = URI.parse(my_possible_url) # raises URI::InvalidURIError on malformed input
if uri.kind_of?(URI::HTTP) or uri.kind_of?(URI::HTTPS)
  # do your stuff
end
```
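A minimal sketch that wraps this check in a reusable method. `http_url?` is a hypothetical name. It rescues `URI::Error` (the parent class of `URI::InvalidURIError`) so that malformed input returns `false` instead of raising, relies on `URI::HTTPS` being a subclass of `URI::HTTP`, and also requires a non-empty host, which is what rejects inputs like `http:///neopets.com` raised in the comments.

```ruby
require 'uri'

# Hypothetical helper: true only for absolute http/https URLs that name a host.
def http_url?(string)
  uri = URI.parse(string)
  # URI::HTTPS inherits from URI::HTTP, so this one check covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::Error
  # URI::InvalidURIError (unparseable input) is a subclass of URI::Error.
  false
end

p http_url?("http://hello.it")       # => true
p http_url?("https://example.com/a") # => true
p http_url?("urn:isbn:0451450523")   # => false (a URN, not an HTTP URL)
p http_url?("ftp://hello.it")        # => false
p http_url?("http:||bra.ziz")        # => false (URI.parse raises, the error is rescued)
p http_url?("http:///neopets.com")   # => false
```

Checking `host.to_s.empty?` rather than `host.nil?` covers both a missing and an empty host component.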
answered by fotanus, edited by the Tin Man
  • `uri.kind_of?(URI::HTTP)` seems to be sufficient for both cases (http and https), at least in ruby 1.9.3. – Andrea Salicetti Apr 30 '14 at 13:08
  • still suffers the issues described by @skalee under jonuts's answer – akostadinov Mar 29 '16 at 08:38
  • Summary, `URI.parse(string_to_be_checked).kind_of?(URI::HTTP)` does the job well. – ben Jul 05 '17 at 05:51
  • Additionally, a very common mistyping in our database shows people tend to put too many slashes: `http:///neopets.com`, which unfortunately is also valid. Checking for the presence of a hostname fixes this: `uri = URI(str) ; %w[http https].include?(uri.scheme) && !uri.host.nil?` – Shane Sep 10 '20 at 09:06

I prefer the Addressable gem. I have found that it handles URLs more intelligently.

```ruby
require 'addressable/uri'

SCHEMES = %w(http https)

def valid_url?(url)
  parsed = Addressable::URI.parse(url) or return false
  SCHEMES.include?(parsed.scheme)
rescue Addressable::URI::InvalidURIError
  false
end
```
answered by David J., edited by robbi5
  • I just fed Addressable::URI.parse() with the weirdest strings to see what it rejects. It accepted crazy stuff. However the first string it did not accept was ":-)". Hmm. – mvw Aug 18 '15 at 17:27
  • How does this get so many upvotes? `Addressable::URI.parse` does not return nil with invalid input. – garbagecollector Mar 14 '18 at 22:08
  • @mvw you'll have to be more specific about what is so cray. [Another poster asked the owner of the repo about a number of 'strange' URLs](https://github.com/sporkmonger/addressable/issues/145), and he explained each one in detail. He signed off, saying it's less of a cost to be too permissive than being too restrictive. (BTW: I claim no expert knowledge over valid URLs!) – notapatch Jul 14 '21 at 10:08
  • `irb(main):034:0> valid_url?('http://asd!$@.com') => true` – Hackeron Oct 22 '21 at 16:20

I use this regular expression:

```ruby
/\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
```

Options:

  • i - case insensitive
  • x - ignore whitespace in regex

You can wrap it in a method to validate URLs:

```ruby
def valid_url?(url)
  return false if url.include?("<script")
  url_regexp = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
  url =~ url_regexp ? true : false
end
```

To use it:

```ruby
valid_url?("http://stackoverflow.com/questions/1805761/check-if-url-is-valid-ruby")
```

Testing with invalid URLs:

  • http://ruby3arabi - result is invalid
  • http://http://ruby3arabi.com - result is invalid
  • http:// - result is invalid
  • http://test.com\n<script src=\"nasty.js\"> - result is invalid (caught by the simple "<script" check)
  • 127.0.0.1 - result is invalid (IP addresses are not supported)

Testing with valid URLs:

  • http://ruby3arabi.com - result is valid
  • http://www.ruby3arabi.com - result is valid
  • https://www.ruby3arabi.com - result is valid
  • https://www.ruby3arabi.com/article/1 - result is valid
  • https://www.ruby3arabi.com/websites/58e212ff6d275e4bf9000000?locale=en - result is valid
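To re-run the two lists above, here is a small script that copies the answer's `valid_url?` unchanged and prints the result for each case. The arrays are the answer's own examples, plus one extra invalid case (a top-level domain longer than five letters) that shows a limit of the `[a-z]{2,5}` part of the pattern.

```ruby
# The answer's method, copied unchanged.
def valid_url?(url)
  return false if url.include?("<script")
  url_regexp = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
  url =~ url_regexp ? true : false
end

INVALID = [
  "http://ruby3arabi",
  "http://http://ruby3arabi.com",
  "http://",
  "http://test.com\n<script src=\"nasty.js\">",
  "127.0.0.1",
  "https://example.technology" # TLD longer than five letters is rejected too
].freeze

VALID = [
  "http://ruby3arabi.com",
  "http://www.ruby3arabi.com",
  "https://www.ruby3arabi.com",
  "https://www.ruby3arabi.com/article/1",
  "https://www.ruby3arabi.com/websites/58e212ff6d275e4bf9000000?locale=en"
].freeze

(INVALID + VALID).each { |url| puts "#{url.inspect} => #{valid_url?(url)}" }
```

Every entry in `INVALID` prints `false` and every entry in `VALID` prints `true`.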
answered by Martin K.
  • The following is marked as valid: `"http://test.com\n – aidan Nov 19 '18 at 06:34
  • easily the best most applicable solution here for quick url checking. thanks – somedirection Mar 29 '19 at 13:02
  • `irb(main):051:0> valid_url?('http://127.0.0.1') => false` – Hackeron Oct 22 '21 at 16:22

This is a fairly old entry, but I thought I'd go ahead and contribute:

```ruby
require 'uri'

String.class_eval do
  def is_valid_url?
    uri = URI.parse self
    uri.kind_of? URI::HTTP
  rescue URI::InvalidURIError
    false
  end
end
```

Now you can do something like:

```ruby
if "http://www.omg.wtf".is_valid_url?
  p "huzzah!"
end
```
answered by Wilhelm Murdoch
  • This works *much* better than the above solutions. It doesn't have the caveats listed above, and also doesn't accept uris like javascript:alert('spam'). – bchurchill Feb 10 '13 at 20:31
  • but it also matches `http:/`, which may not be what you want. – Bo Jeanes Apr 03 '13 at 22:22

This question is a little old, but here is how I do it: use Ruby's URI module to parse the URL. If it can be parsed, it's a valid URL (though that doesn't mean it's reachable).

URI supports many schemes, plus you can add custom schemes yourself:

```
irb> uri = URI.parse "http://hello.it" rescue nil
=> #<URI::HTTP:0x10755c50 URL:http://hello.it>

irb> uri.instance_values
=> {"fragment"=>nil,
 "registry"=>nil,
 "scheme"=>"http",
 "query"=>nil,
 "port"=>80,
 "path"=>"",
 "host"=>"hello.it",
 "password"=>nil,
 "user"=>nil,
 "opaque"=>nil}

irb> uri = URI.parse "http:||bra.ziz" rescue nil
=> nil

irb> uri = URI.parse "ssh://hello.it:5888" rescue nil
=> #<URI::Generic:0x105fe938 URL:ssh://hello.it:5888>
irb> uri.instance_values
=> {"fragment"=>nil,
 "registry"=>nil,
 "scheme"=>"ssh",
 "query"=>nil,
 "port"=>5888,
 "path"=>"",
 "host"=>"hello.it",
 "password"=>nil,
 "user"=>nil,
 "opaque"=>nil}
```

See the documentation for more information about the URI module.
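The session above uses `instance_values`, which comes from ActiveSupport (Rails) rather than core Ruby. A plain-Ruby sketch of the same idea reads the fields through `URI`'s own accessors; `uri_parts` is a hypothetical helper name.

```ruby
require 'uri'

# Hypothetical helper: returns the main components, or nil if parsing fails.
def uri_parts(string)
  uri = URI.parse(string)
  { scheme: uri.scheme, host: uri.host, port: uri.port, path: uri.path }
rescue URI::InvalidURIError
  nil
end

http = uri_parts("http://hello.it")
puts http[:host] # hello.it
puts http[:port] # 80, the default port for the http scheme

ssh = uri_parts("ssh://hello.it:5888")
puts ssh[:port]  # 5888; the uri library has no ssh class, so this is a URI::Generic

p uri_parts("http:||bra.ziz") # nil
```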

answered by nyzm, edited by the Tin Man
  • I ran across this trying to fix a segfault. Using `URI.parse` was actually the cause of this in Ruby 2.5.5 - I switched to @jonuts answer below if you don't mind some odd cases falling through. For my purposes I didn't care so that was ideal. – el n00b Feb 07 '20 at 14:48

In general,

```ruby
/^#{URI::regexp}$/
```

will work well, but if you only want to match http or https, you can pass those in as options to the method:

```ruby
/^#{URI::regexp(%w(http https))}$/
```

That tends to work a little better if you want to reject protocols like ftp://.
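One caveat worth checking: in Ruby, `^` and `$` match at line boundaries, not only at the ends of the string, so a multi-line string whose first line is a URL still matches (the point Seth raises under the first answer). A small sketch comparing the two anchorings, assuming Ruby 2.2 or later and building the pattern from `URI::RFC2396_Parser`, which is assumed here to be the parser behind `URI::regexp`, to avoid the obsolete-method path:

```ruby
require 'uri'

# Assumption: the same RFC 2396 pattern that URI::regexp(%w(http https)) returns.
HTTP_URI_RE = URI::RFC2396_Parser.new.make_regexp(%w[http https])

LINE_ANCHORED   = /^#{HTTP_URI_RE}$/   # ^ and $ match at any line boundary
STRING_ANCHORED = /\A#{HTTP_URI_RE}\z/ # \A and \z match only at the string's ends

MULTILINE_INPUT = "http://hello.it\n<script>alert(1)</script>"

p MULTILINE_INPUT.match?(LINE_ANCHORED)   # => true: the first line alone satisfies ^...$
p MULTILINE_INPUT.match?(STRING_ANCHORED) # => false
```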


You could also use a regex, such as the one at http://www.geekzilla.co.uk/View2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm. Assuming that regex is correct (I haven't fully checked it), the following shows whether each URL is valid:

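```ruby
# Single quotes pass \w and \d through to the regex engine;
# inside double quotes Ruby would turn "\w" into a plain "w".
url_regex = Regexp.new('((https?|ftp|file):((//)|(\\\\))+[\w\d:\#@%/;$()~_?\+-=\\\\.&]*)')

urls = [
  "http://hello.it",
  "http:||bra.ziz"
]

urls.each do |url|
  if url =~ url_regex
    puts "%s is valid" % url
  else
    puts "%s not valid" % url
  end
end
```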

The above example outputs:

```
http://hello.it is valid
http:||bra.ziz not valid
```
answered by Jamie
  • What about the mailto scheme? Or telnet, gopher, nntp, rsync, ssh, or any of the other schemes? URLs are a little more complicated than just HTTP and FTP. – mu is too short Dec 09 '10 at 06:04
  • Writing regex to validate URLs is difficult. Why bother? – Rimian Aug 22 '12 at 09:27
  • @Rimian, you have to bother because all `URI` can do is in fact broken. See comments under the so many upvoted answers above. Not sure if Jamie's answer is right but upvoting so hopefully people consider it more seriously. TBH I end up doing `url.start_with?("http://") || url.start_with?("https://")` because I need only HTTP and users should be responsible to use proper URLs. – akostadinov Mar 29 '16 at 08:47