How to check if a URL is valid

Question

How can I check if a string is a valid URL?

For example:

http://hello.it => yes
http:||bra.ziz, => no

If this is a valid URL how can I check if this is relative to a image file?

the url you provided seems to be an absolute url, what do you mean with relative to an image file — johannes, Nov 27 '09 at 11:38
I posted an [UriValidator with specs](http://stackoverflow.com/a/19423623/356895). — JJD, Oct 17 '13 at 12:09

score 194 · Accepted Answer · edited Sep 15 '21 at 16:10

194

Notice:

As pointed by @CGuess, there's a bug with this issue and it's been documented for over 9 years now that validation is not the purpose of this regular expression (see https://bugs.ruby-lang.org/issues/6520).

Use the URI module distributed with Ruby:

require 'uri'

if url =~ URI::regexp
    # Correct URL
end

Like Alexander Günther said in the comments, it checks if a string contains a URL.

To check if the string is a URL, use:

url =~ /\A#{URI::regexp}\z/

If you only want to check for web URLs (http or https), use this:

url =~ /\A#{URI::regexp(['http', 'https'])}\z/

edited Sep 15 '21 at 16:10

Sebastián Palma

32,692
6
40
59

answered Nov 26 '09 at 21:40

Mikael S

5,206
2
23
21

30

That doesn't seem to work: `'http://:5984/asdf' =~ URI::regexp` and `'http::5984/asdf' =~ URI::regexp` both return 0. I expected them to return nil because none of them are valid URIs. – awendt Nov 08 '11 at 08:47
4

Isn’t :5984 port 5984 on localhost? – mxcl Aug 07 '12 at 02:28
3

It actually checks if a variable contains a valid url. It will accept "http://example com" as a valid URL. Because it contains one. But it is not helpful if you expect the whole thing to be the URL. – Alexander Günther Dec 12 '12 at 09:04
This doesn't work for this `/images/myimage01.png` but the last is valid value for ccs background url or image tag source url. – gotqn Mar 22 '14 at 11:22
2

gotqn: That is not a valid URL according to RFC 1738 though. – Mikael S Mar 25 '14 at 08:53
2

With regards to Ruby's handling of `^` and `$` on multi-line strings, shouldn't this be `/\A#{URI::regexp}\z/` instead? – Seth Apr 29 '14 at 20:31
14

Do not use this, it's so bad that `"http:"` passes this regexp. – smathy Apr 13 '16 at 22:55
2

DO NOT USE: There's a bug with this issue and it's been documented for over 9 years now that validation is not the purpose of this regular expression: https://bugs.ruby-lang.org/issues/6520 – CGuess Sep 07 '21 at 22:32

score 48 · Answer 2 · edited Jun 06 '17 at 23:00

48

Similar to the answers above, I find using this regex to be slightly more accurate:

URI::DEFAULT_PARSER.regexp[:ABS_URI]

That will invalidate URLs with spaces, as opposed to URI.regexp which allows spaces for some reason.

I have recently found a shortcut that is provided for the different URI rgexps. You can access any of URI::DEFAULT_PARSER.regexp.keys directly from URI::#{key}.

For example, the :ABS_URI regexp can be accessed from URI::ABS_URI.

edited Jun 06 '17 at 23:00

the Tin Man

158,662
42
215
303

answered Dec 09 '10 at 05:49

jonuts

556
4
6

3

If you plan on using URI.parse at any point, this is definitely the way to go. URI::regexp matches certain URLs that will fail when later using URI.parse. Thanks for the tip. – markquezada Jun 19 '11 at 09:39
Sadly, this is only available on Ruby 1.9, not 1.8. – Steve Madsen Sep 20 '11 at 15:12
1

But, this works: `/^#{URI.regexp}$/`. The trouble is that `URI.regexp` doesn't anchor. A string with a space isn't validating the space as part of the URI, but everything leading up to the space. If that fragment looks like a valid URI, the match succeeds. – Steve Madsen Sep 20 '11 at 15:32
3

Applying awendt's comment to your proposals: `'http://:5984/asdf' =~ URI::DEFAULT_PARSER.regexp[:ABS_URI]` gives 0, not nil; `'http::5984/asdf'=~ URI::DEFAULT_PARSER.regexp[:ABS_URI]` gives 0; `'http://:5984/asdf' =~ /^#{URI.regexp}$/` gives 0; `'http::5984/asdf' =~ /^#{URI.regexp}$/` gives 0 as well. None of above regexps is fully correct, however they fail in very very odd situations only and this is not a big deal in most cases. – skalee Jan 09 '12 at 16:34
1

FYI, `URI::DEFAULT_PARSER.regexp[:ABS_URI]` is identical to `/\A\s*#{URI::regexp}\s*\z/` – aidan Nov 19 '18 at 05:45
If you're using `URI::DEFAULT_PARSER.regexp[:ABS_URI]` to validate that a URI has a scheme, be aware that matching it to `test:80` returns a truthy value. – Synthead Sep 09 '19 at 19:15

score 43 · Answer 3 · edited Jun 06 '17 at 23:01

43

The problem with the current answers is that a URI is not an URL.

A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network "location").

Since URLs are a subset of URIs, it is clear that matching specifically for URIs will successfully match undesired values. For example, URNs:

 "urn:isbn:0451450523" =~ URI::regexp
 => 0

That being said, as far as I know, Ruby doesn't have a default way to parse URLs , so you'll most likely need a gem to do so. If you need to match URLs specifically in HTTP or HTTPS format, you could do something like this:

uri = URI.parse(my_possible_url)
if uri.kind_of?(URI::HTTP) or uri.kind_of?(URI::HTTPS)
  # do your stuff
end

edited Jun 06 '17 at 23:01

the Tin Man

158,662
42
215
303

answered May 03 '13 at 13:23

fotanus

19,618
13
77
111

2

`uri.kind_of?(URI::HTTP)` seems to be sufficient for both cases (http and https), at least in ruby 1.9.3. – Andrea Salicetti Apr 30 '14 at 13:08
still suffers the issues described by @skalee under the jonuts's answer – akostadinov Mar 29 '16 at 08:38
2

Summary, `URI.parse(string_to_be_checked).kind_of?(URI::HTTP)` does the job well. – ben Jul 05 '17 at 05:51
Additionally, a very common mistyping in our database shows people tend to put to many slashes: `http:///neopets.com`, which unfortunately is also valid. Checking for the presence of a hostname fixes this: `uri = URI(str) ; %w[http https].include?(uri.scheme) && !uri.host.nil?` – Shane Sep 10 '20 at 09:06

score 19 · Answer 4 · edited Nov 03 '12 at 17:28

19

I prefer the Addressable gem. I have found that it handles URLs more intelligently.

require 'addressable/uri'

SCHEMES = %w(http https)

def valid_url?(url)
  parsed = Addressable::URI.parse(url) or return false
  SCHEMES.include?(parsed.scheme)
rescue Addressable::URI::InvalidURIError
  false
end

edited Nov 03 '12 at 17:28

robbi5

433
3
7

answered Aug 14 '12 at 18:49

David J.

31,569
22
122
174

3

I just fed Addressable::URI.parse() with the weirdest strings to see what it rejects. It accepted crazy stuff. However the first string it did not accept was ":-)". Hmm. – mvw Aug 18 '15 at 17:27
1

How does this get so many upvotes? `Addressable::URI.parse` does not return nil with invalid input. – garbagecollector Mar 14 '18 at 22:08
@mvw you'll have to be more specific about what is so cray. [Another poster asked the owner of the repo about a number of 'strange' URLs](https://github.com/sporkmonger/addressable/issues/145), and he explained each one in detail. He signed off, saying it's less of a cost to be too permissive than being too restrictive. (BTW: I claim no expert knowledge over valid URLs!) – notapatch Jul 14 '21 at 10:08
irb(main):034:0> valid_url?('http://asd!$@.com') => true – Hackeron Oct 22 '21 at 16:20

Martin K. · Answer 5 · 2022-08-22T09:17:18.213

For me, I use this regular expression:

/\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix

Option:

i - case insensitive
x - ignore whitespace in regex

You can set this method to check URL validation:

def valid_url?(url)
  return false if url.include?("<script")
  url_regexp = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix
  url =~ url_regexp ? true : false
end

To use it:

valid_url?("http://stackoverflow.com/questions/1805761/check-if-url-is-valid-ruby")

Testing with wrong URLs:

http://ruby3arabi - result is invalid
http://http://ruby3arabi.com - result is invalid
http:// - result is invalid
http://test.com\n<script src=\"nasty.js\"> (Just simply check "<script")
127.0.0.1 - not support IP address

Test with correct URLs:

http://ruby3arabi.com - result is valid
http://www.ruby3arabi.com - result is valid
https://www.ruby3arabi.com - result is valid
https://www.ruby3arabi.com/article/1 - result is valid
https://www.ruby3arabi.com/websites/58e212ff6d275e4bf9000000?locale=en - result is valid

easily the best most applicable solution here for quick url checking. thanks — somedirection, Mar 29 '19 at 13:02

score 12 · Answer 6 · answered Oct 24 '12 at 00:57

12

This is a fairly old entry, but I thought I'd go ahead and contribute:

String.class_eval do
    def is_valid_url?
        uri = URI.parse self
        uri.kind_of? URI::HTTP
    rescue URI::InvalidURIError
        false
    end
end

Now you can do something like:

if "http://www.omg.wtf".is_valid_url?
    p "huzzah!"
end

answered Oct 24 '12 at 00:57

Wilhelm Murdoch

1,806
1
22
42

2

This works *much* better than the above solutions. It doesn't have the caveats listed above, and also doesn't accept uris like javascript:alert('spam'). – bchurchill Feb 10 '13 at 20:31
5

but it also matches `http:/`, which may not be what you want. – Bo Jeanes Apr 03 '13 at 22:22

score 4 · Answer 7 · edited Jun 06 '17 at 23:04

This is a little bit old but here is how I do it. Use Ruby's URI module to parse the URL. If it can be parsed then it's a valid URL. (But that doesn't mean accessible.)

URI supports many schemes, plus you can add custom schemes yourself:

irb> uri = URI.parse "http://hello.it" rescue nil
=> #<URI::HTTP:0x10755c50 URL:http://hello.it>

irb> uri.instance_values
=> {"fragment"=>nil,
 "registry"=>nil,
 "scheme"=>"http",
 "query"=>nil,
 "port"=>80,
 "path"=>"",
 "host"=>"hello.it",
 "password"=>nil,
 "user"=>nil,
 "opaque"=>nil}

irb> uri = URI.parse "http:||bra.ziz" rescue nil
=> nil


irb> uri = URI.parse "ssh://hello.it:5888" rescue nil
=> #<URI::Generic:0x105fe938 URL:ssh://hello.it:5888>
[26] pry(main)> uri.instance_values
=> {"fragment"=>nil,
 "registry"=>nil,
 "scheme"=>"ssh",
 "query"=>nil,
 "port"=>5888,
 "path"=>"",
 "host"=>"hello.it",
 "password"=>nil,
 "user"=>nil,
 "opaque"=>nil}

See the documentation for more information about the URI module.

I ran across this trying to fix a segfault. Using `URI.parse` was actually the cause of this in Ruby 2.5.5 - I switched to @jonuts answer below if you don't mind some odd cases falling through. For my purposes I didn't care so that was ideal. — el n00b, Feb 07 '20 at 14:48

score 4 · Answer 8 · answered Dec 06 '13 at 15:23

In general,

/^#{URI::regexp}$/

will work well, but if you only want to match http or https, you can pass those in as options to the method:

/^#{URI::regexp(%w(http https))}$/

That tends to work a little better, if you want to reject protocols like ftp://.

score -2 · Answer 9 · answered Nov 29 '09 at 18:30

-2

You could also use a regex, maybe something like http://www.geekzilla.co.uk/View2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm assuming this regex is correct (I haven't fully checked it) the following will show the validity of the url.

url_regex = Regexp.new("((https?|ftp|file):((//)|(\\\\))+[\w\d:\#@%/;$()~_?\+-=\\\\.&]*)")

urls = [
    "http://hello.it",
    "http:||bra.ziz"
]

urls.each { |url|
    if url =~ url_regex then
        puts "%s is valid" % url
    else
        puts "%s not valid" % url
    end
}

The above example outputs:

http://hello.it is valid
http:||bra.ziz not valid

answered Nov 29 '09 at 18:30

Jamie

2,245
4
19
24

5

What about the mailto scheme? Or telnet, gopher, nntp, rsync, ssh, or any of the other schemes? URLs are a little more complicated than just HTTP and FTP. – mu is too short Dec 09 '10 at 06:04
Writing regex to validate URLs is difficult. Why bother? – Rimian Aug 22 '12 at 09:27
@Rimian, you have to bother because all `URI` can do is in fact broken. See comments under the so many upvoted answers above. Not sure if Janie's answer is right but upvoting so hopefully people consider it more seriously. TBH I end up doing `url.start_with?("http://") || url.start_with?("https://")` because I need only HTTP and users should be responsible to use proper URLs. – akostadinov Mar 29 '16 at 08:47

How to check if a URL is valid

9 Answers9

Notice:

Linked

Related