150

I was wondering how I would best validate URLs in Rails. I was thinking of using a regular expression, but am not sure if this is the best practice.

And, if I were to use a regex, could someone suggest one to me? I am still new to Regex.

user489998
jay

23 Answers

174

Validating a URL is a tricky job. It's also a very broad request.

What do you want to do, exactly? Do you want to validate the format of the URL, the existence, or what? There are several possibilities, depending on what you want to do.

A regular expression can validate the format of the URL. But even a complex regular expression cannot ensure you are dealing with a valid URL.

For instance, if you take a simple regular expression, it will probably reject the following host

http://invalid##host.com

but it will allow

http://invalid-host.foo

which is a valid hostname, but not a valid domain if you check it against the existing TLDs. In other words, such an expression works if you want to validate the hostname, not the domain, because the following is a valid hostname

http://host.foo

as is the following one

http://localhost
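To make the distinction concrete, here is a minimal sketch in plain Ruby. `SIMPLE_URL` and `simple_url?` are hypothetical names written for this illustration, not taken from any library; the pattern only checks the shape of the hostname and knows nothing about TLDs.

```ruby
# Hypothetical "simple" URL pattern: a scheme, then labels made of letters
# and digits joined by single dots or hyphens, then an optional path.
SIMPLE_URL = %r{\Ahttps?://[a-z0-9]+(?:[.-][a-z0-9]+)*(?:/.*)?\z}i

def simple_url?(value)
  SIMPLE_URL.match?(value)
end

simple_url?("http://invalid##host.com") # rejected: "#" cannot appear in the host
simple_url?("http://invalid-host.foo")  # accepted: well-formed hostname
simple_url?("http://host.foo")          # accepted
simple_url?("http://localhost")         # accepted: no TLD is required
```

The pattern accepts anything shaped like a hostname, which is exactly why it cannot tell you whether the domain exists.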

Now, let me give you some solutions.

If you want to validate a domain, then you need to forget about regular expressions. The best solution available at the moment is the Public Suffix List, a list maintained by Mozilla. I created a Ruby library to parse and validate domains against the Public Suffix List, and it's called PublicSuffix.

If you want to validate the format of an URI/URL, then you might want to use regular expressions. Instead of searching for one, use the built-in Ruby URI.parse method.

require 'uri'

def valid_url?(url)
  uri = URI.parse(url)
  uri.host.present? # present? comes from ActiveSupport
rescue URI::InvalidURIError
  false
end

You can even decide to make it more restrictive. For instance, if you want the URL to be an HTTP/HTTPS URL, then you can make the validation more accurate.

require 'uri'

def valid_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && uri.host.present?
rescue URI::InvalidURIError
  false
end

Of course, there are tons of improvements you can apply to this method, including checking for a path or a scheme.
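As one possible direction, here is a sketch that checks individual components after parsing. The method name `strict_http_url?` and the `schemes:` keyword are assumptions made for this example, and it is written in plain Ruby (no ActiveSupport), so it uses `nil?`/`empty?` instead of `present?`.

```ruby
require "uri"

# Sketch of a stricter check. The method name and the `schemes:` keyword are
# illustrative assumptions, not part of the URI library.
def strict_http_url?(value, schemes: %w[http https])
  return false unless value.is_a?(String)

  uri = URI.parse(value)
  return false unless schemes.include?(uri.scheme)  # explicit scheme allow-list
  return false if uri.host.nil? || uri.host.empty?  # a host is required
  return false unless uri.userinfo.nil?             # refuse user:password@host

  true
rescue URI::InvalidURIError
  false
end

strict_http_url?("https://example.com/path?q=1")
strict_http_url?("ftp://example.com")
strict_http_url?("http://user:pw@example.com")
```

Which components you reject is a business decision; the userinfo check is only there to show that each part of the parsed URI can be inspected separately.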

Last but not least, you can also package this code into a validator:

class HttpUrlValidator < ActiveModel::EachValidator

  def self.compliant?(value)
    uri = URI.parse(value)
    uri.is_a?(URI::HTTP) && uri.host.present?
  rescue URI::InvalidURIError
    false
  end

  def validate_each(record, attribute, value)
    unless value.present? && self.class.compliant?(value)
      record.errors.add(attribute, "is not a valid HTTP URL")
    end
  end

end

# in the model
validates :example_attribute, http_url: true

Note for newer URI versions (e.g. 0.12.1)

.present? / .blank? is a more reliable way to validate the host than uri.host.nil? or a bare `if uri.host`, which were sufficient with older versions (e.g. URI 0.11).

Example for URI.parse("https:///394"):

  • new URI version (0.12): host returns an empty string, and /394 becomes the path. #<URI::HTTPS https:///394>
  • old URI version (0.11): host returns nil, and /394 becomes the path too. #<URI::HTTPS https:/394>
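Outside Rails, where `present?` is not available, the same idea can be sketched by treating `nil` and `""` alike. `host_given?` is a hypothetical helper name chosen for this example.

```ruby
require "uri"

# Treat both nil and "" as "no host", so the check behaves the same whether
# the installed URI version reports a missing host as nil or as an empty string.
def host_given?(url)
  host = URI.parse(url).host
  !(host.nil? || host.empty?)
rescue URI::InvalidURIError
  false
end

host_given?("https:///394")            # no host is present
host_given?("https://example.com/394") # host is "example.com"
```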
Simone Carletti
  • 1
    Note that the class will be `URI::HTTPS` for https uris (ex: `URI.parse("https://yo.com").class => URI::HTTPS` – tee Mar 12 '13 at 17:03
  • 15
    `URI::HTTPS` inherits from `URI::HTTP`, that's the reason why I use `kind_of?`. – Simone Carletti Mar 13 '13 at 10:37
  • 2
    By far the most complete solution to safely validate a URL. – Fabrizio Regini May 21 '14 at 16:27
  • `!!URI.parse('http://invalid-host.foo')` returns `true` for me, which seems to indicate that either the example you cited as invalid is actually valid, or that the method cited does not work as a validator for that example. I'm using Rails 3.2.13. – maurice Jan 23 '15 at 01:40
  • 4
    `URI.parse('http://invalid-host.foo')` returns true because that URI is a valid URL. Also note that `.foo` is now a valid TLD. http://www.iana.org/domains/root/db/foo.html – Simone Carletti Jan 23 '15 at 09:12
  • This solution lacks the successful validation of [IDNA](https://en.wikipedia.org/wiki/Internationalized_domain_name) URIs – ckruse Sep 13 '15 at 09:22
  • ```!!URI.parse("asdfasf")``` returns ```true``` – jmccartie Sep 21 '15 at 02:30
  • 1
    @jmccartie please read the entire post. If you care about the scheme, you should use the final code that includes also a type check, not just that line. You stopped reading before the end of the post. – Simone Carletti Sep 21 '15 at 06:49
  • Is this still best practice for Rails? It seems that using this in a model overwrites the default `valid?` method from Rails itself, breaking the model. Also it seems that URI.parse('http://') returns a perfectly valid instance of URI::HTTP, although one shouldn't accept this as a valid URL in my opinion. Thanks. – bo-oz Jan 03 '17 at 10:18
  • 1
    @bo-oz yes, it is. The `valid?` was just an example. I made some changes to make it more clear, and also added an example of a validator. – Simone Carletti Jan 03 '17 at 11:33
  • 2
    `www.google` IS a valid domain, especially now that `.GOOGLE` is a valid TLD: https://github.com/whois/ianawhois/blob/master/GOOGLE. If you want the validator to explicitly validate specific TLDs, then you have to add whatever business logic you feel appropriate. – Simone Carletti Jan 03 '17 at 13:40
  • 1
    Very good solution (the last code block). However, I believe it's standard practice in Rails to only validate if the value is present, so I would change `validate_each` to have the following def: `if value.present? record.errors.add(attribute, "is not a valid HTTP URL") unless self.class.compliant?(value) end` This means that I can still allow empty values, but enforce that there is a value using `presence: true` in the `validates` statement – elsurudo Jan 30 '17 at 11:34
  • Note: this implementation *also validates presence* replace `unless` with `if` and add a `!` before `self` if you want it to still be an optional argument – Menachem Hornbacher Jun 22 '17 at 15:43
  • Note that the standards documentation for URIs ( https://tools.ietf.org/html/std66#appendix-A ) actually allows for a lot of things you might not expect, including `http:/` and `actually@an-email.com`. Checking for a valid URI will rarely be enough for what you want so in most cases you should also be checking the individual components (scheme, host, path, etc.). – DaveMongoose Oct 05 '17 at 10:03
  • for now, ```URI.parse('https:')``` will not trigger URI::InvalidURIError but ```URI::DEFAULT_PARSER.parse('https:')``` will trigger that error – V-SHY Sep 13 '21 at 03:21
119

I use a one liner inside my models:

validates :url, format: URI::DEFAULT_PARSER.make_regexp(%w[http https])

I think it is good enough and simple to use. Moreover, it should in theory be equivalent to Simone's method, as it uses the very same regexp internally.
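One thing worth checking with this approach is anchoring. The sketch below suggests that the generated pattern is unanchored, so a string that merely contains a URL also matches. `URI::RFC2396_Parser` is named explicitly here (an assumption of this example) so the sketch does not depend on which parser `URI::DEFAULT_PARSER` refers to in a given Ruby version; the constant names are hypothetical.

```ruby
require "uri"

# Build the same kind of pattern as the one-liner above.
URL_PATTERN = URI::RFC2396_Parser.new.make_regexp(%w[http https])

# The generated pattern carries no anchors, so it also matches a URL that is
# embedded in other text.
URL_PATTERN.match?("see http://example.com now")

# Wrapping it in \A ... \z requires the whole string to be a URL.
ANCHORED_URL_PATTERN = /\A#{URL_PATTERN}\z/

ANCHORED_URL_PATTERN.match?("http://example.com/path?q=1")
ANCHORED_URL_PATTERN.match?("see http://example.com now")
```

If whole-string matching is what you want, passing the anchored variant to `format:` is one way to get it.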

Jarl
Matteo Collina
57

Following Simone's idea, you can easily create your own validator.

class UrlValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    return if value.blank?
    begin
      uri = URI.parse(value)
      resp = uri.kind_of?(URI::HTTP)
    rescue URI::InvalidURIError
      resp = false
    end
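    unless resp
      # errors.add works across Rails versions; appending to errors[attribute] with << was deprecated in Rails 6.1
      record.errors.add(attribute, options[:message] || "is not an url")
    end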
  end
end

and then use

validates :url, :presence => true, :url => true

in your model.

Tigraine
jlfenaux
  • 1
    where should I put this class? In a initializer? – deb Aug 09 '12 at 19:06
  • 3
    I quote from @gbc: "If you place your custom validators in app/validators they will be automatically loaded without needing to alter your config/application.rb file." (http://stackoverflow.com/a/6610270/839847). Note that the answer below from Stefan Pettersson shows that he saved a similar file in "app/validators" as well. – bergie3000 Nov 28 '12 at 23:15
  • is there anyway to modify this to allow protocols to be optional? – n00b Mar 09 '13 at 19:50
  • 4
    this only checks if the url start with http:// or https:// , it's not a proper URL validation – maggix Jul 10 '13 at 09:52
  • 1
    End if you can afford the URL to be optional: class OptionalUrlValidator < UrlValidator def validate_each(record, attribute, value) return true if value.blank? return super end end – Mick F Sep 19 '13 at 12:08
  • I edited in the return statement to allow null values. If you want the URL to be mandatory include presence: true in the validation – Tigraine Sep 25 '14 at 11:43
  • 1
    This is not a good validation: `URI("http:").kind_of?(URI::HTTP) #=> true` – smathy Apr 13 '16 at 22:50
32

There is also validate_url gem (which is just a nice wrapper for Addressable::URI.parse solution).

Just add

gem 'validate_url'

to your Gemfile, and then in models you can

validates :click_through_url, url: true
Ev Dolzhenko
  • @ЕвгенийМасленков that might be just as well because its valid according to the spec, but you might want to check https://github.com/sporkmonger/addressable/issues . Also in general case we have found that nobody follows the standard and instead are using simple format validation. – Ev Dolzhenko Jun 19 '14 at 13:31
14

This question is already answered, but what the heck, here is the solution I'm using.

The regexp works fine with all the URLs I've met. The setter method takes care of the case where no protocol is given (it assumes http://).

Finally, we try to fetch the page. Maybe I should accept redirects and not only HTTP 200 OK.

# app/models/my_model.rb
validates :website, :allow_blank => true, :uri => { :format => /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix }

def website= url_str
  unless url_str.blank?
    unless url_str.split(':')[0] == 'http' || url_str.split(':')[0] == 'https'
        url_str = "http://" + url_str
    end
  end  
  write_attribute :website, url_str
end
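The scheme defaulting done by the setter can be tried outside a model. The following is a standalone sketch, not the answer's exact logic: the name `with_default_scheme` and the `default:` keyword are assumptions, and unlike the setter it matches the scheme case-insensitively and requires the `://` separator.

```ruby
# Standalone sketch of the setter's scheme defaulting. The method name and
# the `default:` keyword are illustrative assumptions.
def with_default_scheme(url_str, default: "http")
  # Leave nil and blank input untouched, as the setter does.
  return url_str if url_str.nil? || url_str.strip.empty?
  # Keep the value as-is when it already starts with http:// or https://.
  return url_str if url_str.match?(%r{\Ahttps?://}i)

  "#{default}://#{url_str}"
end

with_default_scheme("example.com")         # => "http://example.com"
with_default_scheme("https://example.com") # => "https://example.com"
```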

and...

# app/validators/uri_validator.rb
require 'net/http'

# Thanks Ilya! http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/
# Original credits: http://blog.inquirylabs.com/2006/04/13/simple-uri-validation/
# HTTP Codes: http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTPResponse.html

class UriValidator < ActiveModel::EachValidator
  def validate_each(object, attribute, value)
    raise(ArgumentError, "A regular expression must be supplied as the :format option of the options hash") unless options[:format].nil? or options[:format].is_a?(Regexp)
    configuration = { :message => I18n.t('errors.events.invalid_url'), :format => URI::regexp(%w(http https)) }
    configuration.update(options)

    if value =~ configuration[:format]
      begin # check header response
        case Net::HTTP.get_response(URI.parse(value))
          when Net::HTTPSuccess then true
          else object.errors.add(attribute, configuration[:message]) and false
        end
      rescue # Recover on DNS failures..
        object.errors.add(attribute, configuration[:message]) and false
      end
    else
      object.errors.add(attribute, configuration[:message]) and false
    end
  end
end
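If you do fetch the page, it is probably worth bounding how long the request may take and deciding explicitly how redirects count. Below is a hedged sketch in plain Ruby; the method name `reachable?`, the `timeout:` keyword and the choice of a HEAD request are assumptions of this example, not part of the validator above. It does not address the concern that the server ends up requesting arbitrary user-supplied URLs.

```ruby
require "net/http"
require "uri"

# Sketch of a bounded reachability check. The method name, the `timeout:`
# keyword and the use of a HEAD request are assumptions of this example.
def reachable?(url, timeout: 3)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) && uri.host && !uri.host.empty?

  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: timeout,
                             read_timeout: timeout) do |http|
    http.head(uri.request_uri)
  end

  # Count redirects as reachable, in addition to 2xx responses.
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue StandardError
  # Invalid URI, DNS failure, refused connection, timeout, TLS error, ...
  false
end
```

Because every failure mode collapses to `false`, a slow or unreachable site produces an ordinary validation error instead of a hanging request.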
  • really neat! thanks for your input, there are often many approaches to a problem; it's great when people share theirs. – jay Nov 06 '12 at 01:42
  • 7
    Just wanted to point out that according to the [rails security guide](http://guides.rubyonrails.org/security.html#regular-expressions) you should use \A and \z rather than $^ in that regexp – Jared Jan 17 '13 at 00:54
  • 1
    I like it. Quick suggestion to dry out the code a bit by moving the regex into the validator, as I imagine you'd want it to be consistent across models. Bonus: It would allow you to drop the first line under validate_each. – Paul Pettengill Sep 14 '13 at 00:36
  • What if url is taking long and timeout? What will be the best option to show the timeout error message or if page cannot be opened? – user588324 Apr 18 '14 at 13:02
  • this would never pass a security audit, you are making your servers poke an arbitrary url – Mauricio Jan 02 '20 at 18:18
13

The solution that worked for me was:

validates_format_of :url, :with => /\A(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?\Z/i

I tried some of the examples posted here, but I need to support URLs like the ones listed below.

Notice the use of \A and \Z: if you use ^ and $ instead, Rails validators will complain about a security risk.

 Valid ones:
 'www.crowdint.com'
 'crowdint.com'
 'http://crowdint.com'
 'http://www.crowdint.com'

 Invalid ones:
  'http://www.crowdint. com'
  'http://fake'
  'http:fake'
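The reason for that security complaint can be shown in a few lines of plain Ruby. The patterns below are deliberately simplified stand-ins for this illustration, not the regexp from this answer.

```ruby
LINE_ANCHORED   = %r{^https?://\S+$}    # ^ and $ match at every line boundary
STRING_ANCHORED = %r{\Ahttps?://\S+\z}  # \A and \z match only at the string's ends

multiline = "javascript:alert(1)\nhttp://example.com"

LINE_ANCHORED.match?(multiline)   # => true, the second line satisfies the pattern
STRING_ANCHORED.match?(multiline) # => false, the first line cannot be skipped

# \Z (capital) still tolerates one trailing newline; \z does not.
%r{\Ahttps?://\S+\Z}.match?("http://example.com\n") # => true
%r{\Ahttps?://\S+\z}.match?("http://example.com\n") # => false
```

If a trailing newline should be rejected too, \z is the stricter choice than \Z.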
Brian Tompsett - 汤莱恩
Heriberto Magaña
  • 1
    Try this with `"https://portal.example.com/portal/#"`. In Ruby 2.1.6 the evaluation hangs. – Old Pro Nov 10 '15 at 02:33
  • you're right seems like in some cases this regular expression takes forever to resolve :( – Heriberto Magaña Apr 14 '16 at 15:16
  • 2
    obviously, there is not a regex that covers every scenario, that's why I'm ending up using just a simple validation: validates :url, format: { with: URI.regexp }, if: Proc.new { |a| a.url.present? } – Heriberto Magaña Apr 14 '16 at 15:53
12

You can also try valid_url gem which allows URLs without the scheme, checks domain zone and ip-hostnames.

Add it to your Gemfile:

gem 'valid_url'

And then in model:

class WebSite < ActiveRecord::Base
  validates :url, :url => true
end
Roman Ralovets
10

Just my 2 cents:

before_validation :format_website
validate :website_validator

private

def format_website
  self.website = "http://#{self.website}" unless self.website[/^https?/]
end

def website_validator
  errors.add(:website, I18n.t("activerecord.errors.messages.invalid")) unless website_valid?
end

def website_valid?
  !!website.match(/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-=\?]*)*\/?$/)
end

EDIT: changed regex to match parameter urls.

lafeber
  • 1
    thanks for your input, always good to see different solutions – jay May 14 '13 at 01:54
  • Btw, your regexp will reject valid urls with query string such as `http://test.com/fdsfsdf?a=b` – mikdiet Feb 26 '15 at 15:37
  • 2
    We put this code into production and kept getting timeouts on infinite loops on the .match regex line. Not sure why, just caution for some cornercases and would love to hear other's thoughts on why this would occur. – toobulkeh Aug 07 '15 at 15:22
5

I ran into the same problem lately (I needed to validate urls in a Rails app) but I had to cope with the additional requirement of unicode urls (e.g. http://кц.рф)...

I researched a couple of solutions and came across the following:

severin
  • Yeah, but `Addressable::URI.parse('http:///').scheme # => "http"` or `Addressable::URI.parse('Съешь [же] ещё этих мягких французских булок да выпей чаю')` are perfectly ok from Addressable's point of view :( – smileart Aug 10 '17 at 16:25
4

Here is an updated version of the validator posted by David James. It has been published by Benjamin Fleischer. Meanwhile, I pushed an updated fork which can be found here.

require 'addressable/uri'

# Source: http://gist.github.com/bf4/5320847
# Accepts options[:message] and options[:allowed_protocols]
# spec/validators/uri_validator_spec.rb
class UriValidator < ActiveModel::EachValidator

  def validate_each(record, attribute, value)
    uri = parse_uri(value)
    if !uri
      record.errors.add(attribute, generic_failure_message)
    elsif !allowed_protocols.include?(uri.scheme)
      record.errors.add(attribute, "must begin with #{allowed_protocols_humanized}")
    end
  end

private

  def generic_failure_message
    options[:message] || "is an invalid URL"
  end

  def allowed_protocols_humanized
    allowed_protocols.to_sentence(:two_words_connector => ' or ')
  end

  def allowed_protocols
    @allowed_protocols ||= [(options[:allowed_protocols] || ['http', 'https'])].flatten
  end

  def parse_uri(value)
    uri = Addressable::URI.parse(value)
    uri.scheme && uri.host && uri
  rescue URI::InvalidURIError, Addressable::URI::InvalidURIError, TypeError
  end

end

...

require 'spec_helper'

# Source: http://gist.github.com/bf4/5320847
# spec/validators/uri_validator_spec.rb
describe UriValidator do
  subject do
    Class.new do
      include ActiveModel::Validations
      attr_accessor :url
      validates :url, uri: true
    end.new
  end

  it "should be valid for a valid http url" do
    subject.url = 'http://www.google.com'
    subject.valid?
    subject.errors.full_messages.should == []
  end

  ['http://google', 'http://.com', 'http://ftp://ftp.google.com', 'http://ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is a invalid http url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.full_messages.should == []
    end
  end

  ['http:/www.google.com','<>hi'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['www.google.com','google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['ftp://ftp.google.com','ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("must begin with http or https")
    end
  end
end

Please notice that there are still strange HTTP URIs that are parsed as valid addresses.

http://google  
http://.com  
http://ftp://ftp.google.com  
http://ssh://google.com

Here is an issue for the addressable gem which covers the examples.

Community
JJD
  • In the [above linked issue](https://github.com/sporkmonger/addressable/issues/145) the owner of the repository explained in great detail why the "strange HTTP URIs" were valid and how, for his library's work, failing a valid URI is more damaging than allowing an invalid URI. – notapatch Jul 11 '21 at 09:09
  • Now, tell me, how is 'www.google.com' and 'google.com' invalid urls? – Kaka Ruto Aug 07 '23 at 20:22
3

I use a slight variation on lafeber solution above. It disallows consecutive dots in the hostname (such as for instance in www.many...dots.com):

%r"\A(https?://)?[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,6}(/.*)?\Z"i

URI.parse seems to mandate scheme prefixing, which in some cases is not what you may want (e.g. if you want to allow your users to quickly spell URLs in forms such as twitter.com/username)
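A quick way to see what this pattern accepts is to run it against a few inputs. The constant name `LENIENT_URL` is chosen for this example only; the pattern itself is the one shown above.

```ruby
# The pattern from this answer, bound to a constant name chosen for the example.
LENIENT_URL = %r"\A(https?://)?[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]{2,6}(/.*)?\Z"i

LENIENT_URL.match?("twitter.com/username")    # => true, no scheme needed
LENIENT_URL.match?("http://www.crowdint.com") # => true
LENIENT_URL.match?("www.many...dots.com")     # => false, consecutive dots
LENIENT_URL.match?("http://localhost")        # => false, a dotted TLD is required
LENIENT_URL.match?("example.photography")     # => false, TLD longer than 6 letters
```

The last case shows a trade-off of the `{2,6}` bound: longer TLDs are rejected, so you may want to widen it.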

Community
Franco
2

I have been using the 'activevalidators' gem and it works pretty well (not just for URL validation)

you can find it here

It's all documented, but basically, once the gem is added, you'll want to add the following few lines in an initializer, say config/initializers/active_validators_activation.rb

# Activate all the validators
ActiveValidators.activate(:all)

(Note : you can replace :all by :url or :whatever if you just want to validate specific types of values)

And then back in your model something like this

class Url < ActiveRecord::Base
   validates :url, :presence => true, :url => true
end

Now Restart the server and that should be it

Arnaud Bouchot
2

If you want simple validation and a custom error message:

  validates :some_field_expecting_url_value,
            format: {
              with: URI.regexp(%w[http https]),
              message: 'is not a valid URL'
            }
Caleb
2

I liked to monkeypatch the URI module to add the valid? method

inside config/initializers/uri.rb

module URI
  def self.valid?(url)
    uri = URI.parse(url)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end
Blair Anderson
1

You can validate multiple urls using something like:

validates_format_of [:field1, :field2], with: URI.regexp(['http', 'https']), allow_nil: true
Damien Roche
1

https://github.com/perfectline/validates_url is a nice and simple gem that will do pretty much everything for you

stuartchaney
1

Recently I had this same issue and I found a work around for valid urls.

require 'net/http'

validates_format_of :url, :with => URI::regexp(%w(http https))
validate :validate_url

def validate_url
  unless self.url.blank?
    begin
      source = URI.parse(self.url)
      # Requesting the page confirms that the URL actually resolves
      resp = Net::HTTP.get_response(source)
    rescue URI::InvalidURIError
      errors.add(:url, 'is Invalid')
    rescue SocketError
      errors.add(:url, 'is Invalid')
    end
  end
end

The first part of the validate_url method is enough to validate url format. The second part will make sure the url exists by sending a request.

Dilnavaz
0

And as a module

module UrlValidator
  extend ActiveSupport::Concern
  included do
    validates :url, presence: true, uniqueness: true
    validate :url_format
  end

  def url_format
    begin
      errors.add(:url, "Invalid url") unless URI(self.url).is_a?(URI::HTTP)
    rescue URI::InvalidURIError
      errors.add(:url, "Invalid url")
    end
  end
end

And then just include UrlValidator in any model that you want to validate URLs for. I'm including this just as another option.

MCB
0

URL validation cannot be handled by a regular expression alone, as the number of websites keeps growing and new domain naming schemes keep coming up.

In my case, I simply write a custom validator that checks for a successful response.

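require 'net/http'

class UrlValidator < ActiveModel::Validator
  def validate(record)
    url = URI.parse(record.path)
    # get_response returns a Net::HTTPResponse object;
    # Net::HTTP.get would return only the body as a String
    response = Net::HTTP.get_response(url)
    record.errors.add(:path, 'Web address is invalid') unless response.is_a?(Net::HTTPSuccess)
  rescue StandardError
    # Parse errors, DNS failures, refused connections and timeouts all end up here
    record.errors.add(:path, 'Web address is invalid')
  end
end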

I am validating the path attribute of my model by using record.path. I am also adding the error to the respective attribute by using record.errors.add(:path, ...).

You can simply replace this with any attribute name.

Then on, I simply call the custom validator in my model.

class Url < ApplicationRecord

  # validations
  validates_presence_of :path
  validates_with UrlValidator

end
Noman Ur Rehman
0

You could use a regex for this; this one works well for me:

(^|[\s.:;?\-\]<\(])(ftp|https?:\/\/[-\w;\/?:@&=+$\|\_.!~*\|'()\[\]%#,]+[\w\/#](\(\))?)(?=$|[\s',\|\(\).:;?\-\[\]>\)])
spirito_libero
0

URI::regexp(%w[http https]) is obsolete and should not be used.

Instead, use URI::DEFAULT_PARSER.make_regexp(%w[http https])

Rajkaran Mishra
yld
  • this answers should be just a comment to the answer you refer too, but I know you cannot give comment this is site meta problemo – buncis May 17 '23 at 10:15
0

Keep it simple:

validates :url, format: %r{\Ahttps?://.+\z}
Kirill Platonov
0

If you want to validate HTTPS you can use:

require "uri"

class HttpsUrlValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    unless valid_url?(value)
      record.errors.add(attribute, "is not a valid URL")
    end
  end

  private

  def valid_url?(url)
    uri = URI.parse(url)
    uri.is_a?(URI::HTTPS) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end
end

Usage in the model like this:

validates :website_url, presence: true, https_url: true
Leticia Esperon