34

I am trying to create a validation that checks that a domain/URL is valid, for example "test.com".

def valid_domain_name?
  domain_name = domain.split(".")
  name = /(?:[A-Z0-9\-])+/.match(domain_name[0]).nil?
  tld = /(?:[A-Z]{2}|aero|ag|asia|at|be|biz|ca|cc|cn|com|de|edu|eu|fm|gov|gs|jobs|jp|in|info|me|mil|mobi|museum|ms|name|net|nu|nz|org|tc|tw|tv|uk|us|vg|ws)/.match(domain_name[1]).nil?
  if name == false or tld == false
    errors.add(:domain_name, 'Invalid domain name. Please only use names with letters (A-Z) and numbers (0-9).')
  end
end

This is what I have so far but it doesn't work. It lets bad URLs through without failing.

I don't know regex very well.

kush
  • what's the deal with this part: '[A-Z]{2}'? are you trying to let any 2 letter domain go through? domains in all caps too? – Victor Jul 14 '09 at 21:39
  • This answer is outdated! Use [`URI::regexp`](http://www.ruby-doc.org/stdlib-2.0/libdoc/uri/rdoc/URI.html#method-c-regexp) instead. Supported since [Ruby 1.8.6](http://www.ruby-doc.org/stdlib-1.8.6/libdoc/uri/rdoc/URI.html#method-c-regexp). Example [below](http://stackoverflow.com/a/16931672/712765). – Old Pro Jun 05 '13 at 04:49
  • Since the registries are now allowing new TLDs the ability to build a validating regex got harder. You'll need to regularly update from an accurate source and build from that. – the Tin Man Sep 07 '16 at 00:46

13 Answers

65

Stumbled on this:

validates_format_of :domain_name, :with => /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/ix

FYI: Rubular is a fantastic resource for testing your Ruby regular expressions.

Tate Johnson
  • 16
    this doesn't handle top level domains that are longer than 5 characters. e.g. .museum - use the following instead - /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?$/ix – SyntaxGoonoo Oct 14 '11 at 03:18
  • 2
    This answer is outdated! Use [`URI::regexp`](http://www.ruby-doc.org/stdlib-2.0/libdoc/uri/rdoc/URI.html#method-c-regexp) instead. Supported since [Ruby 1.8.6](http://www.ruby-doc.org/stdlib-1.8.6/libdoc/uri/rdoc/URI.html#method-c-regexp). Example [below](http://stackoverflow.com/a/16931672/712765). – Old Pro Jun 05 '13 at 04:51
  • Also doesn't handle domains with more than one consecutive dash, i.e. every IDN domain. – Daniel Rikowski Oct 01 '13 at 17:22
  • 2
    This doesn't work with Rails 4 (and shouldn't be used for earlier rails) because it has a security vulnerability since it's using multiline regex – Neal Nov 21 '14 at 19:17
  • 3
    Altered the regex so it allows top-level domains up to 63 characters (after reading [this](http://stackoverflow.com/questions/9238640/how-long-can-a-tld-possibly-be)) and so it is not using multiline anchors, which may present a security risk (as read [here](http://guides.rubyonrails.org/security.html#regular-expressions)): `/\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,63}(:[0-9]{1,5})?(\/.*)?\z/ix` – the-bass Apr 28 '16 at 10:27
28

@Tate's answer is good for a full URL, but if you want to validate a domain column, you don't want to allow the extra URL bits his regex allows (e.g. you definitely don't want to allow a URL with a path to a file).

So I removed the protocol, port, file path, and query string parts of the regex, resulting in this:

^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}$


Check out the same test cases for both versions.
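As a rough sketch of how the domain-only pattern behaves, the snippet below runs it against a few inputs. It swaps `^`/`$` for `\A`/`\z` (as suggested elsewhere in this thread) so the whole string has to match; `DOMAIN_PATTERN` and `domain_like?` are hypothetical names chosen for this illustration only.

```ruby
# Domain-only pattern from this answer, anchored to the whole string with \A and \z.
DOMAIN_PATTERN = /\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}\z/i

# Hypothetical helper name, used only for this illustration.
def domain_like?(value)
  DOMAIN_PATTERN.match?(value.to_s)
end

domain_like?("test.com")             # true
domain_like?("sub.my-example.com")   # true
domain_like?("test.com/index.html")  # false: paths are rejected
domain_like?("http://test.com")      # false: the protocol is rejected
domain_like?("hello.museum")         # false: the TLD is longer than 5 letters
```

The last case shows the `{2,5}` limit at work; widening that quantifier is one way to accept longer TLDs.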

Brian Ray
16
/^(http|https):\/\/|[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?$/ix
  • example.com
  • sub.example.com
  • sub.domain.my-example.com
  • example.com/?stuff=true
  • example.com:5000/?stuff=true
  • sub.domain.my-example.com/path/to/file/hello.html
  • hello.museum
  • http://railsgirls.com

http://rubular.com/r/cdkLxAkTbk

Added optional http:// or https://

At the time of writing, the longest TLD I knew of was .museum, which has 6 characters; newer TLDs can be longer.

Jez
  • 1
    I threw a few negative examples into your list, and the ones beginning with http:// got matched because of the brackets (e.g. `http://fake`). Basically, the four bracket groups in your expression currently allow a partial match. It doesn't look pretty, but I fixed it by wrapping the whole expression in brackets and making http optional: `^(((http|https):\/\/|)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?)$` – ben Jan 09 '14 at 10:28
  • Nice regex, thanks. You could remove `{1}` though, it's useless. – pmrotule Aug 09 '17 at 05:39
13

Another way to do URL validation in Rails is

validates :web_address, :format => { :with => URI::regexp(%w(http https)), :message => "Valid URL required"}
the Tin Man
vitthal-gaikwad
  • Ran into a problem with this method. Apparently `https://####.com` validates, but is very much not a valid URL. `URI.regexp(%w[http https]).match('https://####.com')` results in `#` – Omnilord Dec 21 '18 at 22:50
7

Better answer since Ruby 1.8.6

require 'uri'

def valid_url?(url)
  url.slice(URI::regexp(%w(http https))) == url
end
Sebastián Palma
Old Pro
  • 6
    This only validates that http is present, `http://fake` will pass while `www.example.com` won't. – ben Jan 09 '14 at 10:12
  • 2
    It's not exactly clear what the OP wants, so I provided a test for a valid URL. http://localhost is a valid URL which I use often. www.example.com is not a valid URL. The real test of a URLs validity is to see if an HTTP(S) client can connect to it. – Old Pro Jan 09 '14 at 22:24
  • This generates a regex that finds URLs in a given string. i.e. `"blah blah http://test.com/ blah blah"` will validate, which is probably not what OP wants. – aidan Nov 19 '18 at 05:12
  • Using `REGEX === test_string` returns a boolean, which I generally prefer to the `string =~ regex` (or `regex =~ string`) forms, which return an integer or `nil`. It certainly makes sense in this context because this is in a method with a name that implies it'll be returning `true` or `false`. – aidan Nov 19 '18 at 05:17
  • @aidan thanks for the comments. I fixed the answer accordingly. – Old Pro Nov 20 '18 at 06:39
3

Here is the regex used by henrik's validates_url_format_of Rails validator:

REGEXP = %r{
  \A
  https?://                                                          # http:// or https://
  ([^\s:@]+:[^\s:@]*@)?                                              # optional username:pw@
  ( ((#{ALNUM}+\.)*xn---*)?#{ALNUM}+([-.]#{ALNUM}+)*\.[a-z]{2,6}\.? |  # domain (including Punycode/IDN)...
      #{IPv4_PART}(\.#{IPv4_PART}){3} )                              # or IPv4
  (:\d{1,5})?                                                        # optional port
  ([/?]\S*)?                                                         # optional /whatever or ?whatever
  \Z
}iux
Shane
Trevor Turk
  • 1
    Rather than post a link, which can rot then break, copy the important part into your answer. and give it the appropriate credit. – the Tin Man Sep 07 '16 at 00:50
3

What works for me is

require 'uri'

def validate_url(text)
  uri = URI.parse(text)
  # Treat anything that parses but is not http(s) as invalid too.
  raise URI::InvalidURIError unless uri.kind_of?(URI::HTTP) || uri.kind_of?(URI::HTTPS)
rescue URI::InvalidURIError
  errors.add(:url, 'is invalid')
end
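For use outside a model, the same idea can be sketched as a plain predicate. This is only an illustration under a few assumptions: `http_url?` is a hypothetical name, and the extra host check is an addition not present in the answer above.

```ruby
require 'uri'

# Hypothetical helper: true only for http(s) URLs that include a host.
def http_url?(text)
  uri = URI.parse(text.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so one type check covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

http_url?("http://test.com")          # true
http_url?("https://test.com/a?b=1")   # true
http_url?("www.example.com")          # false: no scheme, so it parses as a generic URI
http_url?("ftp://test.com")           # false: not http(s)
http_url?("not a url")                # false: URI.parse raises URI::InvalidURIError
```

Note that `http://localhost` also passes, since the parser does not require a dotted host name.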
Amit Patel
2

I took what you had and modified it so that I could make the http:// or https:// optional:

/^((http|https):\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/ix
the Tin Man
kirk
2

Building on Brian Ray's answer above, which I think answers the question (domain, not URL), here it is updated for Rails 4:

/\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}\z/ix
the Tin Man
David Kobia
1

According to Google, this one works nicely:

/^([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i

A bit lengthy...

It's case-insensitive...it doesn't look like your regexes are, but I don't know Ruby. Or maybe you capitalized them earlier.
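Because the set of TLDs keeps changing, a hand-written alternation like the one above goes stale. One possible sketch is to build the pattern from a list instead. Everything named here (`SAMPLE_TLDS`, `LABEL`, `DOMAIN_WITH_KNOWN_TLD`, `known_tld_domain?`) is hypothetical, and the TLD list is a deliberately tiny sample rather than real registry data.

```ruby
# Hypothetical, deliberately tiny sample; a real list would be loaded from a
# maintained source and refreshed regularly.
SAMPLE_TLDS = %w[com net org museum travel].freeze

# One DNS-style label: starts and ends with a letter or digit, hyphens allowed inside.
LABEL = '[a-z0-9](?:[-a-z0-9]*[a-z0-9])?'

# Regexp.union escapes each string; .source yields "com|net|org|museum|travel".
TLD_ALTERNATION = Regexp.union(SAMPLE_TLDS).source

DOMAIN_WITH_KNOWN_TLD = /\A(?:#{LABEL}\.)+(?:#{TLD_ALTERNATION})\z/i

def known_tld_domain?(value)
  DOMAIN_WITH_KNOWN_TLD.match?(value.to_s)
end

known_tld_domain?("example.com")       # true
known_tld_domain?("Example.COM")       # true: the /i flag makes it case-insensitive
known_tld_domain?("hello.museum")      # true
known_tld_domain?("example.invalid")   # false: TLD not in the sample list
known_tld_domain?("-bad.com")          # false: a label cannot start with a hyphen
```

The pieces are interpolated as strings rather than Regexp objects, because an interpolated Regexp carries its own flags and would switch case-insensitivity off inside its group.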

Dan Breen
0

Try adjusting the patterns so that they start with ^ (the "starts with" anchor) and end with $ (the "ends with" anchor), so that each pattern reads "a string that starts with this and then ends". Otherwise the match for name, say, will succeed if the pattern is found anywhere in the string (i.e. if it contains even one valid character).
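A small sketch of the difference anchoring makes, using a case-insensitive variant of the name pattern from the question. The constant names are hypothetical. It also shows why Ruby's string anchors `\A` and `\z` are usually the safer choice than the line anchors `^` and `$`.

```ruby
# Unanchored: succeeds as soon as any run of allowed characters is found.
NAME_CHARS = /[A-Z0-9\-]+/i
NAME_CHARS.match?("bad name!")                # true

# Anchored to the whole string: every character must be allowed.
ANCHORED_NAME = /\A[A-Z0-9\-]+\z/i
ANCHORED_NAME.match?("bad name!")             # false
ANCHORED_NAME.match?("good-name")             # true

# ^ and $ anchor to lines, so one valid line in a multi-line string is enough.
LINE_ANCHORED = /^[A-Z0-9\-]+$/i
LINE_ANCHORED.match?("good-name\n<script>")   # true
ANCHORED_NAME.match?("good-name\n<script>")   # false
```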

David Hedlund
  • 1
    ^ and $ are start/end of LINE, you most likely want start/end of STRING which is \A and \Z – Pascal Feb 25 '13 at 13:11
0
^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$

Domain name validation with RegEx

Community
paka
-2

This is my URL validator, using Ruby's built-in parser

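class UrlValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    # Named `parser` rather than `p` to avoid shadowing Kernel#p.
    parser = URI::Parser.new

    valid = begin
      parser.parse(value)
      true
    rescue
      false
    end

    unless valid
      # errors.add works across Rails versions; appending with << is deprecated in newer ones.
      record.errors.add(attribute, options[:message] || "is an invalid URL")
    end
  end
end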
the Tin Man
Jan Gerritsen