Validation for URL/Domain using Regex? (Rails)

Question

I am trying to create a validation that checks whether a domain/URL is valid, for example "test.com".

def valid_domain_name?
  domain_name = domain.split(".")
  name = /(?:[A-Z0-9\-])+/.match(domain_name[0]).nil?
  tld = /(?:[A-Z]{2}|aero|ag|asia|at|be|biz|ca|cc|cn|com|de|edu|eu|fm|gov|gs|jobs|jp|in|info|me|mil|mobi|museum|ms|name|net|nu|nz|org|tc|tw|tv|uk|us|vg|ws)/.match(domain_name[1]).nil?
  if name == false or tld == false
    errors.add(:domain_name, 'Invalid domain name. Please only use names with letters (A-Z) and numbers (0-9).')
  end
end

This is what I have so far, but it doesn't work. It lets bad URLs through without failing.

I don't know regex very well.

what's the deal with this part: '[A-Z]{2}'? are you trying to let any 2 letter domain go through? domains in all caps too? — Victor, Jul 14 '09 at 21:39
This answer is outdated! Use [`URI::regexp`](http://www.ruby-doc.org/stdlib-2.0/libdoc/uri/rdoc/URI.html#method-c-regexp) instead. Supported since [Ruby 1.8.6](http://www.ruby-doc.org/stdlib-1.8.6/libdoc/uri/rdoc/URI.html#method-c-regexp). Example [below](http://stackoverflow.com/a/16931672/712765). — Old Pro, Jun 05 '13 at 04:49
Since registries now allow new TLDs, building a validating regex has become harder. You'll need to regularly update the TLD list from an accurate source and build the regex from that. — the Tin Man, Sep 07 '16 at 00:46
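
One way to follow the Tin Man's suggestion is to generate the TLD part of the pattern from a list instead of hard-coding it. The sketch below assumes a small hypothetical sample list (`SAMPLE_TLDS`); a real validator would load the full, current list from an authoritative source. The names `LABEL`, `TLD_ALTERNATION`, `DOMAIN_WITH_KNOWN_TLD`, and `known_tld_domain?` are illustrative and not part of any library.

```ruby
# Hypothetical sample only; load the full, current list from an
# authoritative source in real use.
SAMPLE_TLDS = %w[com org net museum photography].freeze

# One DNS-style label: letters, digits, and inner hyphens.
LABEL = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?'

# Escape each TLD and join them into a single alternation.
TLD_ALTERNATION = SAMPLE_TLDS.map { |tld| Regexp.escape(tld) }.join('|')

# One or more labels followed by a TLD from the list, anchored to the whole string.
DOMAIN_WITH_KNOWN_TLD = /\A(?:#{LABEL}\.)+(?:#{TLD_ALTERNATION})\z/i

def known_tld_domain?(domain)
  DOMAIN_WITH_KNOWN_TLD.match?(domain.to_s)
end

puts known_tld_domain?('sub.example.photography') # true
puts known_tld_domain?('example.notatld')         # false
```

The TLDs are joined as a plain string rather than interpolated as a `Regexp`, because an interpolated `Regexp` carries its own flags and would switch case-insensitivity off for that part of the pattern.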

score 65 · Accepted Answer · answered Jul 15 '09 at 03:28

Stumbled on this:

validates_format_of :domain_name, :with => /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/ix

FYI: Rubular is a fantastic resource for testing your Ruby regular expressions.

Tate Johnson

This doesn't handle top-level domains longer than 5 characters, e.g. .museum. Use the following instead: /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?$/ix – SyntaxGoonoo Oct 14 '11 at 03:18
This answer is outdated! Use [`URI::regexp`](http://www.ruby-doc.org/stdlib-2.0/libdoc/uri/rdoc/URI.html#method-c-regexp) instead. Supported since [Ruby 1.8.6](http://www.ruby-doc.org/stdlib-1.8.6/libdoc/uri/rdoc/URI.html#method-c-regexp). Example [below](http://stackoverflow.com/a/16931672/712765). – Old Pro Jun 05 '13 at 04:51
Also doesn't handle domains with more than one consecutive dash, i.e. every IDN domain. – Daniel Rikowski Oct 01 '13 at 17:22
This doesn't work with Rails 4 (and shouldn't be used with earlier Rails versions) because the multiline anchors `^` and `$` create a security vulnerability. – Neal Nov 21 '14 at 19:17
Altered the regex so it allows top-level domains up to 63 characters (after reading [this](http://stackoverflow.com/questions/9238640/how-long-can-a-tld-possibly-be)) and so it is not using multiline anchors, which may present a security risk (as read [here](http://guides.rubyonrails.org/security.html#regular-expressions)): `/\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,63}(:[0-9]{1,5})?(\/.*)?\z/ix` – the-bass Apr 28 '16 at 10:27
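
The difference between line anchors and string anchors that Neal and the-bass describe can be seen in plain Ruby. This is a minimal sketch: `LINE_ANCHORED`, `STRING_ANCHORED`, and `payload` are illustrative names, and the two patterns are simplified variants of the regexes quoted above.

```ruby
# Same pattern twice; only the anchors differ.
LINE_ANCHORED   = /^https?:\/\/[a-z0-9]+([\-\.][a-z0-9]+)*\.[a-z]{2,63}(:[0-9]{1,5})?(\/.*)?$/ix
STRING_ANCHORED = /\Ahttps?:\/\/[a-z0-9]+([\-\.][a-z0-9]+)*\.[a-z]{2,63}(:[0-9]{1,5})?(\/.*)?\z/ix

# A value whose second line is a valid URL but whose first line is not.
payload = "javascript:alert(1)\nhttp://example.com"

# ^ and $ match at line boundaries, so the second line alone satisfies the pattern.
puts LINE_ANCHORED.match?(payload)   # true

# \A and \z match only at the start and end of the whole string.
puts STRING_ANCHORED.match?(payload) # false
```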

Brian Ray · Answer 2 · 2011-10-05T23:24:09.833

28

@Tate's answer is good for a full URL, but if you want to validate a domain column, you don't want to allow the extra URL bits his regex allows (e.g. you definitely don't want to allow a URL with a path to a file).

So I removed the protocol, port, file path, and query string parts of the regex, resulting in this:

^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}$

Check out the same test cases for both versions.

Original (allows domains or full URLs): http://rubular.com/r/qGInC06jcz
Modified (allows only domains): http://rubular.com/r/yP6dHFEhrl

edited Oct 05 '11 at 23:24

answered Oct 05 '11 at 23:14

I modified it slightly to allow ip addresses and localhost for my own use: ^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.?[a-z0-9]{2,5}$ – JosephL Jul 05 '12 at 03:51
@n00b see the answer below – Jez Mar 20 '13 at 16:02

Jez · Answer 3 · 2013-02-15T17:33:44.513

16

/^(http|https):\/\/|[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?$/ix

example.com
sub.example.com
sub.domain.my-example.com
example.com/?stuff=true
example.com:5000/?stuff=true
sub.domain.my-example.com/path/to/file/hello.html
hello.museum
http://railsgirls.com

http://rubular.com/r/cdkLxAkTbk

Added optional http:// or https://

When this was written, the longest TLD was .museum, with 6 characters. Newer TLDs can be longer, so {2,6} is too strict today.

edited Feb 15 '13 at 17:33

answered Nov 09 '12 at 16:24

I threw a few negative examples into your list, and the ones beginning with http:// got matched because of the brackets (e.g. `http://fake`). Basically, the four bracket groups in your expression currently allow a partial match. It doesn't look pretty, but I fixed it by wrapping the whole expression in brackets and making http optional: `^(((http|https):\/\/|)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?)$` – ben Jan 09 '14 at 10:28
Nice regex, thanks. You could remove `{1}` though, it's useless. – pmrotule Aug 09 '17 at 05:39
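
The partial-match problem ben describes comes from alternation precedence: a top-level `|` splits the whole pattern into two independent alternatives. The sketch below contrasts the answer's pattern with one possible fix that uses an optional scheme group and string anchors. `BROKEN` and `FIXED` are illustrative names, and `FIXED` is an assumption about what was intended, not the author's own regex.

```ruby
# Alternative 1 is just "^http(s)://"; alternative 2 is the rest, anchored only at the end.
BROKEN = /^(http|https):\/\/|[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?$/ix

# Scheme made optional with (...)? instead of |, and anchored to the whole string.
FIXED = /\A(https?:\/\/)?[a-z0-9]+([\-\.][a-z0-9]+)*\.[a-z]{2,6}(:[0-9]{1,5})?(\/.*)?\z/ix

puts BROKEN.match?('http://fake')      # true: the first alternative matches on its own
puts BROKEN.match?('!!! example.com')  # true: the second alternative has no start anchor
puts FIXED.match?('http://fake')       # false
puts FIXED.match?('!!! example.com')   # false
puts FIXED.match?('example.com')       # true
```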

score 13 · Answer 4 · edited Sep 07 '16 at 00:48

Another way to do URL validation in Rails is

validates :web_address, :format => { :with => URI::regexp(%w(http https)), :message => "Valid URL required"}

edited Sep 07 '16 at 00:48

the Tin Man

answered May 15 '13 at 11:41

vitthal-gaikwad

Ran into a problem with this method. Apparently `https://####.com` validates, but is very much not a valid URL. `URI.regexp(%w[http https]).match('https://####.com')` results in `#` – Omnilord Dec 21 '18 at 22:50

score 7 · Answer 5 · edited Oct 04 '21 at 13:20

Better answer since Ruby 1.8.6

require 'uri'

def valid_url?(url)
  url.slice(URI::regexp(%w(http https))) == url
end

edited Oct 04 '21 at 13:20

Sebastián Palma

answered Jun 05 '13 at 04:39

Old Pro

This only validates that http is present, `http://fake` will pass while `www.example.com` won't. – ben Jan 09 '14 at 10:12
It's not exactly clear what the OP wants, so I provided a test for a valid URL. http://localhost is a valid URL which I use often. www.example.com is not a valid URL. The real test of a URL's validity is to see if an HTTP(S) client can connect to it. – Old Pro Jan 09 '14 at 22:24
This generates a regex that finds URLs in a given string. i.e. `"blah blah http://test.com/ blah blah"` will validate, which is probably not what OP wants. – aidan Nov 19 '18 at 05:12
Using `REGEX === test_string` returns a boolean, which I generally prefer to the `string =~ regex` (or `regex =~ string`) forms, which return an integer or `nil`. It certainly makes sense in this context because this is in a method with a name that implies it'll be returning `true` or `false`. – aidan Nov 19 '18 at 05:17
@aidan thanks for the comments. I fixed the answer accordingly. – Old Pro Nov 20 '18 at 06:39
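
aidan's point, that the generated regex finds URLs inside a larger string, can also be handled by wrapping the pattern in string anchors. This is a rough sketch assuming Ruby 2.4 or later; it builds the pattern with `URI::RFC2396_Parser#make_regexp`, and the names `HTTP_URL_PATTERN`, `WHOLE_HTTP_URL`, and `whole_http_url?` are illustrative.

```ruby
require 'uri'

# Pattern that matches an http(s) URL anywhere inside a string.
HTTP_URL_PATTERN = URI::RFC2396_Parser.new.make_regexp(%w[http https])

# Same pattern, but the URL must span the entire string.
WHOLE_HTTP_URL = /\A#{HTTP_URL_PATTERN}\z/

def whole_http_url?(text)
  WHOLE_HTTP_URL.match?(text.to_s)
end

puts whole_http_url?('http://test.com/')
puts whole_http_url?('blah blah http://test.com/ blah blah')
```

As with the answer itself, this only checks syntax: a value such as `http://localhost` passes, while `www.example.com` does not because it has no scheme.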

score 3 · Answer 6 · edited Jun 17 '19 at 04:17

Here is the regex used by henrik's validates_url_format_of Rails validator:

REGEXP = %r{
  \A
  https?://                                                          # http:// or https://
  ([^\s:@]+:[^\s:@]*@)?                                              # optional username:pw@
  ( ((#{ALNUM}+\.)*xn---*)?#{ALNUM}+([-.]#{ALNUM}+)*\.[a-z]{2,6}\.? |  # domain (including Punycode/IDN)...
      #{IPv4_PART}(\.#{IPv4_PART}){3} )                              # or IPv4
  (:\d{1,5})?                                                        # optional port
  ([/?]\S*)?                                                         # optional /whatever or ?whatever
  \Z
}iux

Rather than post a link, which can rot and then break, copy the important part into your answer and give it the appropriate credit. — the Tin Man, Sep 07 '16 at 00:50

score 3 · Answer 7 · answered Jul 01 '16 at 06:14

What works for me is

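require 'uri'

def validate_url(text)
  uri = URI.parse(text)
  # Anything that parses but is not http(s) is treated like a parse failure.
  raise URI::InvalidURIError unless uri.kind_of?(URI::HTTP) || uri.kind_of?(URI::HTTPS)
rescue URI::InvalidURIError
  # A method-level rescue needs no begin/end of its own.
  errors.add(:url, 'is invalid')
end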
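
Outside a model, the same idea can be written as a predicate that returns true or false. This is a minimal sketch: `http_url?` is a hypothetical helper name, and the extra host check is an assumption added here, not part of the answer above.

```ruby
require 'uri'

# Returns true only for strings that parse as http or https URLs with a host.
def http_url?(text)
  uri = URI.parse(text.to_s)
  # URI::HTTPS inherits from URI::HTTP, so one check covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts http_url?('https://example.com/path?x=1') # true
puts http_url?('ftp://example.com')            # false
puts http_url?('www.example.com')              # false
```

A scheme-less value such as `www.example.com` parses as a generic URI, so it is rejected by the class check rather than by an exception.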

Amit Patel

score 2 · Answer 8 · edited Sep 07 '16 at 00:48

I took what you had and modified it so that I could make the http:// or https:// optional:

/^((http|https):\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/ix

edited Sep 07 '16 at 00:48

the Tin Man

answered Aug 18 '10 at 21:03

kirk

Misses a lot of TLDs e.g. .co.uk .co.th ac.uk etc. – Ashley Jun 08 '12 at 21:37

score 2 · Answer 9 · edited Sep 07 '16 at 00:53

This takes Brian Ray's answer above, which I think answers the question (domain, not URL), and updates it for Rails 4:

/\A[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}\z/ix
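
For trying the domain-only pattern outside Rails, a small plain-Ruby sketch follows. `DOMAIN_REGEX` and `valid_domain?` are hypothetical names, and widening the TLD length to `{2,63}` is an assumption taken from the comments earlier in this thread, not part of this answer.

```ruby
# Domain-only pattern with string anchors; the TLD length is widened to 63.
DOMAIN_REGEX = /\A[a-z0-9]+([\-\.][a-z0-9]+)*\.[a-z]{2,63}\z/i

def valid_domain?(value)
  DOMAIN_REGEX.match?(value.to_s)
end

# In a Rails model the same constant could be passed to a format validation, e.g.
#   validates :domain_name, format: { with: DOMAIN_REGEX }
# (:domain_name is the attribute name used in the question).

puts valid_domain?('sub.domain.my-example.com') # true
puts valid_domain?('http://test.com')           # false
puts valid_domain?('test.com/path')             # false
```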

edited Sep 07 '16 at 00:53

the Tin Man

answered Aug 23 '16 at 10:36

David Kobia

score 1 · Answer 10 · answered Jul 15 '09 at 03:00

According to Google, this one works nicely:

/^([a-z0-9]([-a-z0-9]*[a-z0-9])?\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i

A bit lengthy...

It's case-insensitive...it doesn't look like your regexes are, but I don't know Ruby. Or maybe you capitalized them earlier.

score 0 · Answer 11 · answered Jul 14 '09 at 21:32

Try adjusting the patterns so that each starts with ^ (the "starts with" character) and ends with $ ("ends with"), so that the whole pattern reads "a string that starts with this and then ends". Otherwise the match for name, say, will be positive if the pattern is found anywhere in the string (i.e. if it contains a single valid character).

David Hedlund

^ and $ are start/end of LINE, you most likely want start/end of STRING which is \A and \Z – Pascal Feb 25 '13 at 13:11
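
The effect of a missing anchor, and the small difference between `\Z` and `\z`, can be checked directly in Ruby. This is a minimal sketch with illustrative constant names; the character class is a simplified stand-in for the name pattern in the question.

```ruby
UNANCHORED    = /[a-z0-9-]+/i
ANCHORED_BIGZ = /\A[a-z0-9-]+\Z/i
ANCHORED_Z    = /\A[a-z0-9-]+\z/i

# Without anchors, any valid run of characters inside the string counts as a match.
puts UNANCHORED.match?('bad name!')   # true
puts ANCHORED_Z.match?('bad name!')   # false

# \Z still matches before a single trailing newline; \z does not.
puts ANCHORED_BIGZ.match?("test\n")   # true
puts ANCHORED_Z.match?("test\n")      # false
```

For validation, `\A` and `\z` are therefore the stricter pair.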

score 0 · Answer 12 · edited May 23 '17 at 10:31

^(([a-zA-Z]{1})|([a-zA-Z]{1}[a-zA-Z]{1})|([a-zA-Z]{1}[0-9]{1})|([0-9]{1}[a-zA-Z]{1})|([a-zA-Z0-9][a-zA-Z0-9-_]{1,61}[a-zA-Z0-9]))\.([a-zA-Z]{2,6}|[a-zA-Z0-9-]{2,30}\.[a-zA-Z]{2,3})$

Domain name validation with RegEx

edited May 23 '17 at 10:31

Community

answered Jan 15 '14 at 09:14

paka

This returns false on a valid url. – Linus Mar 30 '14 at 12:15
@LinusAn can you please specify an example? – paka Mar 31 '14 at 16:13

score -2 · Answer 13 · edited Sep 07 '16 at 00:52

This is my URL validator, using Ruby's built-in parser

class UrlValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    p = URI::Parser.new

    valid = begin
      p.parse(value)
      true
    rescue
      false
    end

    unless valid
      # errors.add is the supported way to add a message; appending with << was deprecated in Rails 6.1.
      record.errors.add(attribute, options[:message] || "is an invalid URL")
    end
  end

end

edited Sep 07 '16 at 00:52

the Tin Man

answered Aug 09 '15 at 17:00

Jan Gerritsen
