5

I'm trying to use a regular expression to validate the format of a URL in my Rails model. I've tested the regex in Rubular with the URL http://trentscott.com and it matched.

Any idea why it fails validation when I test it in my Rails app? It says "name is invalid".

Code:

  url_regex = /^((http|https):\/\/)?[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix

  validates :serial, :presence => true
  validates :name, :presence => true,
                   :format    => {  :with => url_regex  }
Michael Chaney
Trent Scott

5 Answers

14

You don't need to use a regexp here. Ruby has a much more reliable way to do that:

# Use the URI module distributed with Ruby:

require 'uri'

if url =~ URI::regexp
  # Correct URL
end

(this answer comes from this post:)
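
One caveat for readers: the pattern returned by URI::regexp is not anchored, so =~ succeeds when a URI appears anywhere in the string, and it accepts any scheme. A stricter check can be sketched with URI.parse. The helper name http_url? is hypothetical and not part of the answer above; it assumes you only want absolute http or https URLs with a host.

```ruby
require 'uri'

# Hypothetical helper (not from the original answer): returns true only for
# absolute http/https URLs that include a host.
def http_url?(value)
  uri = URI.parse(value.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  # Raised for strings URI cannot parse at all, e.g. ones containing spaces.
  false
end

puts http_url?("http://trentscott.com")
puts http_url?("abcd")
```

In a model this could back a custom validate method instead of :format, at the cost of accepting anything URI considers a well-formed http(s) URL.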

Community
Thomas Hupkens
10

(I like Thomas Hupkens' answer, but for other people viewing, I'll recommend Addressable)

It's not recommended to use regex to validate URLs.

Use Ruby's URI library or a replacement like Addressable, both of which make URL validation trivial. Unlike URI, Addressable can also handle international characters and TLDs.

Example Usage:

require 'addressable/uri'

Addressable::URI.parse("кц.рф") # Works

uri = Addressable::URI.parse("http://example.com/path/to/resource/")
uri.scheme
#=> "http"
uri.host
#=> "example.com"
uri.path
#=> "/path/to/resource/"

And you could build a custom validation like:

class Example
  include ActiveModel::Validations

  ##
  # Validates a URL
  #
  # If the URI library can parse the value, and the scheme is valid
  # then we assume the url is valid
  #
  class UrlValidator < ActiveModel::EachValidator
    def validate_each(record, attribute, value)
      uri = Addressable::URI.parse(value)

      # parse returns nil for a nil value, so check uri before reading the scheme
      unless uri && %w[http https ftp].include?(uri.scheme)
        raise Addressable::URI::InvalidURIError
      end
    rescue Addressable::URI::InvalidURIError
      # errors.add is the supported way to record an error; appending to
      # errors[attribute] is deprecated in recent Rails versions
      record.errors.add(attribute, "Invalid URL")
    end
  end

  validates :field, :url => true
end

Code Source

Tilo
danneu
  • 1
    after looking at addressable, I think it wins hands down, thanks – stephenmurdoch Aug 20 '11 at 20:29
  • +1 for addressable BUT don't assume that it will raise any exceptions because it won't. Addressable::URI.parse will fail silently trying its best to figure out the URI. For example, say you want to validate an incorrect URI such as: http://http://thing.com. Addressable will call the scheme http and the domain http as well since it views the colon as a port delimiter. No error will be raised – onetwopunch Mar 15 '17 at 21:09
7

Your input (http://trentscott.com) does not have a subdomain, but the regex is checking for one.

domain_regex = /^((http|https):\/\/)[a-z0-9]*(\.?[a-z0-9]+)\.[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix

Update

You also don't need the ? after ((http|https):\/\/) unless the protocol is sometimes missing. I've also escaped the . because an unescaped . matches any character. I'm not sure what the grouping above is for, but here is a better version that supports dashes and groups by section:

domain_regex = /^((http|https):\/\/)   # scheme, required
(([a-z0-9.-]*)\.)?                     # optional subdomains
([a-z0-9-]+)\.                         # domain name
([a-z]{2,5})                           # top-level domain
(:[0-9]{1,5})?                         # optional port
(\/)?$/ix
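
A note on anchors, since the pattern above uses ^ and $: in Ruby these match at line boundaries, not at the start and end of the whole string, and recent Rails versions raise an ArgumentError for a :format validator that uses them unless multiline: true is passed. The sketch below rewrites the same pattern on one line with \A and \z. The constant names and the sample input are illustrative assumptions, not part of the answer.

```ruby
# Same structure as the pattern above, anchored to the whole string.
# DOMAIN_REGEX and LINE_ANCHORED are hypothetical names for this sketch.
DOMAIN_REGEX  = /\A(https?:\/\/)(([a-z0-9.-]*)\.)?([a-z0-9-]+)\.([a-z]{2,5})(:[0-9]{1,5})?(\/)?\z/i

# The ^...$ form, kept only to show the difference.
LINE_ANCHORED = /^(https?:\/\/)(([a-z0-9.-]*)\.)?([a-z0-9-]+)\.([a-z]{2,5})(:[0-9]{1,5})?(\/)?$/i

# A two-line value whose second line happens to be a valid URL.
multiline = "javascript:alert(1)\nhttp://trentscott.com"

puts DOMAIN_REGEX.match?("http://trentscott.com")
puts DOMAIN_REGEX.match?(multiline)
puts LINE_ANCHORED.match?(multiline)
```

With the line anchors, a value only needs one line that looks like a URL to pass, which is why the string anchors are the safer default for validation.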
cordsen
  • Thanks. That fixed the error but now an entry like "abcd" is valid. Any idea on how to fix that? – Trent Scott Jun 03 '11 at 19:06
  • 1
    The update should work. One more thing I removed was the [-.] and replaced it with \. – cordsen Jun 03 '11 at 19:49
  • This does not handle international domain names, which can be represented in ASCII like: www.xn--b1akcweg3a.xn--p1ai. Yes, this gives you double dashes in your domain, which is legal, as well as top-level domains (the right-most component) that are longer than 3 characters. – David Keener May 30 '13 at 17:43
  • @cordsen: what if I want to write a regex in `Ruby` for a `URL` which includes any `non-ASCII` characters or Chinese characters as well? For example, `http://www.詹姆斯.com/` Can you please let me know how to figure this out? – huzefa biyawarwala Oct 04 '19 at 09:23
1

Try this.

It's working for me.

/(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/
Sebastián Palma
0

This also handles international hosts such as abc.com.it, where the .it part is optional:

match '/:site', to: 'controller#action', constraints: { site: /[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,63})?/ }, via: :get, :format => false
Maged Makled