regexp and rails validations

Question

I have two custom validations:

  def validate_email
    regexp = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+"
    if sleep_email.present? && !sleep_email.match(regexp)
      errors.add(:sleep_email, "the email address does not appear to be in the right format")
    end
  end

  def validate_website
    regexp = "(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"
    if website.present? && !website.match(regexp)
      errors.add(:website, "your website URL must look like http://votresite.com")
    end
  end

But yo@yo and http://website are both accepted as valid. What's wrong?

Does this answer your question? [How to validate an email address using a regular expression?](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) — ggorlen, Feb 15 '20 at 19:58
I don't understand: on rubular.com, http://website doesn't match the regexp and yoyo@yo doesn't match it either, but in my app both match. — Ben, Feb 15 '20 at 20:40

Schwern · Accepted Answer · 2020-02-15T20:47:59.440

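You're building regexes from strings. Strings and regexes have different escaping rules, so you're effectively double escaping. In a double-quoted string, \. is not a recognized escape sequence: Ruby drops the backslash, and the regex only sees a plain ., which matches any character.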

# This results in the regex /a.c/
p "abc".match?("a\.c")  # true

# This results in the desired regex /a\.c/
p "abc".match?("a\\.c")  # false

# This avoids the string escaping entirely.
p "abc".match?(%r{a\.c})  # false

To avoid this double escaping, use /.../ or %r{...} to create regexes.
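To illustrate the difference on the pattern from the question, here is a minimal sketch that compares the string form with a regex literal. The constant names and the `email_format_ok?` helper are made up for this example, and the `\A`/`\z` anchors are an addition that the original pattern does not have. It is only meant to show the escaping problem, not to be a complete email check.

```ruby
# The pattern exactly as written in the question: a double-quoted string.
# Ruby turns "\." into ".", so the dot matches any character.
STRING_PATTERN = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+"

# The same pattern as a regex literal, anchored to the whole string.
# (Hypothetical name; the anchors are an assumption, not in the question.)
EMAIL_PATTERN = /\A[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+\z/

# Hypothetical helper wrapping the literal pattern.
def email_format_ok?(value)
  EMAIL_PATTERN.match?(value)
end

p "yoyo@yoyo".match?(STRING_PATTERN)  # true, the "." matched a letter
p email_format_ok?("yoyo@yoyo")       # false, a literal dot is required
p email_format_ok?("yo@yo.com")       # true
```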

Don't try to validate email with a regex. Instead, use the validates_email_format_of gem which provides a proper validator you can use on any attribute.

validates :sleep_email, presence: true, email_format: true

If you want to see how to fully validate an email address, look at the source.

Your URL regex does work.

regexp = "(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"
p "http://website".match?(regexp)  # true

http://website is valid URL syntax. It's not the job of URL syntax to check that the host actually exists.

If you also want to validate parts of the URL your regex will get increasingly complex. Instead, parse the URL with URI and then check its individual pieces as you like.
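As a rough sketch of what parsing gives you, the snippet below pulls a URL apart with the standard library's `URI`. The `url_parts` helper is a hypothetical name used only for this illustration.

```ruby
require "uri"

# Hypothetical helper: returns the parsed pieces of a URL,
# or nil when URI cannot parse the input at all.
def url_parts(value)
  uri = URI(value)
  { scheme: uri.scheme, host: uri.host, port: uri.port, path: uri.path }
rescue URI::Error
  nil
end

p url_parts("https://www.example.com:8080/path")
p url_parts("http://website")   # parses fine; host is "website"
p url_parts("website")          # parses, but scheme and host are nil
p url_parts("http://bad host")  # nil, the space makes it unparseable
```

Once the pieces are separated, each one can be checked with a small, readable rule instead of one large regex.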

Here's a custom validator I whipped up which parses the URI, checks that it uses an allowed scheme, and does a very rudimentary check on the host.

class UrlValidator < ActiveModel::EachValidator
  ALLOWED_SCHEMES = ['http', 'https']

  private def allowed_schemes
    options[:allowed_schemes] || ALLOWED_SCHEMES
  end

  # EachValidator calls validate_each (singular) for every validated attribute.
  def validate_each(record, attribute, value)
    uri = URI(value)

    if !allowed_schemes.include?(uri.scheme)
      record.errors.add(attribute, :scheme_not_allowed, message: "Scheme #{uri.scheme} is not allowed")
    end

    # Has to have at least xxx.yyy
    # This is a pretty sloppy host check.
    # uri.host is nil for input such as "website", so convert it to a string first.
    if !uri.host.to_s.match?(/\w+\.\w+/)
      record.errors.add(attribute, :host_not_allowed, message: "Host #{uri.host} is not allowed")
    end
  rescue URI::Error
    record.errors.add(attribute, :not_a_uri)
  end
end

validates :website, url: true

If you wanted to allow other schemes, like ftp...

validates :website, url: { allowed_schemes: ['http', 'https', 'ftp'] }

If you wanted true domain validation, you could add a DNS lookup.

  # Needs require "resolv" at the top of the file.
  begin
    Resolv::DNS.open do |dns|
      dns.getaddress(uri.host)
    end
  rescue Resolv::ResolvError
    record.errors.add(attribute, :invalid_host, message: "#{uri.host} could not be resolved")
  end

However, this lookup has a performance impact.
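One way to bound that cost is to cap how long the resolver waits. The sketch below is an assumption-laden illustration, not part of the validator above: `host_resolves?` and its `timeout_seconds` parameter are hypothetical names, and the two-second default is arbitrary.

```ruby
require "resolv"

# Hypothetical helper: true when the host resolves, false otherwise.
# timeout_seconds is an illustrative parameter, not a Rails or Resolv option name.
def host_resolves?(host, timeout_seconds: 2)
  # Skip the network round trip entirely for blank input.
  return false if host.nil? || host.empty?

  Resolv::DNS.open do |dns|
    dns.timeouts = timeout_seconds  # cap how long each lookup attempt may take
    dns.getaddress(host)            # raises Resolv::ResolvError if nothing is found
  end
  true
rescue Resolv::ResolvError
  false
end
```

Even with a timeout, every validation that reaches the lookup still blocks on the network, so it may be worth running this check only when the attribute has changed.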

With rubular.com, `http://website` doesn't match the regexp and `yoyo@yo` doesn't match it either. Why is it different in my app? — Ben, Feb 15 '20 at 20:38
@Ben You're using strings to build your regexes. Strings and regexes have different rules about metacharacters. Use `/.../` instead. I'll add detail into the answer. — Schwern, Feb 15 '20 at 20:42

score 1 · Answer 2 · answered Feb 15 '20 at 20:22

The standard email regex (RFC 5322 Official Standard) to use is:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

As for the website URL, use this one. The URL will only be valid if the TLD (.com, .net, etc.) is included.

^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$
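One caveat when using this pattern in Ruby: `^` and `$` match at the start and end of each line, not of the whole string, so multi-line input can slip through. The sketch below compares the pattern as written with a variant anchored by `\A` and `\z`. The constant names and the sample input are made up for illustration.

```ruby
# The pattern as written above, using line anchors.
LINE_ANCHORED = /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/

# Same pattern, anchored to the whole string instead (an assumed variant).
STRING_ANCHORED = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/

# Hypothetical input: a second line that happens to look like a URL.
INPUT = "javascript:alert(1)\nhttp://example.com"

p LINE_ANCHORED.match?(INPUT)                   # true, the second line matches
p STRING_ANCHORED.match?(INPUT)                 # false
p STRING_ANCHORED.match?("http://example.com")  # true
p STRING_ANCHORED.match?("http://website")      # false, no TLD
```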
