How can I properly validate a subdomain format?

Here's what I've got:

  validates :subdomain, uniqueness: { case_sensitive: false }
  validates :subdomain, format: { with: /\A[A-Za-z0-9-]+\z/, message: "not a valid subdomain" }
  validates :subdomain, exclusion: { in: %w(support blog billing help api www host admin en ru pl ua us), message: "%{value} is reserved." }
  validates :subdomain, length: { maximum: 20 }
  before_validation :downcase_subdomain
  protected
    def downcase_subdomain
      self.subdomain.downcase! if attribute_present?("subdomain")
    end  

Question:

Is there a standard regex for subdomain validation, like there is for email? What is the best regex to use for a subdomain?

validates :email, format: { with: URI::MailTo::EMAIL_REGEXP }, allow_blank: true


1 Answer

RFC 1035 defines subdomain syntax like so:

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

And, mercifully, a human-readable description:

[Labels] must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less.

We can do most of this with a regex, and the length restriction separately.

validates :subdomain, format: {
  with: %r{\A[a-z](?:[a-z0-9-]*[a-z0-9])?\z}i, message: "not a valid subdomain"
}, length: { in: 1..63 }
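
To see what this accepts and rejects without booting Rails, here is a small plain-Ruby sketch. The constant `LABEL_REGEX` and the helper `valid_label?` are names made up for illustration; they only mirror the format and length rules above.

```ruby
# Same pattern as in the validation above.
LABEL_REGEX = /\A[a-z](?:[a-z0-9-]*[a-z0-9])?\z/i

# Hypothetical helper: applies the length limit (1..63) and the format check.
def valid_label?(value)
  value.is_a?(String) && value.length.between?(1, 63) && LABEL_REGEX.match?(value)
end

# Print the verdict for a few sample labels.
%w[foo foo-bar a1 a foo- -foo 1foo foo_bar].each do |candidate|
  puts "#{candidate.inspect}: #{valid_label?(candidate)}"
end
```

The first four samples pass. The rest fail because of a trailing hyphen, a leading hyphen, a leading digit, and an underscore, respectively.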

Breaking that regex into pieces to explain it:

%r{
  \A
  [a-z]                       # must start with a letter
  (?:
    [a-z0-9-]*                # might contain alpha-numerics or a dash
    [a-z0-9]                  # must end with a letter or digit
  )?                          # that's all optional
  \z
}ix

We might be tempted to use the simpler /\A[a-z][a-z0-9-]*[a-z0-9]?\z/i, but this allows a trailing hyphen, as in foo-.
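
A quick side-by-side check makes the difference visible. This is only a sketch; `STRICT_LABEL` and `LOOSE_LABEL` are illustrative names for the two patterns discussed here.

```ruby
STRICT_LABEL = /\A[a-z](?:[a-z0-9-]*[a-z0-9])?\z/i  # the grouped version
LOOSE_LABEL  = /\A[a-z][a-z0-9-]*[a-z0-9]?\z/i      # the tempting simpler version

# Compare both patterns on a few labels.
%w[foo foo-bar a foo-].each do |label|
  puts format("%-8s strict=%-5s loose=%s",
              label, STRICT_LABEL.match?(label), LOOSE_LABEL.match?(label))
end
```

The two patterns agree on every sample except foo-. In the simpler pattern the `[a-z0-9-]*` can swallow the trailing hyphen and the optional final character is then skipped.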

See also Regexp for subdomain.
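
The first rule of the grammar also allows several labels joined by dots. If you ever need to accept multi-level subdomains such as eu.shop, one possible approach is to split on the dot and check each label. This is a minimal sketch, not part of the validation above; `SUBDOMAIN_LABEL` and `valid_subdomain?` are hypothetical names.

```ruby
SUBDOMAIN_LABEL = /\A[a-z](?:[a-z0-9-]*[a-z0-9])?\z/i

# Hypothetical helper: true when every dot-separated label is a valid label.
def valid_subdomain?(value)
  return false unless value.is_a?(String)

  # A limit of -1 keeps trailing empty strings, so "foo." yields ["foo", ""]
  # and is rejected by the label check.
  labels = value.split(".", -1)
  !labels.empty? && labels.all? { |label| label.length <= 63 && SUBDOMAIN_LABEL.match?(label) }
end

puts valid_subdomain?("eu.shop")   # two valid labels
puts valid_subdomain?("foo..bar")  # contains an empty label
```

Splitting first keeps the per-label regex unchanged and lets the 63-character limit apply to each label separately.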

  • fantastic explanation and a special thank you for the regex explanation in the end! – Yshmarov Feb 17 '20 at 23:32
  • @Yshmarov You're welcome. I'd just written something similar and this caused me to realize I had a bug: I'd forgotten about dashes. Domain validation comes up so often it's prompted me to work on [`validates_hostname`](https://github.com/KimNorgaard/validates_hostname) and see about adding a subdomain/label validator. – Schwern Feb 18 '20 at 00:36