rails email validation format and regex

Question

Currently following the Michael Hartl rails tutorial

Given the following tests in rails

  test "email validation should accept valid addresses" do
    valid_addresses = %w[user@example.com USER@foo.COM A_US-ER@foo.bar.org
                         first.last@foo.jp alice+bob@baz.cn]
    valid_addresses.each do |valid_address|
      @user.email = valid_address
      assert @user.valid?, "#{valid_address.inspect} should be valid"
    end
  end

  test "email validation should reject invalid addresses" do
    invalid_addresses = %w[user@example,com user_at_foo.org user.name@example.
                           foo@bar_baz.com foo@bar+baz.com]
    invalid_addresses.each do |invalid_address|
      @user.email = invalid_address
      assert_not @user.valid?, "#{invalid_address.inspect} should be invalid"
    end
  end

and the following regex for email format validation

VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
validates :email, presence: true, format: { with: VALID_EMAIL_REGEX }

Can someone explain to me what the tests are testing with respect to the regex? Why are the valid examples only user@example.com, USER@foo.COM, and so on? What if I add another element to valid_addresses, such as USER@EXAMPLE.COM? Why did Michael specifically choose the above 5 example emails as valid_addresses and 5 as invalid_addresses?

If the regex tests for all formats and only returns a specific one, why do we need to test at all?

Not really sure what your question is. The tests make sure invalid emails are invalid and valid emails are valid using the supplied regular expression. If you are wondering why you should unit test, [check out this question](http://stackoverflow.com/questions/67299/is-unit-testing-worth-the-effort). — Sam, Oct 17 '14 at 15:22
I understand that my question is very confusing, so I guess I just want to know what the tests are testing for in this case, on a line by line basis. — user3277633, Oct 17 '14 at 15:28
The tests that Michael proposes are not exhaustive. He picks a few examples of valid email addresses and a few of invalid email addresses to make sure that the regex handles those cases. For the invalid addresses he goes for typical typing errors, and for the valid ones he picks unusual combinations that are still valid. We test to make sure that there wasn't a typo in the regex we entered in the validation section. — Samantha Cabral, Oct 17 '14 at 15:51

score 2 · Accepted Answer · edited May 23 '17 at 11:45

Let us break down the expression (keep in mind the i modifier makes it case insensitive):

\A          (?# anchor to the beginning of the string)
[\w+\-.]+   (?# match 1+ a-z, A-Z, 0-9, +, _, -, or .)
@           (?# match literal @)
[a-z\d\-.]+ (?# match 1+ a-z, 0-9, -, or .)
\.          (?# match literal .)
[a-z]+      (?# match 1+ a-z)
\z          (?# anchor to the absolute end of the string)
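The same breakdown can be checked in plain Ruby. This is only a sketch: ANNOTATED_EMAIL_REGEX is a made-up name for the tutorial's regex rewritten with the x (extended) flag, which lets the comments live inside the pattern. It should behave the same as the one-line version on the tutorial's valid addresses.

```ruby
# The tutorial's regex, rewritten with the x flag so each piece can carry a comment.
# ANNOTATED_EMAIL_REGEX is a hypothetical name used only for this illustration.
ANNOTATED_EMAIL_REGEX = /
  \A            # start of string
  [\w+\-.]+     # local part: letters, digits, _, +, -, or .
  @             # literal at sign
  [a-z\d\-.]+   # domain: letters, digits, -, or .
  \.            # literal dot before the final label
  [a-z]+        # final label: letters only
  \z            # end of string
/ix

# The one-line regex from the question.
VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

valid_addresses = %w[user@example.com USER@foo.COM A_US-ER@foo.bar.org
                     first.last@foo.jp alice+bob@baz.cn]

# Print, for each address, whether the annotated and one-line versions match.
valid_addresses.each do |address|
  annotated = ANNOTATED_EMAIL_REGEX.match?(address)
  one_line  = VALID_EMAIL_REGEX.match?(address)
  puts "#{address}: annotated=#{annotated} one_line=#{one_line}"
end
```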

This is what the tutorial defines as an email (in reality, it's much more complicated). So the author, Michael Hartl, wrote a couple tests for "valid" and "invalid" (according to the above definitions) emails.

Pretty much, the "user" part can contain alphanumerics or any of the characters _ + - and . while the "domain" can contain alphanumerics, - or . and the "TLD" can only be letters. The first 5 emails exercise several variations of these rules as "acceptable" emails. The last 5 emails fail for the following reasons:

user@example,com - , can't be matched
user_at_foo.org - no @
user.name@example. - no TLD after .
foo@bar_baz.com - domain can't contain _
foo@bar+baz.com - domain can't contain +
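To see these failures outside of Rails, here is a minimal plain-Ruby sketch that runs the regex from the question against each invalid address. The reason strings are just labels restating the list above; nothing here comes from the tutorial's code apart from the regex and the addresses.

```ruby
VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

# Each invalid address from the test, paired with the reason it should fail.
invalid_addresses = {
  "user@example,com"   => "comma is not in the domain character class",
  "user_at_foo.org"    => "no @",
  "user.name@example." => "nothing after the final dot",
  "foo@bar_baz.com"    => "underscore in the domain",
  "foo@bar+baz.com"    => "plus sign in the domain"
}

# Collect [address, matched?, reason] triples.
results = invalid_addresses.map do |address, reason|
  [address, VALID_EMAIL_REGEX.match?(address), reason]
end

results.each do |address, matched, reason|
  puts format("%-20s matched=%-5s (%s)", address, matched, reason)
end
```

If any line prints matched=true, the regex was probably mistyped, which is exactly what the tutorial's test is meant to catch.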

Obviously if you want more specific emails to match (or not match) add them to the array of tests. If your test fails, you know you will need to update your expression :)

@user3277633 I kind of skimmed over the point in my answer, but I wouldn't use the above expression as a final solution. It is very loose: it will accept tons of false positives and also deny some valid emails. Some rules it fails to enforce: [TLDs](http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) need to be at least two characters, (sub)domains cannot start/end with a period, etc. [Check this monster RegEx](https://www.debuggex.com/r/aH_x42NflV8G-GS7) (note, I don't expect you to use this necessarily). — Sam, Oct 17 '14 at 15:59
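A quick sketch of what "loose" means here, in plain Ruby. The sample addresses are illustrative picks, not from the tutorial: the first group breaks the rules mentioned in the comment but still matches, and the second group should be legal (an apostrophe is allowed in the local part, and xn--p1ai is the ASCII form of an internationalized TLD) yet gets rejected.

```ruby
VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i

# Malformed or questionable addresses that the regex still accepts
# (consecutive dots, a domain starting with a dot, a one-letter TLD).
accepted_but_malformed = %w[foo@bar..com foo@.bar.com foo@bar.c]

# Addresses that should be legal but that the regex rejects
# (apostrophe in the local part, TLD containing digits and hyphens).
rejected_but_valid = ["o'brien@example.com", "user@example.xn--p1ai"]

false_positives = accepted_but_malformed.select { |a| VALID_EMAIL_REGEX.match?(a) }
false_negatives = rejected_but_valid.reject { |a| VALID_EMAIL_REGEX.match?(a) }

puts "accepted anyway: #{false_positives.inspect}"
puts "wrongly rejected: #{false_negatives.inspect}"
```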

score 1 · Answer 2 · answered Mar 09 '23 at 19:42

For everyone checking in, in 2023.

You can use:

validates :email, format: { with: URI::MailTo::EMAIL_REGEXP }
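For comparison, here is a small plain-Ruby sketch (no Rails needed) that runs the tutorial's ten test addresses through URI::MailTo::EMAIL_REGEXP from the standard library. The variable names are made up for the example. Note that this pattern does not appear to require a dot in the domain, so an address like user@localhost would probably pass; check that against your Ruby version if it matters.

```ruby
require "uri"

valid_addresses = %w[user@example.com USER@foo.COM A_US-ER@foo.bar.org
                     first.last@foo.jp alice+bob@baz.cn]
invalid_addresses = %w[user@example,com user_at_foo.org user.name@example.
                       foo@bar_baz.com foo@bar+baz.com]

# Map each address to whether the standard-library pattern matches it.
results = (valid_addresses + invalid_addresses).map do |address|
  [address, URI::MailTo::EMAIL_REGEXP.match?(address)]
end.to_h

results.each { |address, matched| puts "#{address}: #{matched}" }
```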

S.Klatt

21
5

score 0 · Answer 3 · answered Oct 17 '14 at 16:00

I think the best way to get accustomed to regular expressions is to experiment with them. Go to Rubular.com (as recommended in the book), paste \A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z into the regular expression field, and put the letter i in the options box that follows it. Then paste an email address such as user@example,com into the test string field: you'll notice that it does not match, but if you replace the comma with a dot, it will. The second invalid address just checks that the @ character is required (it is missing in this case).

The third invalid address tests that the suffix after the last dot contains one or more letters. The fourth tests that underscores are not allowed after the @. The fifth tests that a + character is not allowed after the @.

The valid addresses test the same rules from the other side: in those addresses the underscores and plus signs appear only before the @, where they are allowed. USER@foo.COM is accepted because the i flag makes the regex case-insensitive. The User model also has before_save { self.email = email.downcase }, but that callback only runs when the record is saved, not when valid? is called, so it plays no part in this test.
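A minimal sketch to check how the regex on its own treats USER@foo.COM, with and without the i flag. CASE_INSENSITIVE and CASE_SENSITIVE are hypothetical names for this illustration; the second pattern is the tutorial's regex with the flag removed.

```ruby
# The tutorial's regex, and the same pattern without the i flag.
CASE_INSENSITIVE = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
CASE_SENSITIVE   = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/

address = "USER@foo.COM"

with_flag      = CASE_INSENSITIVE.match?(address)
without_flag   = CASE_SENSITIVE.match?(address)
after_downcase = CASE_SENSITIVE.match?(address.downcase)

puts "with i flag:            #{with_flag}"
puts "without i flag:         #{without_flag}"
puts "without flag, downcased: #{after_downcase}"
```

Without the flag, the uppercase COM cannot satisfy [a-z]+, so the address only passes once it has been downcased.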
