To validate email addresses, we are relying on the MailAddress class. However, the address a@bbb..com seems to be valid according to MailAddress.

MSDN states that these are valid email address formats:

The MailAddress class supports the following mail address formats:

  • A simple address format of user@host. If a DisplayName is not set, this is the mail address format generated.
  • A standard quoted display name format of "display name" <user@host>. If a DisplayName is set, this is the format generated.
  • Angle brackets are added around the User name, Host name for "display name" user@host if these are not included.
  • Quotes are added around the DisplayName for display name <user@host>, if these are not included.
  • Unicode characters are supported in the DisplayName property.
  • A User name with quotes. For example, "user name"@host.
  • Consecutive and trailing dots in user names. For example, user...name..@host.
  • Bracketed domain literals. For example, <user@[my domain]>.
  • Comments. For example, (comment)"display name"(comment)<(comment)user(comment)@(comment)domain(comment)>(comment). Comments are removed before transmission.

Taken from https://msdn.microsoft.com/en-us/library/system.net.mail.mailaddress%28v=vs.110%29.aspx.

Note that the seventh bullet is close to this problem, but it says that consecutive dots can appear in the user name, not in the domain.

Other resources such as http://isemail.info (http://isemail.info/a@bbb..com) state that this is not a valid email address.

What do you think the correct behaviour should be? Here is a proof of concept.

// C# example
var emailAddress = "a@bbb..com";

Func<string, bool> validEmail = email =>
{
    try
    {
        // The MailAddress constructor throws if the string cannot be parsed.
        new System.Net.Mail.MailAddress(email);
        return true;
    }
    catch (Exception)
    {
        return false;
    }
};

// This assertion passes, which is the behaviour being questioned.
Assert.IsTrue(validEmail(emailAddress));
// using NUnit.Framework
// O2Ref:nunit.framework.dll
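
If the goal is to keep relying on MailAddress while rejecting consecutive dots in the domain, one possible approach is to inspect the parsed Host afterwards. This is only a minimal sketch: `StrictEmailCheck` and `IsValid` are hypothetical names, and it assumes that an empty label in the host (a doubled, leading, or trailing dot) is the only extra condition you want to reject.

```csharp
using System;
using System.Linq;
using System.Net.Mail;

public static class StrictEmailCheck
{
    // Hypothetical helper: accepts only strings that MailAddress parses
    // and whose host part contains no empty label.
    public static bool IsValid(string email)
    {
        MailAddress address;
        try
        {
            address = new MailAddress(email);
        }
        catch (Exception)
        {
            // Null, empty, or unparseable input.
            return false;
        }

        // Host is the part after '@'. Splitting on '.' yields an empty
        // label for "..", a leading dot, or a trailing dot.
        return address.Host.Split('.').All(label => label.Length > 0);
    }
}
```

Note that this still accepts every other format MailAddress supports, including display names and quoted user names, so it narrows the check only for the domain part.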
Michael Hidalgo
  • 4
    parsing email addresses is hard. – Daniel A. White Apr 07 '15 at 23:56
  • @DanielA.White absolutely :) – Michael Hidalgo Apr 07 '15 at 23:57
  • why not use a regexp? – Zee Apr 07 '15 at 23:59
  • Well maybe I asked the wrong question :), the question should be is “a@bbb..com” a valid email address :) – Michael Hidalgo Apr 07 '15 at 23:59
  • Thanks @GrantWinney, that's helpful. – Michael Hidalgo Apr 08 '15 at 00:03
  • 5
    This problem has been solved from a UX perspective by not validating the email address very strictly, but instead sending a confirmation email to the address. Perhaps that approach is useful for you? – Keith Payne Apr 08 '15 at 00:54
  • thanks @KeithPayne, yeah that might work. In my case the validation is done in the backend – Michael Hidalgo Apr 08 '15 at 01:06
  • 1
    @MichaelHidalgo There is no email validation possible without a confirmation mail. The only thing you could ever validate in the backend is whether the email has a valid *format*. While `a@a` is a valid format, there is strong evidence that this email won't pass a validation-email check. – dognose Apr 08 '15 at 17:08
  • That's a good point @dognose. I think this approach should work fine. I will rely on System.Net.Mail.MailAddress. Thanks for your feedback. – Michael Hidalgo Apr 08 '15 at 17:30
  • Maybe this MSDN article [How to: Verify That Strings Are in Valid E-Mail Format](https://msdn.microsoft.com/en-us/library/vstudio/01escwtf(v=vs.100).aspx) could also be helpful (one of the examples in the article: `Invalid: js@proseware..com`). – keenthinker May 01 '15 at 12:41
  • @Zee: Because the syntax described in the RFC isn't a regular language. See [Using a regular expression to validate an email address](http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address). – Daniel Pryden May 11 '15 at 22:10

1 Answer

I think (my personal interpretation of RFC 822, with the help of this document: https://www.cs.tut.fi/~jkorpela/rfc/822addr.html) the address

a@bbb..com

is NOT valid according to RFC 822, in particular its LEXICAL TOKENS definition,

where the domain part of the address is defined as

domain      =  sub-domain *("." sub-domain)

sub-domain  =  domain-ref / domain-literal

domain-ref  =  atom

atom        =  1*<any CHAR except specials, SPACE and CTLs>

specials    =  "(" / ")" / "<" / ">" / "@"   ;  Must be in quoted-
             /  "," / ";" / ":" / "\" / <">  ;  string, to use
             /  "." / "[" / "]"              ;  within a word.    

domain-literal =  "[" *(dtext / quoted-pair) "]"

dtext       =  <any CHAR excluding "[",     ; => may be folded
                 "]", "\" & CR, & including
                 linear-white-space>

linear-white-space =  1*([CRLF] LWSP-char)   ; semantics = SPACE
                                             ; CRLF => folding

quoted-pair =  "\" CHAR                      ; may quote any char

CHAR        =  <any ASCII character>         ; (  0-177,  0.-127.)

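So the dot character is a special and needs to be in quotes, else it is a separator as defined in the 'domain' part. In bbb..com the two consecutive dots leave an empty sub-domain between them, and an empty string is neither an atom (which needs at least one character) nor a domain-literal, so the domain rule does not match.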

According to @dkarp:

The "." means it's a literal dot, not another ABNF production. So a domain is generally atoms separated by dots, and atoms are at least one non-specials character in a row.
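
To make the grammar concrete, here is a rough sketch of the `domain` and `atom` rules quoted above. It is my own illustration, not code from the RFC or from .NET: the class and method names are made up, and it deliberately ignores domain-literals, comments, and folded white space, so treat it as a simplification.

```csharp
using System;
using System.Linq;

public static class Rfc822DomainSketch
{
    // The "specials" from the grammar above; none may appear in an atom.
    private const string Specials = "()<>@,;:\\\".[]";

    // atom = 1*<any CHAR except specials, SPACE and CTLs>
    // CHAR is 7-bit ASCII; CTLs are 0-31 and 127; SPACE is 32.
    public static bool IsAtom(string s)
    {
        return s.Length > 0 && s.All(c =>
            c > 32 && c < 127 && Specials.IndexOf(c) < 0);
    }

    // domain = sub-domain *("." sub-domain)
    // Assumption: sub-domain is restricted to atom (no domain-literal).
    public static bool IsDomain(string domain)
    {
        // Each literal dot separates two sub-domains, so every piece
        // produced by the split must itself be a non-empty atom.
        return domain.Split('.').All(IsAtom);
    }
}
```

With this sketch, `Rfc822DomainSketch.IsDomain("bbb.com")` is true, while `Rfc822DomainSketch.IsDomain("bbb..com")` is false because the piece between the two dots is empty.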

michalh
  • 1
    _"so the dot character is a special and needs to be in quotes else it is a separator as defined in the 'domain' part"_ Nope on this part. The `"."` means it's a literal dot, not another ABNF production. So a `domain` is generally `atom`s separated by dots, and `atom`s are at least one non-`specials` character in a row. – dkarp May 11 '15 at 22:44