1

Possible Duplicates:
Email Validation - Regular Expression
What is the best regular expression for validating email addresses?

Hi All,

I have an email address roughly like this:

firstname.lastname@4domain.co.nz

It doesn't work with the regex I have for email addresses; it doesn't seem to like the 4 at the start of the domain.

private const string MatchEmailPattern =
        @"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@" +
        @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\." +
        @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|" +
        @"([a-zA-Z]+[\w-]+\.)+[a-zA-Z]{2,4})$";

Most other corner cases work well with this regex; all of the following are rejected:

        Assert.IsFalse(EmailValidator.IsValidEmailAddress("..@test.com"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress(".a@test.com"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress(".@s.dd"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress("ab@988.120.150.10"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress("ab@120.256.256.120"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress("2@bde.cc"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress("-@bde.cc"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress("..@bde.cc"));
        Assert.IsFalse(EmailValidator.IsValidEmailAddress("_@bde.cc"));

Can anyone suggest another email regex that handles all of the cases above?

The regex above also has the advantage of accepting addresses like the following, which many others reject:

firstname.lastname_@gmail.com
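For reference, a hedged sketch of one possible tweak (not taken from any of the answers below). In the domain alternative, `([a-zA-Z]+[\w-]+\.)+` requires every domain label to begin with a letter, which is why `4domain` can never match. Letting the first character be a letter or a digit, i.e. `([a-zA-Z0-9][\w-]*\.)+`, admits `4domain.co.nz` while leaving the local-part and IP-address rules untouched, so the negative cases above should still be rejected. As a side effect it also admits single-character labels such as the `b` in `a@b.co`, and it still inherits the original's other limits, such as rejecting top-level domains longer than four letters. `EmailValidator` and `IsValidEmailAddress` come from the tests above; their bodies here are an assumption.

```csharp
using System.Text.RegularExpressions;

public static class EmailValidator
{
    // The question's pattern with one change in the last line: a domain label may now
    // start with a digit ([a-zA-Z0-9][\w-]*) instead of requiring a letter ([a-zA-Z]+[\w-]+).
    private const string MatchEmailPattern =
        @"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@" +
        @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\." +
        @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|" +
        @"([a-zA-Z0-9][\w-]*\.)+[a-zA-Z]{2,4})$";

    // Compiled once and reused for every call.
    private static readonly Regex EmailRegex = new Regex(MatchEmailPattern);

    public static bool IsValidEmailAddress(string email)
    {
        // Treat null as invalid instead of letting Regex.IsMatch throw.
        return email != null && EmailRegex.IsMatch(email);
    }
}
```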

peter

4 Answers

7

You should use the MailAddress class, like this:

// Requires: using System; using System.Net.Mail;
try {
    address = new MailAddress(address).Address;
} catch (FormatException) {
    // address is invalid
}
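A self-contained version of the snippet above, offered as a sketch; `MailAddressCheck` and `NormalizeOrNull` are hypothetical names. Besides `FormatException`, the `MailAddress(string)` constructor throws `ArgumentNullException` for `null` and `ArgumentException` for an empty string, so the sketch catches `ArgumentException` as well (which also covers `ArgumentNullException`, its subclass). On .NET 5 and later, `MailAddress.TryCreate(string, out MailAddress)` performs the same parse without exceptions.

```csharp
using System;
using System.Net.Mail;

public static class MailAddressCheck
{
    // Hypothetical helper: returns the parsed address (User@Host),
    // or null if MailAddress rejects the input.
    public static string NormalizeOrNull(string input)
    {
        try
        {
            return new MailAddress(input).Address;
        }
        catch (FormatException)
        {
            return null; // not in a form MailAddress recognizes
        }
        catch (ArgumentException)
        {
            return null; // null or empty input
        }
    }
}
```

Note that `NormalizeOrNull("User Name <user@example.com>")` returns `"user@example.com"`: the display name is silently dropped, which is exactly the caveat discussed next.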

If you use this approach to validate an e-mail address, be aware that MailAddress also accepts the display-name part of an address, which may not be what you want. For example, it accepts these strings as valid e-mail addresses:

  • "user1@hotmail.com; user2@gmail.com"
  • "user1@hotmail.com; user2@gmail.com; user3@company.com"
  • "User Display Name user3@company.com"

In these cases only the last part of each string is parsed as the address; everything before it is treated as the display name. To accept only a plain e-mail address without a display name, check that the DisplayName property of the MailAddress instance is empty.

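bool isValid = false;

try
{
    MailAddress address = new MailAddress(emailAddress);
    // Valid only if no part of the input was parsed as a display name
    isValid = (string.IsNullOrEmpty(address.DisplayName));
    // or, stricter: require the parsed address to match the input exactly
    //isValid = ((address.User + "@" + address.Host) == emailAddress);
}
catch (FormatException)
{
    //address is invalid
}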

Furthermore, MailAddress also accepts an address with a trailing dot, such as "user@company.".
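The checks in this answer can be folded into one helper. This is a hedged sketch with a hypothetical name, `IsPlainAddress`; it applies both the DisplayName test and the stricter round-trip comparison from the code above, and it does not attempt to catch the trailing-dot case.

```csharp
using System;
using System.Net.Mail;

public static class StrictEmailCheck
{
    // Hypothetical helper: accepts only input that MailAddress parses as a bare address,
    // with no display name and nothing before or after the address itself.
    public static bool IsPlainAddress(string input)
    {
        if (string.IsNullOrEmpty(input))
            return false;
        try
        {
            var address = new MailAddress(input);
            // Reject input where part of the string became a display name.
            if (!string.IsNullOrEmpty(address.DisplayName))
                return false;
            // Require the parsed User@Host to round-trip to the original text.
            return address.Address == input;
        }
        catch (Exception ex) when (ex is FormatException || ex is ArgumentException)
        {
            return false;
        }
    }
}
```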

pholpar
SLaks
  • Do you just patrol for these questions? :) – Rex Morgan May 16 '11 at 21:26
  • Doesn't handle all the negative cases, but it won't do any harm to relax the rules a bit to allow my case through. Better for my system to send emails to invalid addresses 'occasionally' (if ever) than to stop valid emails being sent. – peter May 16 '11 at 21:33
  • @Rex: Actually, no. Maybe I should. – SLaks May 16 '11 at 21:34
  • @peter: This parser is based on the RFC. It should conform exactly to the official spec. – SLaks May 16 '11 at 21:35
  • 'conform exactly to the official spec' OK, great. Thanks. – peter May 16 '11 at 21:38
  • MailAddress doesn't actually match the spec. – porges May 17 '11 at 01:08
  • @Porges: Really? Can you give details? Have you filed a bug on Connect? – SLaks May 17 '11 at 01:18
  • Here is output from running against some tests, which I gleaned from an old version of Dominic Sayers's code. http://pastebin.com/raw.php?i=ssbCgH1T Some of the tests (e.g. overall maximum length) are from other specs such as SMTP, but even if you disregard these, the canonical examples from RFC5322 (the current address spec) fail. – porges May 17 '11 at 01:28
  • Um, `Phil Haack says so`? What? – SLaks May 17 '11 at 01:45
  • :) That's a comment from the test data. I think it means Dominic got the test case from Phil Haack's article; http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx – porges May 17 '11 at 01:50
  • Be aware that this approach accepts the display-name part of the e-mail address as well, which may not be what you want. For example, it accepts these strings as valid e-mail addresses: "user1@hotmail.com; user2@gmail.com" "user1@hotmail.com; user2@gmail.com; user3@company.com" "User Display Name user3@company.com". In these cases only the last part of each string is parsed as the address; everything before it is treated as the display name. To get a plain e-mail address without a display name, check that the DisplayName property of the MailAddress instance is empty. – pholpar Jan 20 '20 at 13:20
  • Furthermore, "user@company." is also accepted by MailAddress. – pholpar Jan 20 '20 at 13:21
3

Honestly? I might be unpopular for saying this, but why not just match

.+@.+

Minimalist but functional for 90% of cases.
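In .NET the minimalist check might look like the sketch below. The `\A`/`\z` anchors are an addition (so the whole input must match rather than any substring), and `MinimalEmailCheck`/`LooksLikeEmail` are hypothetical names. Unlike the question's regex, this deliberately accepts inputs such as `2@bde.cc`.

```csharp
using System.Text.RegularExpressions;

public static class MinimalEmailCheck
{
    // ".+@.+" anchored with \A and \z: at least one character, an '@',
    // then at least one more character. '.' does not match '\n',
    // so multi-line input is rejected.
    private static readonly Regex Pattern = new Regex(@"\A.+@.+\z");

    public static bool LooksLikeEmail(string input) =>
        input != null && Pattern.IsMatch(input);
}
```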

bluepnume
  • Functional for 100% of cases, I believe. It's only 90% if you add `\..+` – SLaks May 16 '11 at 21:23
  • Yeah, I only meant functional in the sense of absolutely wanting to make sure the address is valid, in order to send emails to it, etc. Although there's always the substantial probability the email address is fake anyway. Hence: over-the-top email validation is redundant. – bluepnume May 16 '11 at 21:25
3

A fun fact is that, unlike in most languages, it is possible to write a 'regex' in C#/.NET that fully matches the RFC5322 spec for email addresses. Here is one I prepared earlier (link shows the construction):

^(?'localPart'((((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a
-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|
([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n
\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\)
)|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_`
{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u0021\u0023-\u
005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|\\(
[\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f
\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?"))((\((((?'paren'\
()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u
0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ 
\t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u
000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)
[ \t]+)+))*?)(\.(((\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002
a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])
|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\
n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\
))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9!#$%&'*+/=?^_
`{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\u0021\u0023-\
u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|\\
([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001
f\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?"))((\((((?'paren'
\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\
u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[
\t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u
000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)
[ \t]+)+))*?))*))@(?'domain'((((\((((?'paren'\()|(?'-paren'\))|([\u002
1-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\
u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007
e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?(
paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?(([a-zA-Z0-9
!#$%&'*+/=?^_`{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\
u0021\u0023-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u00
1f\u007f])|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000
c\u000e-\u001f\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?"))((
\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u00
7e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t
]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008
\u000b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ 
\t]+)?|((\r\n)[ \t]+)+))*?)(\.(((\((((?'paren'\()|(?'-paren'\))|([\u00
21-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-
\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u00
7e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(?
(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[\t]+)+))*?(([a-zA-Z0-9
!#$%&'*+/=?^_`{|}~-]+)|("(([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?(([\
u0021\u0023-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e-\u00
1f\u007f])|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000
c\u000e-\u001f\u007f])))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?"))((
\((((?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u00
7e]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t
]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008
\u000b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ 
\t]+)?|((\r\n)[ \t]+)+))*?))*)|(((\((((?'paren'\()|(?'-paren'\))|([\u0
021-\u0027\u002a-\u005b\u005d-\u007e]|[\u0001-\u0008\u000b\u000c\u000e
-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)|\\([\u0021-\u0
07e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f]))*(
?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+))*?\[(([ \t]+
((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?([!-Z^-~]|[\u0001-\u0008\u000b\u000c\
u000e-\u001f\u007f]))*([ \t]+((\r\n)[ \t]+)?|((\r\n)[ \t]+)+)?\]((\(((
(?'paren'\()|(?'-paren'\))|([\u0021-\u0027\u002a-\u005b\u005d-\u007e]|
[\u0001-\u0008\u000b\u000c\u000e-\u001f\u007f])|([ \t]+((\r\n)[ \t]+)?
|((\r\n)[ \t]+)+)|\\([\u0021-\u007e]|[ \t]|[\r\n\0]|[\u0001-\u0008\u00
0b\u000c\u000e-\u001f\u007f]))*(?(paren)(?!)))\))|([ \t]+((\r\n)[ \t]+
)?|((\r\n)[ \t]+)+))*?))$
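What makes this possible in .NET is its balancing-group syntax, visible in the pattern above as `(?'paren'\()`, `(?'-paren'\))` and `(?(paren)(?!))`, which lets the regex track arbitrarily nested comment parentheses. A stripped-down sketch of the same trick, matching only strings whose parentheses are balanced (`BalancedParens` and `IsBalanced` are hypothetical names):

```csharp
using System.Text.RegularExpressions;

public static class BalancedParens
{
    // Each '(' pushes a capture onto the 'open' group; each ')' pops one via (?'-open'...),
    // which fails when there is nothing to pop. (?(open)(?!)) then fails the whole match
    // if any '(' is still unclosed when the end of the input is reached.
    private static readonly Regex Pattern =
        new Regex(@"\A(?:[^()]|(?'open'\()|(?'-open'\)))*(?(open)(?!))\z");

    public static bool IsBalanced(string input) => Pattern.IsMatch(input);
}
```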

However, note that RFC5322's handling of domain names is more liberal than the actual domain name RFCs, and there are also additional restrictions which apply from various RFCs (e.g. SMTP enforces a maximum length). So things which RFC5322 considers email addresses can still be invalid by other measures.
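As one example of those extra restrictions, RFC 5321 (SMTP) limits the local part to 64 octets and the forward-path to 256 octets including the enclosing angle brackets, which leaves 254 for the address itself. A hedged sketch of just that length check; it assumes ASCII input (so characters equal octets), splits at the last `@` because a quoted local part may itself contain `@`, and uses hypothetical names.

```csharp
public static class SmtpLengthCheck
{
    // Assumes ASCII input, so string length equals octet count.
    private const int MaxLocalPartLength = 64;  // RFC 5321, section 4.5.3.1.1
    private const int MaxAddressLength = 254;   // 256-octet path limit minus '<' and '>'

    public static bool WithinSmtpLimits(string address)
    {
        if (string.IsNullOrEmpty(address) || address.Length > MaxAddressLength)
            return false;
        int at = address.LastIndexOf('@');
        // 'at' is also the local-part length; require a non-empty local part and domain.
        return at > 0 && at <= MaxLocalPartLength && at < address.Length - 1;
    }
}
```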

The acid test is still just: send an email to it with a verification code.
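The verification-code half of that test can be sketched as follows; actually sending the email is out of scope. This is an illustration under stated assumptions, not a prescribed design: `EmailVerifier`, `IssueCode` and `Confirm` are hypothetical, the in-memory dictionary stands in for real storage (which would also need an expiry), and it requires .NET Core 3.0 or later for `RandomNumberGenerator.GetInt32`.

```csharp
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

public sealed class EmailVerifier
{
    // Hypothetical in-memory store: address -> outstanding code.
    private readonly ConcurrentDictionary<string, string> _pending =
        new ConcurrentDictionary<string, string>();

    // Issue a random 6-digit code for the address; the caller emails it.
    public string IssueCode(string address)
    {
        string code = RandomNumberGenerator.GetInt32(0, 1_000_000).ToString("D6");
        _pending[address] = code;
        return code;
    }

    // The address counts as verified only if the user echoes back the code we sent.
    public bool Confirm(string address, string submitted)
    {
        if (submitted == null || !_pending.TryGetValue(address, out var expected))
            return false;
        bool ok = CryptographicOperations.FixedTimeEquals(
            Encoding.ASCII.GetBytes(expected), Encoding.ASCII.GetBytes(submitted));
        if (ok)
            _pending.TryRemove(address, out _); // each code is single-use
        return ok;
    }
}
```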

porges
  • @SLaks: It's shorter than Perl's Mail::Address::RFC822 :P – porges May 17 '11 at 01:49
  • Yes, but it's still **long**. It would be much shorter if you use actual characters rather than `\u` escapes. (but unprintable) – SLaks May 17 '11 at 01:51
  • You could also simplify it quite a bit, probably - e.g. merge character classes which are in `x|y` parts. – porges May 17 '11 at 01:54
0

http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html

If you want to fully implement an e-mail regex, might as well do it right.

</sarcasm>

Given the ludicrous complexity of the e-mail address spec, fully matching compliant addresses while rejecting all non-compliant addresses is rather difficult to do with a regular expression.

The best method of validating an e-mail address is to require a simple, proper form (meaning it has an @ sign and, after the @ sign, at least one period) and then just send an e-mail to the address.
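The "simple proper form" rule can be checked without any regex. A hedged sketch with hypothetical names; it looks for the period after the last `@` so that a dot in the local part alone does not count, and, like the rule as stated, it does not require any text before the `@`.

```csharp
public static class SimpleFormCheck
{
    // Accepts input containing an '@' with at least one '.' somewhere after the last '@'.
    // Nothing more: actual deliverability is confirmed by sending a message.
    public static bool HasSimpleForm(string input)
    {
        if (string.IsNullOrEmpty(input))
            return false;
        int at = input.LastIndexOf('@');
        return at >= 0 && input.IndexOf('.', at + 1) >= 0;
    }
}
```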

Technically speaking, a well-formed address at "example.com" will never be delivered, because "example.com" is a reserved name. The same goes for an address at "askjgdaiuyvbcxakjh.com": that domain doesn't exist, yet a regex check would report the address as valid, whereas the simple "send a message, click a link" method rejects every address you can't actually contact.

Thebigcheeze
  • That's not fully compliant, it doesn't handle comments (as the code states), and it's also for RFC822, which is obsolete. – porges May 17 '11 at 00:54
  • You seem to have responded to the first two lines of my answer while disregarding the next 8. In the next 8 it's clear I don't advocate using the (out-dated and incomplete) regex from the linked page. – Thebigcheeze May 17 '11 at 14:55
  • Maybe I should have prefixed it with ("By the way..."). I only meant it as a comment regarding the linked regex, not your comment as a whole. – porges May 17 '11 at 20:09
  • I wonder how often people actually put comments in email addresses they type into web sites? – Matthew Lock Oct 05 '12 at 00:45