
When validating email addresses, I have tried using both the EmailAddressAttribute class from System.ComponentModel.DataAnnotations:

```csharp
[EmailAddress(ErrorMessage = "Invalid Email Address")]
public string Email { get; set; }
```

and the MailAddress class from System.Net.Mail by doing:

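```csharp
// Returns true if MailAddress can parse the input and the parsed address
// equals the input, which rejects forms such as "Name <user@host>".
bool IsValidEmail(string email)
{
    try {
        var addr = new System.Net.Mail.MailAddress(email);
        return addr.Address == email;
    }
    catch {
        // MailAddress throws for null, empty or malformed input.
        return false;
    }
}
```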

as suggested in "C# code to validate email address". Both methods work in principle: they reject invalid addresses such as user@, which do not match the format user@host.

My problem is that neither of the two methods detects invalid characters in the user field, such as æ, ø or å (e.g. åge@gmail.com). Is there a reason why such characters do not produce a validation error? And does anybody have an elegant solution for adding validation of invalid characters in the user field?
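If an application deliberately wants to restrict the user field (the local part) to ASCII, for example because a downstream mail system cannot handle internationalized addresses, a minimal sketch could look like the following. `HasAsciiLocalPart` is a hypothetical helper, and this is a policy choice rather than a validity rule: as the answers note, characters such as å are permitted.

```csharp
using System.Linq;

// Hypothetical helper: returns true only if the part before the last '@'
// is non-empty and consists solely of ASCII characters (U+0000..U+007F).
// This rejects addresses such as "åge@gmail.com" that are technically valid.
static bool HasAsciiLocalPart(string email)
{
    if (string.IsNullOrEmpty(email)) return false;

    // Use the last '@': a quoted local part may itself contain '@'.
    int at = email.LastIndexOf('@');
    if (at <= 0) return false;

    string localPart = email.Substring(0, at);
    return localPart.All(c => c <= '\u007F');
}
```

It would be combined with one of the checks above, e.g. `IsValidEmail(email) && HasAsciiLocalPart(email)`.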

hejto
  • Why are you trying to apply this logic to your input form? Any email validation that must work must simply send a mail to the specified address. You don't want to generate a false negative for a current or future case that you didn't foresee. – CodeCaster May 25 '16 at 10:30
  • Write a custom attribute and use email regex. – Nikhil Vartak May 25 '16 at 10:32
  • @Think no, don't suggest horrific practices like that. – CodeCaster May 25 '16 at 10:32
  • @Think2ceCode1ce No, please NO! Don't use regex to parse email addresses, they are all wrong, all of them! – Manfred Radlwimmer May 25 '16 at 10:32
  • @CodeCaster Why? People have been doing it for ages. – Nikhil Vartak May 26 '16 at 03:26
  • @ManfredRadlwimmer Can you prove it? I am sure there is at least one legitimate regex that works. – Nikhil Vartak May 26 '16 at 03:28
  • @Think yes, and people have been doing it wrong for ages. It makes no sense to validate email addresses using regular expressions. You're going to frustrate and exclude legitimate users. There are many discussions on this subject already, see for example [this one](http://stackoverflow.com/questions/46155/). – CodeCaster May 26 '16 at 08:38
  • @Think see also [How to Find or Validate an Email Address](http://www.regular-expressions.info/email.html): _"Don't go overboard in trying to eliminate invalid email addresses with your regular expression. The reason is that you don't really know whether an address is valid until you try to send an email to it. [...] If you really need to be sure an email address is valid, you'll need to send an email to it"_ – CodeCaster May 26 '16 at 08:41
  • @Think2ceCode1ce No, I can't prove it, but CodeCaster's link comes pretty close. I once saw a regex that claimed to be RFC compliant but that was over 4k characters long. Since the regex doesn't know which version of the RFCs is accepted by which server, you are just trading false negatives for false positives. – Manfred Radlwimmer May 26 '16 at 09:10
  • @CodeCaster Great. Learned something today. – Nikhil Vartak May 26 '16 at 10:42
  • @Think great, happy to help. The point is, there may be a regex that works today, but it can be obsoleted tomorrow. The first email regexes that circulate over the web and get reposted on a daily basis still assume 2-3 letter TLDs, for example. Then came `.info` and `.museum`, and they were updated to 2-6. Then came [internationalized domain names](https://en.wikipedia.org/wiki/Internationalized_domain_name), which again broke a _lot_ of existing regexes. Then came Unicode local-parts. And so on. Simply check for an `@`, try to send a mail, and you know (kind of) whether it's valid. – CodeCaster May 26 '16 at 10:46
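A minimal sketch of the "simply check for an @" approach described in the last comment, assuming the real verification happens afterwards by sending a confirmation email (not shown). `LooksLikeEmail` is a hypothetical name; it only requires at least one character on each side of the last `@`.

```csharp
// Hypothetical pre-check: accept anything with at least one character before
// and after the last '@', and leave real verification to a confirmation mail.
static bool LooksLikeEmail(string input)
{
    if (string.IsNullOrWhiteSpace(input)) return false;

    // Last '@', because a quoted local part may contain '@' itself.
    int at = input.LastIndexOf('@');
    return at > 0 && at < input.Length - 1;
}
```

This deliberately accepts Unicode local parts and dotless domains such as user@localhost, so it produces no false negatives of the kind the comments warn about; it will let through some addresses that later bounce.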

2 Answers


Those characters are not invalid. Unusual, but not invalid. The question you linked even contains an explanation why you shouldn't care.

> Full use of electronic mail throughout the world requires that (subject to other constraints) people be able to use close variations on their own names (written correctly in their own languages and scripts) as mailbox names in email addresses.

- RFC 6530, 2012
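Consistent with this, the built-in `EmailAddressAttribute` accepts the address from the question. The short sketch below calls it directly through `ValidationAttribute.IsValid(object)`. The commented results assume a current .NET runtime, where, as far as I know, the attribute only checks that there is exactly one @ and that it is neither the first nor the last character; .NET Framework used a regular expression instead, which also admitted non-ASCII letters such as å. Treat the exact rules as an assumption and check them against your target framework.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

// EmailAddressAttribute can be called directly, outside model binding.
var attribute = new EmailAddressAttribute();

Console.WriteLine(attribute.IsValid("åge@gmail.com")); // True: non-ASCII local part accepted
Console.WriteLine(attribute.IsValid("user@"));         // False: nothing after '@'
Console.WriteLine(attribute.IsValid(null));            // True: null is left to [Required]
```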

Manfred Radlwimmer

The characters you mentioned (æ, ø and å, as in åge@gmail.com) are not invalid. For example, when someone uses a name from a language other than English (French, German, etc.) in their email address, Unicode characters are possible. Even so, EmailAddressAttribute blocks some unusual characters.

  • You can use international characters above U+007F, encoded as UTF-8.

  • Space and `"(),:;<>@[\]` characters are allowed with restrictions: they are only allowed inside a quoted string, and a backslash or double quote must be preceded by a backslash.

  • Special characters `` !#$%&'*+-/=?^_`{|}~ `` are allowed.
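A sketch of the unquoted case of these rules, assuming the local part has already been split off from the domain. It allows ASCII letters and digits, the listed special characters and anything above U+007F, and applies the RFC 5322 dot-atom rule that dots may not lead, trail or repeat. Quoted local parts (the second bullet) are deliberately not handled, and `IsDotAtomLocalPart` is a hypothetical name.

```csharp
using System.Linq;

// Hypothetical helper: checks an unquoted ("dot-atom") local part.
// Allowed: ASCII letters and digits, the special characters
// !#$%&'*+-/=?^_`{|}~, and any character above U+007F.
// Dots may separate non-empty runs but may not lead, trail, or repeat.
// Quoted local parts such as "john doe" are not handled here.
static bool IsDotAtomLocalPart(string localPart)
{
    const string specials = "!#$%&'*+-/=?^_`{|}~";

    if (string.IsNullOrEmpty(localPart)) return false;

    // Split on '.': an empty segment means a leading, trailing or doubled dot.
    return localPart.Split('.').All(segment =>
        segment.Length > 0 &&
        segment.All(c =>
            (c < 128 && char.IsLetterOrDigit(c)) ||
            specials.IndexOf(c) >= 0 ||
            c > '\u007F'));
}
```

With these rules, `user.name+tag` and `åge` pass, while `a..b`, `.user` and an unquoted `john doe` fail.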

Regex to validate this: Link

```
^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$
```
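A sketch of how this regex could be applied in C# with `System.Text.RegularExpressions.Regex`, assuming the pattern with its escapes restored (`\]`, `\.` and the closing `$`). Inside a C# verbatim string each `\"` of the regex is written `\""`. Note that the domain part requires at least one dot, so an address such as user@localhost is rejected.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern; in a verbatim string, the regex's \" is written \"".
// The domain must contain at least one dot followed by 2+ characters.
var emailPattern = new Regex(
    @"^(([^<>()[\]\.,;:\s@\""]+(\.[^<>()[\]\.,;:\s@\""]+)*)|(\"".+\""))@(([^<>()[\]\.,;:\s@\""]+\.)+[^<>()[\]\.,;:\s@\""]{2,})$");

Console.WriteLine(emailPattern.IsMatch("åge@gmail.com"));  // True
Console.WriteLine(emailPattern.IsMatch("user@"));          // False
Console.WriteLine(emailPattern.IsMatch("user@localhost")); // False
```

Constructing the `Regex` once and reusing it avoids re-parsing the pattern on every call.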

Bharath theorare
  • We don't need another non-working email validation regex on the web. – CodeCaster May 25 '16 at 11:57
  • Check the link for the working regex above; you will see that it works. – Bharath theorare May 25 '16 at 12:51
  • Well see for example [Email Address test cases](https://blogs.msdn.microsoft.com/testing123/2009/02/06/email-address-test-cases/) for a few cases for which this regex will give false positives and false negatives (disregard the Unicode one though). That being said, this is not and will not become a "dump your favorite email validation regex" question. [We already have a couple of those that address all issues with regex email validation](http://stackoverflow.com/questions/46155/), and that is not what this question is about. It also helps to link to where you found this regex. – CodeCaster May 25 '16 at 12:55
  • It addresses the particular problem in the question, and I gave a regex that passes the test case mentioned in the question. I never claimed this is the "perfect regex" that would handle all email validation. It also shows how obsolete the link you pointed to is, the one you (wrongly) think I "copycasted" from. – Bharath theorare May 25 '16 at 13:25
  • I'm really trying to be constructive, but you seem to miss the point. The OP's question is: _"I want to validate [all possible] email addresses. Yet my validation allows Unicode characters. Why is this?"_. You then post a regular expression that might accept Unicode characters in email addresses, but which rejects other addresses which are also valid. How is that helpful if you can't use the regex? How does it answer the OP's question? – CodeCaster May 25 '16 at 14:16
  • Keep trying to be constructive, which you really are not up to. Going by your comment above, what do you mean by [all possible] email addresses? I expect you to stand by your abstract answers, rather than trying to look focused on the question by deleting your previous comments. – Bharath theorare May 26 '16 at 04:50