
I have this C# code:

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Net.Mail;

class Program
{
    static void Main()
    {
        // method 1 - using MailAddress
        var email = "fooªbar@cander.com";
        Console.WriteLine(IsValidEmail(email));

        // method 2 - using EmailAddressAttribute
        var validator = new EmailAddressAttribute();
        Console.WriteLine(validator.IsValid(email));
    }

    static bool IsValidEmail(string email)
    {
        try
        {
            // MailAddress throws (e.g. FormatException) for input it cannot parse
            var addr = new MailAddress(email);
            return addr.Address == email;
        }
        catch
        {
            return false;
        }
    }
}
```

Both methods report fooªbar@cander.com as a valid email address, even though it contains the "ª" symbol. Why? According to What characters are allowed in an email address?, it shouldn't be valid.

Edited by gunr2171; asked by rasputino.
  • I don't know for certain, but perhaps it's an issue with the regex string: https://stackoverflow.com/questions/201323/how-can-i-validate-an-email-address-using-a-regular-expression – Dortimer Dec 02 '21 at 19:58
  • Yeah, that uses a much more complicated regex. https://github.com/microsoft/referencesource/blob/master/System.ComponentModel.DataAnnotations/DataAnnotations/EmailAddressAttribute.cs#L54 – gunr2171 Dec 02 '21 at 20:01
  • Specifically, `ª` is a member of ["letter, other"](https://www.compart.com/en/unicode/category/Lo). – Tech Inquisitor Dec 02 '21 at 20:04
  • Well, the `MailAddress()` constructor should have thrown a `FormatException`. Is that a bug in .NET? – Cid Dec 02 '21 at 20:05
  • You can't validate email addresses with Regex anyway. You must send an email and use validation via link or similar. – Thomas Weller Dec 02 '21 at 20:09
  • [Please do NOT use regex to “validate” email addresses.](https://michaellong.medium.com/please-do-not-use-regex-to-validate-email-addresses-e90f14898c18) – Tech Inquisitor Dec 02 '21 at 20:09
  • I've made a radical adjustment to the question text, because I think you're asking why the built-in classes specifically behave the way they do, not about your custom regex. – gunr2171 Dec 02 '21 at 20:11
  • It's possible that it's a [valid email](https://stackoverflow.com/questions/760150/can-an-email-address-contain-international-non-english-characters). – Cid Dec 02 '21 at 20:12
  • The original question had a regex that included \w. According to https://stackoverflow.com/questions/2998519/net-regex-what-is-the-word-character-w, \w includes the ª symbol, and the ª symbol is now allowed according to RFC 6532. So I was wrong to suppose that the ª symbol should be an invalid character. – rasputino Dec 02 '21 at 20:27

1 Answer

It validates it although it has the "ª" symbol. Why?

Because your regex allows "one or more \w (word) characters" before the @, and ª is a word character:

(Screenshot: a RegexStorm test of the pattern \w against the character ª, showing one match.)

RegexStorm uses the .NET regex engine, and the test shows that the \w pattern (a single word character) successfully matched ª (one match).
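The same check can be reproduced without RegexStorm. The sketch below (C# 9 top-level statements; the variable names are illustrative) asks the runtime whether ª is a letter and whether the .NET `\w` class matches it. Because .NET's `\w` is Unicode-aware and includes all letter categories, both checks should come out true. The exact Unicode category reported may depend on the runtime's Unicode data, so it is printed rather than relied on.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// U+00AA FEMININE ORDINAL INDICATOR, the character from the question.
const char Ordinal = '\u00AA';

// .NET's \w covers letters (\p{L}), nonspacing marks, decimal digits and
// connector punctuation, so any Unicode letter matches it.
bool isLetter = char.IsLetter(Ordinal);
bool matchesWord = Regex.IsMatch(Ordinal.ToString(), @"^\w$");

Console.WriteLine($"IsLetter: {isLetter}");
Console.WriteLine($"\\w matches: {matchesWord}");

// Printed for information only; the reported category can vary by runtime.
Console.WriteLine($"Category: {CharUnicodeInfo.GetUnicodeCategory(Ordinal)}");
```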

According to What characters are allowed in an email address?, it shouldn't be valid.

Alas, the regular expression you used does not accurately implement the specification given in the linked question.
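To illustrate the mismatch, the sketch below assumes the original pattern began with `^\w+@`. The full regex isn't shown here, so that pattern is a hypothetical simplification. It is compared against a local-part pattern built from the ASCII `atext` characters of RFC 5322. `\w+` accepts ª, which RFC 5322 on its own does not allow, yet rejects `+`, which RFC 5322 does allow. The ASCII pattern deliberately ignores quoted local parts and the RFC 6532 UTF-8 extension mentioned in the comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the question's original pattern: one or more
// \w characters before the @ (the domain part is ignored here).
const string WordLocalPart = @"^\w+@";

// Dot-atom local part built from the RFC 5322 "atext" set (ASCII only).
// Quoted local parts and the RFC 6532 UTF-8 extension are NOT handled.
const string AtextLocalPart = @"^[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+(\.[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+)*@";

string[] samples = { "fooªbar@cander.com", "foo+bar@cander.com" };

foreach (string s in samples)
{
    bool word = Regex.IsMatch(s, WordLocalPart);
    bool atext = Regex.IsMatch(s, AtextLocalPart);
    Console.WriteLine($"{s}: \\w+ => {word}, atext => {atext}");
}
```

With these two samples, `\w+` reports True then False, while the `atext` pattern reports False then True: the `\w` pattern is both too permissive and too strict compared with the ASCII rules.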

When it comes to validating email addresses, I genuinely don't think you should try to control them to a very fine degree. Forming and maintaining a complex regex that considers every variation is a headache, and it doesn't bring much benefit; it just creates a pain point for users whose valid addresses fail validation because of a bug in your regex.

When we test for email validity, we basically only check that it contains an @. What's the worst that can happen if a user types it in wrong?
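A check in the spirit of "only check that it contains an @" takes only a few lines. This is a sketch under stated assumptions, not the answerer's actual code: `LooksLikeEmail` is a hypothetical helper, and requiring at least one character on each side of the last @ is an extra rule added here. The last @ is used because a quoted local part may itself contain an @.

```csharp
using System;

// Hypothetical helper: a minimal plausibility check, not a full validator.
// Assumption for this sketch: at least one character must appear on each
// side of the LAST '@' (a quoted local part may contain '@' itself).
static bool LooksLikeEmail(string input)
{
    if (string.IsNullOrWhiteSpace(input)) return false;
    int at = input.LastIndexOf('@');
    return at > 0 && at < input.Length - 1;
}

foreach (var s in new[] { "fooªbar@cander.com", "foo+bar@cander.com", "no-at-sign", "@cander.com", "foo@" })
{
    Console.WriteLine($"{s}: {LooksLikeEmail(s)}");
}
```

The first two samples pass and the last three fail; anything stricter is arguably better left to the confirmation-link approach suggested in the comments.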


Answered by Caius Jard.