0

Hi I am working on a bulk email sending feature. Below is my loop that validates email and sends them to each recipient:

foreach (var userID in recipientUserIds)
{
    var userInfo = //getting from database using userID.

    try
    {
        to = new MailAddress(userInfo.Email1, userInfo.FirstName + " " + userInfo.LastName);
    }
    catch (System.FormatException fe)
    {
        continue;
    }

    using (MailMessage message = new MailMessage(from, to))
    {
        //populate message and send email.
    }
}

Since the recipientUserIds is always more than 2000, using try-catch seems to be very expensive for each user in this case just to validate the email address format. I was wondering to use regex, but not sure if that will help with performance.

So my question is if there is any better or performance optimized way to do the same validation.

Rufus L
  • 36,127
  • 5
  • 30
  • 43
Naphstor
  • 2,356
  • 7
  • 35
  • 53
  • You may find some of the answers from this question useful: https://stackoverflow.com/questions/1365407/c-sharp-code-to-validate-email-address – Daniel Crha Oct 17 '19 at 22:01
  • Yes Daniel, I looked at the post but that is not quiet clear about the performance issues if any. – Naphstor Oct 17 '19 at 22:02
  • 1
    Are you sure there's a noticeable perf issue here? If so, you could try to validate the email address ahead of time, but that's very tricky business. See [this page](https://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/) for a RegEx that may work for you. Also, the duplicate question has other answers that may have better performance. If you believe this is not a duplicate, please clarify which perf metrics you're measuring and what your perf goals are. – Rufus L Oct 17 '19 at 22:11
  • I'm not sure using regex to validate the emails will make a noticeable impact on performance for only 2000 values. However, using exception handling for validation is very frowned upon, so doing proper validation is the way to go regardless of performance. – Racil Hilan Oct 17 '19 at 22:12
  • 1
    @RacilHilan Sure additional validation is good to have, but at some point you're going to need the `try/catch` around the call to create a new `MailAddress`, won't you? – Rufus L Oct 17 '19 at 22:14
  • To clarify, if you decide to use `EmailAddressAttribute` as mentioned in the duplicate question, it will be a lot more efficient than the current way you do it, since it doesn't have the overhead of throwing exceptions. However, there's about 50 million different ways to validate an e-mail on the internet and all of them are wrong, so my advice would be to either use the attribute (which just checks if there is only one @ sign and it's neither first nor last), or find some other reasonably simple regex and validate manually. – Daniel Crha Oct 17 '19 at 22:15
  • I tend to agree with @DanielCrha. The supported formats for the `MailAddress` class are documented under the **Remarks** section [here](https://learn.microsoft.com/en-us/dotnet/api/system.net.mail.mailaddress?view=netframework-4.8). Write a regex that satisfies those patterns, and you should be ok. – Rufus L Oct 17 '19 at 22:17
  • @RufusL That's not the point of my comment. The point is use exception handling for exceptions and nothing else. And validate values using validation not exceptions. There are rare cases where you have to break the rules, but email address validation is definitely not one of them. Validation is not just good, it's the right way to go. – Racil Hilan Oct 17 '19 at 22:23
  • Also, there are three exceptions that may be thrown from this constructor: `ArgumentNullException` (when the address is null), `ArgumentException` (when the address is an empty string), and `FormatException` (when address is not in a recognized format or contains non-ASCII characters). You can avoid the first two with minimal validation (`if (string.IsNullOrWhitespace(userInfo.Email1)) continue;`) – Rufus L Oct 17 '19 at 22:25
  • @RacilHilan I understand and agree with the spirit of that rule, but I disagree that this is *"definitely not one of [the rare cases where you have to break the rules]"*. Simply because validating email addresses is actually a fairly difficult task. For example, comments, bracketed domain names, and embedded quotes are supported by `MailAddress`, so this is considered a valid email address: `(comment)"\"John \\\"Jr\\\" Doe\""(comment)<(comment)user(comment)@(comment)[my domain](comment)>(comment)` – Rufus L Oct 17 '19 at 22:29
  • @RufusL It's ironic that you *"disagree"* with my comment, yet your other comments to the OP are exactly what my comment was about, use proper validation. I didn't say don't use exception handling for it at all, but if you do as much validation as you reasonably can (like your comments), then what's left is really exceptions and exception handling is correct for it then. Now there is one point to make. If validation is slower than exceptions (I really doubt it here, but just hypothetically), and performance is crucial, then breaking the rule by using only exception is one of the rare cases. – Racil Hilan Oct 17 '19 at 22:36
  • @RacilHilan ahaha, good point. I guess we agree more than I thought! :) – Rufus L Oct 17 '19 at 22:38

1 Answers1

1

Validating email addresses is a complicated task, and writing code to do all the validation up front would be pretty tricky. If you check the Remarks section of the MailAddress class documentation, you'll see that there are lots of strings that are considered a valid email address (including comments, bracketed domain names, and embedded quotes).

And since the source code is available, check out the ParseAddress method here, and you'll get an idea of the code you'd have to write to validate an email address yourself. It's a shame there's no public TryParse method we could use to avoid the exception being thrown.

So it's probably best to just do some simple validation first - ensure that it contains the minimum requirements for an email address (which literally appears to be user@domain, where domain does not have to contain a '.' character), and then let the exception handling take care of the rest:

foreach (var userID in recipientUserIds)
{
    var userInfo = GetUserInfo(userID);

    // Basic validation on length
    var email = userInfo?.Email1?.Trim();
    if (string.IsNullOrEmpty(email) || email.Length < 3) continue;

    // Basic validation on '@' character position
    var atIndex = email.IndexOf('@');
    if (atIndex < 1 || atIndex == email.Length - 1) continue;

    // Let try/catch handle the rest, because email addresses are complicated
    MailAddress to;
    try
    {
        to = new MailAddress(email, $"{userInfo.FirstName} {userInfo.LastName}");
    }
    catch (FormatException)
    {
        continue;
    }

    using (MailMessage message = new MailMessage(from, to))
    {
        // populate message and send email here
    }
}
Rufus L
  • 36,127
  • 5
  • 30
  • 43