5

I got a good email validation regex from: Email regular expression

    public static void Main(string[] args)
    {
        string value = @"cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc";
        var regex = new Regex(
            @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
            RegexOptions.Compiled);
        var x = regex.Match(value); // Hangs here !?!
        return;
    }

It works in most cases, but the code above hangs, burning 100% CPU. I've tested it in a Windows 8 Metro app and in a standard .NET 4.5 app.

Can anyone tell me why this happens, and whether there is a good email validation regex that doesn't hang, or a way to fix this one?

Many thanks, Jon

Jon Rea
  • [This](http://www.regular-expressions.info/catastrophic.html) may help you find out why it hangs. [This](http://www.regular-expressions.info/email.html) may help you find out how to match email addresses properly with regex. – Martin Ender Oct 26 '12 at 13:23
  • You should read this in order to create a proper email matching regex http://www.regular-expressions.info/email.html – CaffGeek Oct 26 '12 at 13:34

3 Answers

15

The explanation why it hangs: Catastrophic backtracking.

Let's simplify the crucial part of the regex:

(\w*[0-9a-zA-Z])*@

You have

  • an optional part \w* that can match the same characters as the following part [0-9a-zA-Z], so the two combined translate, in essence, to \w+
  • nested quantifiers: (\w+)*

This means that, given s = "cvcvcvcvvcvvcvcvcvcvcvvcvcvcvcvcvvccvcvcvc", this part of the regex has to try every possible way of splitting s into consecutive chunks (there are 2**(len(s)-1) of them) before it can report a non-match, because the following @ is never found.
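Whatever pattern you end up with, one way to keep this kind of backtracking from hanging the process is to give the regex a match timeout, which the `Regex` constructor accepts as of .NET 4.5. The following is only a minimal sketch: the class name `EmailRegexGuard`, the method name `IsMatchWithTimeout`, and the one-second limit are illustrative assumptions, not anything from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailRegexGuard
{
    // Pattern copied unchanged from the question; it can backtrack
    // catastrophically on long runs of word characters with no '@'.
    private const string Pattern =
        @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$";

    // Assumption: one second is an arbitrary limit chosen for illustration.
    private static readonly Regex Guarded =
        new Regex(Pattern, RegexOptions.None, TimeSpan.FromSeconds(1));

    // Hypothetical helper: true only if the pattern matches within the limit.
    public static bool IsMatchWithTimeout(string value)
    {
        try
        {
            return Guarded.IsMatch(value);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up after the time limit; treat as "no match".
            return false;
        }
    }
}
```

The timeout does not make the pattern any faster; it only bounds how long a pathological input can keep the CPU busy.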

Since you cannot validate an e-mail address with any regex (there are far too many corner cases in the spec), it's usually best to

  • do a minimal regex check (^.*@.*$)
  • use a parser to check validity (like @Fake.It.Til.U.Make.It suggested)
  • try and send e-mail to it - even a seemingly valid address may be bogus, so you'd have to do this anyway.
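As a rough sketch of what the "minimal check" step could look like without any regex at all, the helper below uses plain string operations, so it runs in linear time. The name `LooksLikeEmail` and the specific rules (something on both sides of the first `@`, no consecutive dots) are assumptions made for illustration; they are slightly stricter than `^.*@.*$` and say nothing about what the e-mail specification actually allows.

```csharp
using System;

public static class MinimalEmailCheck
{
    // Hypothetical helper: a cheap sanity check meant to catch typos,
    // not a validator.
    public static bool LooksLikeEmail(string value)
    {
        if (string.IsNullOrWhiteSpace(value)) return false;

        int at = value.IndexOf('@');
        // Require at least one character on each side of the first '@'.
        if (at <= 0 || at == value.Length - 1) return false;

        // Assumed typo rule: reject two consecutive dots.
        return !value.Contains("..");
    }
}
```

An address that passes this check can still be undeliverable, which is why sending a message to it remains the real test.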

Just for completeness, you can avoid the backtracking issues with the help of atomic groups:

var regex = new Regex(
    @"^([0-9a-zA-Z](?>[-.\w]*[0-9a-zA-Z])*@(?>[0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$",
    RegexOptions.Compiled);
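To see what an atomic group changes, it may help to look at a much smaller pattern. In this sketch (the class and method names are made up for illustration), the only difference between the two patterns is `(?:...)` versus `(?>...)`:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // Ordinary group: if the trailing "c" fails, the engine may go back
    // into the group and retry with the shorter alternative "b".
    public static bool MatchesWithBacktracking(string input)
    {
        return Regex.IsMatch(input, @"^a(?:bc|b)c$");
    }

    // Atomic group: once the group has matched "bc", that choice is final
    // and the engine will not retry the "b" alternative.
    public static bool MatchesAtomically(string input)
    {
        return Regex.IsMatch(input, @"^a(?>bc|b)c$");
    }
}
```

For the input "abc", `MatchesWithBacktracking` returns true and `MatchesAtomically` returns false; for "abcc" both return true. In the e-mail regex the same effect stops the engine from re-splitting text that a group has already consumed, which is what removes the exponential blow-up.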
Tim Pietzcker
  • Hi, Thanks for the detailed answer :-) I'll go with a validation like "do a minimal regex check (^.*@.*$)" - being as we're really just trying to help the user avoid typos like typing e.g. '..'. If they enter the wrong address, it's not the end of the world as we have other email recovery mechanisms. Cheers, Jon – Jon Rea Oct 26 '12 at 13:51
4

Never ever use a regex to validate an email address.

You can use the MailAddress class to validate it:

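// Requires: using System.Net.Mail;
try
{
    // The constructor parses the string and throws FormatException if it is
    // not a recognizable address (null or empty input throws other exceptions).
    address = new MailAddress(address).Address;
    // address is valid
}
catch (FormatException)
{
    // address is invalid
}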
Anirudha
  • Hi, I do like that approach, but unfortunately 'System.Net.Mail.MailAddress' isn't available in 'Win8 C#' / WinRT. Do you know an alternative which is available? It also doesn't answer *why* the above regex is hanging. Thanks, Jon – Jon Rea Oct 26 '12 at 13:28
  • @JonRea in your regex you are using `-` inside `[]`, which needs to be escaped like this: `\-` – Anirudha Oct 26 '12 at 13:30
  • @Fake.It.Til.U.Make.It: No, the `-` only needs to be escaped in a character class if it's not the first or last character. – Tim Pietzcker Oct 26 '12 at 13:42
1

I guess it's because of [-.\w] in the regex; try using this one instead:

^[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*@(?:(\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

Also, in .NET 4.5 EmailAddressAttribute (in System.ComponentModel.DataAnnotations) should be available, though I'm not sure.
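For reference, the attribute in question is presumably `EmailAddressAttribute` from `System.ComponentModel.DataAnnotations`, which can also be called directly rather than applied to a property. This is only a sketch: the wrapper class `EmailAttributeCheck` is a made-up name, the project needs a reference to the DataAnnotations assembly, and how strict the check is depends on the framework version, so treat it as a coarse filter.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

public static class EmailAttributeCheck
{
    private static readonly EmailAddressAttribute Validator =
        new EmailAddressAttribute();

    // Hypothetical wrapper around the attribute's own IsValid(object) method.
    public static bool IsValid(string address)
    {
        // The attribute treats null as valid (pair it with [Required] when
        // used on a model), so reject null explicitly here.
        return address != null && Validator.IsValid(address);
    }
}
```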

Johan
Sergio
  • `regex` is not good for email validation. An actual regex for an email address would be far, far, far bigger than this. – Anirudha Oct 26 '12 at 13:27
  • It all depends on what you consider a correct email address. The MailAddress class may use a regex for email validation too - reflect it :). Also, email formats may be country specific, so a regex is the way to go for me. – Sergio Oct 26 '12 at 13:30