I am currently building a system using ASP.NET, C# and MVC 2 that uses the following regex:

^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$

This regex checks that an e-mail address has a valid format. My code is as follows:

if (!Regex.IsMatch(model.Email, @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$"))
                ModelState.AddModelError("Email", "The field Email is invalid.");

The regex works fine for validating e-mails. However, if a particularly long string is passed to it and that string is invalid, the system keeps on 'working' without ever returning the page. For instance, this is the data that I tried to pass:

iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

The above string essentially locks up the system. I would like to know why, and whether I can use a regex that accomplishes the same thing in a simpler manner. My goal is that an incorrectly formed e-mail address, such as the following, isn't accepted:

host.@.host..com
William Calleja
  • @Liam, that's bull. It's possible to write inefficient regex, but your wide brushstroke generalism doesn't help here. – spender Oct 18 '12 at 12:32
  • Take a read of this: http://www.codinghorror.com/blog/2006/01/regex-performance.html I'd suggest attempting to construct a `System.Net.Mail.MailAddress` and catching the error to detect a bad address. – spender Oct 18 '12 at 12:39
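Following spender's suggestion, here is a minimal C# sketch that lets `System.Net.Mail.MailAddress` do the parsing and treats a `FormatException` as a bad address. The `MailAddressCheck.IsValidEmail` name is hypothetical, and the `address.Address == input` comparison is an added assumption meant to reject display-name forms such as `Name <user@example.com>`, which `MailAddress` accepts. How `MailAddress` treats inputs like `host.@.host..com` is not verified here, so check it against your own list of bad addresses before relying on it.

```csharp
using System;
using System.Net.Mail;

// Hypothetical helper: delegate the format check to MailAddress
// instead of a hand-written regex.
public static class MailAddressCheck
{
    public static bool IsValidEmail(string input)
    {
        // MailAddress throws ArgumentNullException / ArgumentException
        // for null or empty strings, so screen those out first.
        if (string.IsNullOrWhiteSpace(input))
            return false;

        try
        {
            var address = new MailAddress(input);
            // MailAddress also parses "Name <user@example.com>", so require
            // that the parsed address is exactly what the user typed.
            return address.Address == input;
        }
        catch (FormatException)
        {
            // Thrown when the string is not in a recognised e-mail form.
            return false;
        }
    }
}
```

The exception-based approach costs a `try`/`catch` per call, but it avoids maintaining a pattern by hand.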

1 Answer

You have nested repetition operators sharing the same characters, which is liable to cause catastrophic backtracking.

For example: ([-.\w]*[0-9a-zA-Z])*

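This says: match zero or more characters from -._0-9a-zA-Z (that is, - and . plus \w, which in .NET also covers other Unicode letters), followed by a single character from 0-9a-zA-Z, and repeat that whole group zero or more times.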

The character i falls into both of these classes.

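Thus, when the regex is run on iiiiiiii... and the match fails (there is no @), the engine backtracks through every possible way of splitting the run of "i"s into groups of (several "i"s followed by one "i"). The number of such splits grows exponentially with the length of the string, so a few dozen characters are enough to make the check effectively never finish.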
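A back-of-the-envelope count shows why this explodes. Assume the input is nothing but i characters with no @, so the match must eventually fail. After the mandatory first character, each pass of ([-.\w]*[0-9a-zA-Z]) consumes one non-empty chunk, and each of the m-1 gaps between the remaining m characters can independently be a chunk boundary or not. A plain backtracking engine has to rule out every one of those splits before reporting failure. This is an estimate of the search space under those assumptions, not a measurement of what .NET actually executes.

```latex
% m = number of characters after the first one (all 'i', no '@')
\#\{\text{ways to cut } m \text{ characters into non-empty chunks}\} = 2^{m-1}
% m - 1 gaps, each either a chunk boundary or not.
% Example: m = 50 gives 2^{49} \approx 5.6 \times 10^{14} failing paths to try.
```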

In general, validating email addresses with a regular expression is hard.
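Independently of which pattern is used, the hang itself can be bounded. From .NET Framework 4.5 onward, `Regex.IsMatch` has an overload that takes a `matchTimeout` and throws `RegexMatchTimeoutException` once it is exceeded. A minimal sketch, assuming the project can target .NET 4.5 or later (an MVC 2 project on .NET 3.5 or 4.0 cannot use this overload). It keeps the question's original pattern; the `TimeoutEmailCheck` wrapper and the 500 ms budget are hypothetical choices.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the question's original pattern that caps
// how long the regex engine may run (needs the .NET 4.5+ overload).
public static class TimeoutEmailCheck
{
    private const string OriginalPattern =
        @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$";

    // Arbitrary budget chosen for illustration; tune it for your server.
    private static readonly TimeSpan MatchTimeout = TimeSpan.FromMilliseconds(500);

    public static bool IsValidEmail(string input)
    {
        // Regex.IsMatch throws ArgumentNullException for a null input.
        if (input == null)
            return false;

        try
        {
            return Regex.IsMatch(input, OriginalPattern, RegexOptions.None, MatchTimeout);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up: treat the pathological input as invalid
            // instead of letting the request hang.
            return false;
        }
    }
}
```

In the controller, `TimeoutEmailCheck.IsValidEmail(model.Email)` would stand in for the `Regex.IsMatch` call. The timeout only stops the request from hanging; the pattern is still exponential on bad input, so each such request can still burn up to the full budget of CPU.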

Rawling
  • Is there a way to modify the regex I am using so that it fulfils the same function while removing the nested operators? – William Calleja Oct 18 '12 at 12:39
  • I'd suggest removing the `\w` from the first class and applying the `*` to the second class instead of the first class - see if that still matches your "good" cases. (But in general I'd suggest not attempting this.) – Rawling Oct 18 '12 at 12:41
  • Ok, any particular reason why this ought not to be attempted? – William Calleja Oct 18 '12 at 12:49
  • The question I've linked (or rather, its answer) has a very good discussion on this, but basically, the official email specification is _complicated_ and pretty tough to properly validate with a mere regular expression. – Rawling Oct 18 '12 at 12:52
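In the spirit of Rawling's suggestion, here is a sketch of a pattern in which every repeated group begins with a separator (`-`, `.` or `_`) that the letter-and-digit runs cannot match, so a run of letters can only be split at a separator and a failing input like the long run of i's is rejected in roughly linear time. The pattern and the `UnambiguousEmailCheck` name are assumptions, not a drop-in equivalent of the original: it rejects consecutive separators such as `a..b` in the local part, allows one-character domain labels, drops `_` from domain labels, and accepts only ASCII letters where the original's `\w` also matched other Unicode letters. Run it against your own good and bad addresses before adopting it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stricter pattern: no two adjacent quantified pieces can
// match the same character, so there is no exponential backtracking.
public static class UnambiguousEmailCheck
{
    // local part : runs of letters/digits joined by single '-', '.' or '_'
    // domain     : labels of letters/digits joined by single '-', each ending in '.'
    // TLD        : 2 to 9 letters, as in the original pattern
    private static readonly Regex EmailRegex = new Regex(
        @"^[0-9a-zA-Z]+([-._][0-9a-zA-Z]+)*@([0-9a-zA-Z]+(-[0-9a-zA-Z]+)*\.)+[a-zA-Z]{2,9}$",
        RegexOptions.Compiled);

    public static bool IsValidEmail(string input)
    {
        // IsMatch throws on null, so treat null as invalid up front.
        return input != null && EmailRegex.IsMatch(input);
    }
}
```

With this pattern, `host.@.host..com` fails because every `.` in the local part must be followed by at least one letter or digit, and the long run of i's fails once the engine finds no `@`.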