You have stumbled upon catastrophic backtracking !
To fix it, you can use:
^\w+([.-]\w+)*@\w+([.-]\w+)*(\.\w{2,4})+$
Before declaring a failure to match, a regex engine has to check all possible ways of applying the regex.
With \w+([.-]?\w+)*
, each digit could be matched both by \w+
or by ([.-]?\w+)*
, and on failure ([.-]?\w+)*
can behave the same as (\w+)*
. This was creating a lot of states to come back to.
Making the [.-]
mandatory removes this ambiguity.
On (catastrophic) backtracking
The way a regex engine works is that it moves through the regex, trying to match each part of it to the string. Each time it has a choice, it saves the state so that it can backtrack to it if this was the wrong path.
To illustrate, see how .*b
matches abc
.
The .*
quantifier is greedy so it will match as much as possible: it starts by matching nothing, then adds a
, then adds b
, then adds c
. Each time the engine remembers it chose to add another character. Then the engine tries to match b
, but we're at the end of the string so that fails.
No biggie, we come back to the last choice we made: now .*
only matches ab
. The final b
still doesn't match the string's ending c
.
We come even further back: .*
matches a
, b
can finally match. Overall match found: ab
Before knowing the regex cannot match the input, the engine has to try all these alternatives. If you allow a character to be matched by multiple parts of the regex, that number increases.
In your case it was increasing exponentially with the length of the string, to the point that it takes forever to return anything: that's catastrophic backtracking.
Great illustration by Friedl on the subject: http://www.foo.be/docs/tpj/issues/vol1_2/tpj0102-0006.html