
I have these two regexes, both for validating email addresses. The first one is from the ASP.NET email regex validator and the second one I found on SO. My question is: what is the difference between them, and which one is better?

/^\w+([-+.\']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/

/^([a-zA-Z0-9_\.\-\+])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/

Both regexes accept an address ending in .ukkkk:

<script type="text/javascript">

var regex = /^\w+([-+.\']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/;
var regex2 = /^([a-zA-Z0-9_\.\-\+])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;

alert(regex.test('nhassyk@yahoo.co.ukkkk'));
alert(regex2.test('nhassyk@yahoo.co.ukkkk'));

</script>
Registered User
  • There's no objective answer on which one is "better". It completely depends on your requirements for email formatting. If both of them satisfy what you feel should and should not be an email, with an appropriate efficiency, then it's up to you. – David B Jun 20 '12 at 20:59
  • Only one allows `1234@hello.com` or `123@123.123`. Are they valid emails? – J-16 SDiZ Jun 20 '12 at 21:03

2 Answers


"Better" is a pretty subjective term - it depends on what exactly you're looking for.

The first regex allows a single quote character in the local part of the email address (the part before the @) and uses \w+, which includes underscores, throughout. So it's more inclusive, if that is one of your criteria for "better".
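To make the inclusiveness point concrete, here is a small sketch that runs both patterns from the question against two addresses. The sample addresses and the names `aspNetRegex` and `soRegex` are made up for illustration.

```javascript
// The two patterns from the question (variable names are illustrative).
const aspNetRegex = /^\w+([-+.\']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/;
const soRegex = /^([a-zA-Z0-9_\.\-\+])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;

// Hypothetical sample addresses, not taken from the question.
const withQuote = "o'brien@example.com";       // single quote in the local part
const underscoreDomain = "user@my_host.com";   // underscore in the domain

// Each pair is [first regex, second regex].
const quoteResults = [aspNetRegex.test(withQuote), soRegex.test(withQuote)];
const underscoreResults = [aspNetRegex.test(underscoreDomain), soRegex.test(underscoreDomain)];

console.log(quoteResults);      // [ true, false ]
console.log(underscoreResults); // [ true, false ]
```

In both cases only the first regex accepts the address, which is what "more inclusive" means here.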

The second doesn't allow underscores everywhere the first does, but that's trivial since there are no underscores in top-level domains. It is also slower: because of the trailing + in ([a-zA-Z0-9]{2,4})+$, the engine breaks the end of the address into groups of 2, 3, or 4 characters until it gets it "right". That is also why ukkkk passes. The tail should be changed to ([a-zA-Z0-9]{2,})$ to run faster.
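A quick sketch of why the trailing + matters: `ukkkk` can be consumed as two groups (for example `uk` followed by `kkk`), so the `{2,4}` bound never rejects a long ending. Any length of 2 or more can be split into chunks of 2 to 4, so the `{2,}` tail accepts exactly the same endings without the regrouping. The names `original` and `simplified` below are illustrative.

```javascript
// Second regex from the question, and the same regex with the suggested tail.
const original = /^([a-zA-Z0-9_\.\-\+])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;
const simplified = /^([a-zA-Z0-9_\.\-\+])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,})$/;

const samples = [
  "nhassyk@yahoo.co.ukkkk", // 5-character ending
  "nhassyk@yahoo.co.uk",    // 2-character ending
  "nhassyk@yahoo.co.u",     // 1-character ending
];

// Each row is [address, result with original, result with simplified].
const comparison = samples.map((s) => [s, original.test(s), simplified.test(s)]);
console.log(comparison);
```

The two patterns agree on every sample: the first two addresses pass and the one-character ending fails.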

Ultimately, it will come down to what you want to be considered a valid email address.

larissa

Actually, neither of them is perfect.

Let's make them better: since [a-zA-Z0-9_] and \w are the same, we use the second, shorter form.
There are top-level domains such as travel, so we will increase the maximum length, but of course we won't leave it unbounded with * as the first regex does.
And finally, there is a mistake in the second regex: the last plus shouldn't be there. We need a top-level domain, but only one.

Here is the result: ^([\w\.\-\+])+\@(([\w\-])+\.)+([a-zA-Z0-9]{2,6})$
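As a rough check of the combined pattern, the sketch below runs it against a handful of addresses. Apart from the one from the question, the sample addresses are made up, and `revised` is just an illustrative name.

```javascript
// The combined pattern from this answer.
const revised = /^([\w\.\-\+])+\@(([\w\-])+\.)+([a-zA-Z0-9]{2,6})$/;

// Sample addresses (all but the first are hypothetical).
const results = {
  "nhassyk@yahoo.co.ukkkk": revised.test("nhassyk@yahoo.co.ukkkk"),         // 5-char ending
  "someone@example.travel": revised.test("someone@example.travel"),         // 6-char ending
  "someone@example.c": revised.test("someone@example.c"),                   // 1-char ending
  "someone@example.toolongtld": revised.test("someone@example.toolongtld"), // 10-char ending
  "user@my_host.com": revised.test("user@my_host.com"),                     // underscore in domain
};

console.log(results);
```

The first two pass and the next two fail on the length bound. One side effect worth noting: using \w for the domain labels also lets underscores through there, which the second regex from the question did not.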

Anyway, nhassyk@yahoo.co.ukkkk is a syntactically valid e-mail address, even if it doesn't exist.

Does anybody know whether there are top-level domains longer than 6 characters?

UPDATE: Here is a great article about validating e-mail addresses according to the RFC standard. The final variant is very impressive.

Smileek
  • Don't be so hasty in [limiting the TLDs to 6 characters!](http://newgtlds.icann.org/en/program-status/application-results/strings-1200utc-13jun12-en) – Ray Toal Jun 20 '12 at 21:23
  • Wow! Thank you very much! `ALLFINANZBERATUNG` and some others are amazing. :) I think one can go crazy trying to meet _all_ the requirements of e-mail validation. In this case we didn't think about the maximum length of 256 symbols, didn't check for Unicode characters and so on... But does anybody need that in a real application? – Smileek Jun 20 '12 at 21:35
  • Anyway, we can use `[a-zA-Z0-9]{2,}` (or maybe `[a-zA-Z]{2,}`) to take one more small step toward RFC 2822. – Smileek Jun 20 '12 at 21:37
  • [254 characters, technically](http://stackoverflow.com/questions/386294/maximum-length-of-a-valid-email-address) :) – Ray Toal Jun 20 '12 at 21:48
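
Putting these comments together, one possible sketch drops the upper bound on the top-level domain and adds a separate overall length check. The function name `looksLikeEmail` and the constant `MAX_EMAIL_LENGTH` are hypothetical, the letters-only `[a-zA-Z]{2,}` tail is the "maybe" variant from the comment above, and the 254 figure is taken from the linked comment rather than verified here.

```javascript
// Assumed limit, taken from the comment above (not independently verified).
const MAX_EMAIL_LENGTH = 254;

// The answer's pattern with an unbounded, letters-only top-level domain.
const relaxed = /^([\w\.\-\+])+\@(([\w\-])+\.)+([a-zA-Z]{2,})$/;

// Hypothetical helper: length check first, then the pattern.
function looksLikeEmail(address) {
  return address.length <= MAX_EMAIL_LENGTH && relaxed.test(address);
}

console.log(looksLikeEmail("info@example.allfinanzberatung")); // true
console.log(looksLikeEmail("123@123.123"));                    // false
console.log(looksLikeEmail("a".repeat(250) + "@example.com")); // false
```

This is still far from full RFC compliance; it only covers the two points raised in the comments.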