I'm following these specifications from Wikipedia.

[^\.]([a-zA-Z\d\!\#\$%&\'\*\+\-\/\=\?\^_`\{|\}\~]|[^\.][\.]{1})+[^\.]@[a-zA-Z\d\-\_]+(\.[a-z]{2,5}){1,2}

How can it be improved/shortened?

PS:

I know there are plenty of email validators already made; this is strictly for my own learning about regex. Thanks.

Korvin Szanto
  • You chose one of the more *complex* regex examples to learn on. :) – Jason McCreary Sep 27 '11 at 17:58
  • I agree, yet I'm not a beginner, I'm semi-seasoned =p – Korvin Szanto Sep 27 '11 at 18:00
  • Email addresses should never be validated with a regex. This is not a good project to learn regex on, as you can only fail. And there are so many errors in your regex, I suggest you first read a basic tutorial like http://www.regular-expressions.info – Tim Pietzcker Sep 27 '11 at 18:00
  • If this is solely about this particular pattern, then it's OT here, IMO (too localized). You could try [Codereview-SE](http://codereview.stackexchange.com/) in that case. If it's a general question about e-mail and regex-es, see: http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses – Bart Kiers Sep 27 '11 at 18:01
  • What is a better way to validate? Why is regex a bad way? – Korvin Szanto Sep 27 '11 at 18:01
  • The problem with email address validation is that strictly speaking, an email address can be very bizarre indeed (see section 3.4 of [RFC-2822](http://www.ietf.org/rfc/rfc2822.txt)). [This article](http://www.regular-expressions.info/email.html) has a very good discussion of the trade-offs you should consider when writing an email regex. – daiscog Sep 27 '11 at 18:01
  • The only validation you should be doing is to check whether there's an `@` in it. Other than that, you'll have to try and send e-mail to it. If that succeeds, you still don't know if the address actually leads to an active mailbox, so you'll have to wait for a response before you can finally be sure that that mail has been received. – Tim Pietzcker Sep 27 '11 at 18:03

2 Answers

Hostnames cannot contain underscores, so you should remove `\_` from the character class after the @.

What about sub-domains? I don't think the given regex will match someone@subdomain.example.com

Personally, I've always used /^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$/ which does not match the entire RFC-2822 specification, but does the job for >99.9% of real-world email addresses.
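
A minimal Python sketch of how the pattern above behaves on a few addresses. The helper name `looks_like_email` and the sample addresses are illustrative assumptions, not part of any library; the pattern itself is copied from the line above with the `/.../` delimiters dropped.

```python
import re

# Pattern from the answer, without the surrounding /.../ delimiters.
SIMPLE_EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,6}$")


def looks_like_email(address: str) -> bool:
    """Return True if the address matches the simplified pattern (hypothetical helper)."""
    return SIMPLE_EMAIL.match(address) is not None


samples = [
    "someone@subdomain.example.com",   # sub-domains are accepted
    "someone@example.museum",          # 6-letter TLD fits {2,6}
    "someone@sub_domain.example.com",  # underscore in the domain is rejected
    "someone@example",                 # no dot after the @ is rejected
]

for address in samples:
    print(f"{address!r}: {looks_like_email(address)}")
```

The first two samples match and the last two do not. This only shows what the pattern accepts; it says nothing about whether a mailbox exists.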

daiscog
  • But that other 0.1% still exists. Which means that you have to write special-purpose code to handle it. Or treat those email addresses as invalid. I'm sure the potential customer will be happy to hear, "I'm sorry, but it was inconvenient for me to implement the entire RFC2822 specification." – Jim Mischel Sep 27 '11 at 18:14
  • @JimMischel : Yes you're absolutely right, of course, but what I use depends on context. Saying I've "always used" the above regex is slightly untrue as recently I've had to accommodate a more technically-oriented audience and so chose to use a larger regex that covered the whole spec. But for the average Joe Public, whose email addresses tend to be somethingwitty@yahoo.com, the above is fine. Again, see [this article](http://www.regular-expressions.info/email.html). – daiscog Sep 27 '11 at 18:35
  • Thanks for that, I didn't think about subdomains. The way I fixed this is to add `(([a-zA-Z\d\-]+\.)+)?` after my @ to check for subdomains: `email@this.domain.is.valid.com` is valid while `email@this.domain.is.not.validemail` is not. – Korvin Szanto Sep 27 '11 at 19:08
  • Your regex also limits the TLD to 5 characters. What about .museum or .travel domains? In extreme cases, you could have an entirely internal email system with addresses that do not end with global TLDs, but use internal host names or even IP addresses instead (admittedly, my regex doesn't allow for this scenario either, but as I mentioned above, you can use the full RFC-2822 regex for that). – daiscog Sep 27 '11 at 19:26

If you want to learn about validating email addresses with regular expressions and some of the trade-offs, read this article http://www.regular-expressions.info/email.html

Other good sources are widely used open source libraries or applications that contain functions for validating email.

Your regex indeed doesn't match emails with subdomains. You can fix that by adding the dot to the first character class after the @ sign:

[^\.]([a-zA-Z\d\!\#\$%&\'\*\+\-\/\=\?\^_`\{|\}\~]|[^\.][\.]{1})+[^\.]@[a-zA-Z\d\-\_\.]+(\.[a-z]{2,5}){1,2}
ramiro
  • You're right! Thanks, but your implementation is incorrect as it breaks my tld identification EX: `email@sub.domain.c` is valid with your suggestion. The way to prevent this is to implement like this: `^[^\.]([a-zA-Z\d\!\#\$%&\'\*\+\-\/\=\?\^_\`\{|\}\~]|[^\.][\.]{1})+[^\.]@(([a-zA-Z\d\-]+\.)+)?[a-zA-Z\d\-]+(\.[a-z]{2,6}){1,2}$`. – Korvin Szanto Sep 27 '11 at 19:08
  • This regex will also allow a dot immediately after the @, which is invalid. – daiscog Sep 27 '11 at 19:28
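
The dot-after-@ issue and the sub-domain handling discussed in these comments can be checked with a short Python sketch. Both patterns are transcribed from this thread (the one in the answer and the revised one in the first comment); the names `ANSWER_PATTERN`, `REVISED_PATTERN` and `full_match` are illustrative, and using `re.fullmatch` is an assumption, since the pattern in the answer is written without anchors.

```python
import re

# Pattern from the answer (dot added to the character class after the @).
ANSWER_PATTERN = re.compile(
    r"[^\.]([a-zA-Z\d\!\#\$%&\'\*\+\-\/\=\?\^_`\{|\}\~]|[^\.][\.]{1})+[^\.]"
    r"@[a-zA-Z\d\-\_\.]+(\.[a-z]{2,5}){1,2}"
)

# Revised pattern from the first comment (optional sub-domain group, no underscore).
REVISED_PATTERN = re.compile(
    r"^[^\.]([a-zA-Z\d\!\#\$%&\'\*\+\-\/\=\?\^_`\{|\}\~]|[^\.][\.]{1})+[^\.]"
    r"@(([a-zA-Z\d\-]+\.)+)?[a-zA-Z\d\-]+(\.[a-z]{2,6}){1,2}$"
)


def full_match(pattern: re.Pattern, address: str) -> bool:
    """Return True only if the whole address matches (hypothetical helper)."""
    return pattern.fullmatch(address) is not None


# A dot directly after the @ slips through the answer's pattern...
print(full_match(ANSWER_PATTERN, "email@.example.com"))    # True
# ...but is rejected by the revised pattern.
print(full_match(REVISED_PATTERN, "email@.example.com"))   # False

# The revised pattern accepts nested sub-domains and rejects overlong or 1-letter TLDs.
print(full_match(REVISED_PATTERN, "email@this.domain.is.valid.com"))       # True
print(full_match(REVISED_PATTERN, "email@this.domain.is.not.validemail"))  # False
print(full_match(REVISED_PATTERN, "email@sub.domain.c"))                   # False
```

Results with an unanchored search (for example `re.search`) can differ, because a substring of the input may match even when the whole address does not.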