14

I have written the regex below for a really simple email validation. I plan to send a confirmation link.

/.*@[a-z0-9.-]*/i

I would, however, like to enhance it from the current state because a string like this does not yield the desired result:

test ,my.name+test@gmail-something.co.uk, test

The "test ," portion is undesirably included in the match. I experimented with word boundaries unsuccessfully.

  1. How should I modify it?
  2. Even though I've kept this simple, are there any valid email formats it would exclude?

THANKS!

ryonlife
  • This has been here so often... Have you looked at the questions you were shown after entering your title? – Tomalak Feb 03 '09 at 17:40
  • http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses/201378#201378 – Brad Mace Jul 09 '11 at 04:29

7 Answers

20

It's a lot more complicated! See Mail::RFC822::Address and be scared... very scared.

an0nym0usc0ward
  • The first time I saw this regex it scared me a lot. I showed it to a friend, and at first he didn't believe me that it was THE EMAIL REGEX; then he was horrified too. Good memories. – Random Developer Feb 03 '09 at 17:54
  • The Mail::RFC822::Address accepts much more than what is commonly known as an e-mail address. See the comment at the bottom that says: "This regular expression will only validate addresses that have had any comments stripped and replaced with whitespace". So it accepts whitespace. – dolmen May 26 '11 at 09:17
  • well, I was ready to be horrified. But this is beyond Lovecraftian... – Arani Apr 11 '23 at 06:29
17

Don't use regular expressions to validate e-mail addresses

Instead, follow this advice written by Ben Finney on mail.python.org/pipermail/python-list (see note 1):

The best advice I've seen when people ask "How do I validate whether an email address is valid?" was "Try sending mail to it".

It's both Pythonic, and truly the best way. If you actually want to confirm, don't try to validate it statically; use the email address, and check the result. Send an email to that address, and don't use it any further unless you get a reply saying "yes, this is the right address to use" from the recipient.

The sending system's mail transport agent, not regular expressions, determines which part is the domain to send the mail to.

The domain name system, not regular expressions, determines what domains are valid, and what host should receive mail for that domain.

Most especially, the receiving mail system, not regular expressions, determines what local-parts are valid.
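The confirm-by-reply idea above can be sketched roughly as follows. This is only an illustration under assumptions: the class name `PendingConfirmations`, the in-memory storage, and the `send` callback are all hypothetical, and a real system would persist tokens, expire them, and send the message through an actual mail transport.

```python
import secrets


class PendingConfirmations:
    """Hypothetical in-memory store for unconfirmed addresses."""

    def __init__(self):
        self._pending = {}      # token -> address awaiting confirmation
        self.confirmed = set()  # addresses whose owner clicked the link

    def request(self, address, send):
        """Create a one-time token and hand it to `send` for delivery.

        `send(address, token)` is an assumed callback that would email
        a confirmation link containing the token.
        """
        token = secrets.token_urlsafe(32)
        self._pending[token] = address
        send(address, token)
        return token

    def confirm(self, token):
        """Mark the address as confirmed if the token is known."""
        address = self._pending.pop(token, None)
        if address is None:
            return False
        self.confirmed.add(address)
        return True
```

Nothing here inspects the address syntax at all: an address only becomes usable once its recipient returns the token.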

1 This is the original link, from before it went dead.

Petter Friberg
alxp
  • +1 for the link I would have posted (wish I could give +10!) – David Z Feb 03 '09 at 17:41
  • I think this is a good candidate for a Jeff and Joel podcast rant about what isn't an awesome answer. Sometimes you may just want a heuristic to do something and 98 percent of the time, people with dumb email addresses can go spit. – Peter Turner Feb 03 '09 at 18:11
  • "people with weird e-mail addresses can go spit" You're fired. – alxp Feb 03 '09 at 18:35
  • Seriously, I'm not sure if it was a good idea, but I made a box for a response form that someone could enter a phone number or an email address in, if they entered something I thought was an email address (using regex) it'd put that email address in the reply to field in the header. Semi-Handy! – Peter Turner Feb 03 '09 at 18:45
  • Sometimes I tell websites with stupid regex checks to go spit, by never using them again. I need my gmail + syntax! – Chase Seibert Feb 03 '09 at 19:34
  • This link seems to be broken now. – Aron Rotteveel Feb 10 '11 at 09:47
  • When we say "validate an email address" we are talking about two things. 1. Validating that it's a real email address (which you can do by sending a confirmation link to the address). 2. The validation that most people are talking about: using a regular expression to *assist* the user in entering an address. x@x.x is all you need. We're not here to validate whether imanidiot@yourmomshouse.lol is a real email address. Everyone against regex validation for email either doesn't understand the issue fully or is just a complete troll. – The Muffin Man Sep 10 '13 at 20:38
12

Almost nothing that is short enough to make sense when you look at it will TRULY validate an email address. With that being said, here is what I typically use:

^\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$

It's actually the built-in regex for ASP.NET's regular expression validator for email addresses.

NOTE: many of the regexes given in this thread MAY have worked in the '90s, but TLDs are allowed to be longer than 4 characters in today's web environment. For example, info@about.museum IS a valid email address because .museum is one of those new, long TLDs.
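As a rough check of how this pattern behaves, here is a small Python sketch (Python's `re` is assumed to be close enough to the .NET flavor for this pattern; the names `ASPNET_EMAIL`, `anchored`, and `unanchored` are just for illustration). It also shows why dropping the `^` and `$` anchors changes the result for input containing a space, and that `\w` lets an underscore through in the domain.

```python
import re

# The pattern body from this answer, without the ^ and $ anchors.
ASPNET_EMAIL = r"\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"

anchored = re.compile("^" + ASPNET_EMAIL + "$")
unanchored = re.compile(ASPNET_EMAIL)

# Long TLDs are fine because the pattern puts no length limit on them.
print(bool(anchored.search("info@about.museum")))

# With anchors, the space makes the whole string fail.
print(anchored.search("joe blogs@email.com"))

# Without anchors, the regex finds a valid-looking substring instead.
print(unanchored.search("joe blogs@email.com").group(0))

# \w includes '_', so an underscore in the domain is accepted.
print(bool(anchored.search("user@my_host.com")))
```

If the pattern is used for validation rather than extraction, keeping both anchors (or using a full-match call) is what makes it reject surrounding junk.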

Rick
  • '_' matches \w. But '_' is not allowed in domain names. Also we have now internationalized domain names (in arabic for example). – dolmen May 26 '11 at 09:20
  • This does not check for spaces as "joe blogs@email.com" returns as a valid email. – Weggo Jan 07 '15 at 12:57
  • @Weggo This does not allow spaces. If yours does, you might be missing the caret (^) at the start – Rick Jan 09 '15 at 21:16
  • @Rick - you're correct, I didn't have the caret (^). I've +1'd your answer. – Weggo Jan 12 '15 at 12:26
4

I found that instead of matching the whole email address against a regular expression, it is much more practical to just split the string at the @ and:

  • First check for existing MX or A records of the domain part via a DNS-library.
  • Then check the localpart (the part on the left hand side of the @) against a simpler regex.

The reason for the DNS check is that unreachable email addresses, even if RFC-compliant, are worth nothing. The reason for additionally checking the A record is that it is used to determine where to deliver mail when no MX record is found (see RFC2821, 3.6).

Further tips:

  • Use a robust DNS resolver library, do not roll your own. Test it against large companies. These sometimes have a huge number of mailservers, which can lead to problems. I've seen a buggy library crap out on bmw.com. Just saying. :)
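The split-then-check approach described above could look something like this in Python. It is a sketch under stated assumptions: the standard library has no MX lookup, so the default resolver below only checks for address (A/AAAA) records via `socket.getaddrinfo`; a real implementation would plug a DNS library's MX query into the `domain_resolves` parameter. The function names and the `LOCAL_PART` character class are illustrative choices, not taken from the answer.

```python
import re
import socket

# Assumed "simpler regex" for the local part: unquoted atom characters and dots.
LOCAL_PART = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+")


def has_address_record(domain):
    """Fallback check: does the domain resolve to any address at all?

    This covers only the A/AAAA case; an MX lookup needs a DNS library.
    """
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def looks_deliverable(address, domain_resolves=has_address_record):
    """Split at the last @, check the domain first, then the local part."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if not domain_resolves(domain):
        return False
    return LOCAL_PART.fullmatch(local) is not None
```

Passing the resolver in as a parameter keeps the DNS library swappable, which also makes it easy to test the function without touching the network.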
pi.
1

Instead of . try matching every character except \s (whitespace):

/[^\s]*@[a-z0-9.-]*/i
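To see what each variant actually captures from the sample string in the question, here is a short Python sketch. The first two patterns are the ones from the question and this answer; the third, `strict_local`, is an assumed tightening that borrows the local-part character class from another pattern in this thread, and the helper name `first_match` is made up for the example.

```python
import re

text = "test ,my.name+test@gmail-something.co.uk, test"

# Pattern from the question: the greedy `.*` takes everything left of the @.
original = re.compile(r".*@[a-z0-9.-]*", re.IGNORECASE)

# Pattern from this answer: the left side stops at whitespace.
no_space = re.compile(r"[^\s]*@[a-z0-9.-]*", re.IGNORECASE)

# Assumed tightening: allow only typical local-part characters on the left.
strict_local = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+", re.IGNORECASE)


def first_match(pattern, s):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(s)
    return m.group(0) if m else None


print(first_match(original, text))      # test ,my.name+test@gmail-something.co.uk
print(first_match(no_space, text))      # ,my.name+test@gmail-something.co.uk
print(first_match(strict_local, text))  # my.name+test@gmail-something.co.uk
```

Note that excluding whitespace removes "test " but still leaves the leading comma, because a comma is not whitespace; an explicit character class on the left side is one way to drop it.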
Martin Brown
  • To match everything except whitespace, shouldn't we match for (capital S) `\S`? Unless it doesn't work with all regex engines.. – Fábio Santos Nov 27 '12 at 22:14
  • The ^ at the front of the square brackets means "characters not in this list". As such it inverts the meaning of \s. I guess you could use /\S*@[a-z0-9.-]*/i instead. – Martin Brown Nov 28 '12 at 16:51
-1

A smaller two-step regex provides good results:

/**
 * Check to see if email address is in a valid format.
 * Leading character of mailbox must be alpha
 * remaining characters alphanumeric plus -_ and dot
 * domain base must be at least 2 characters
 * domain extension must be at least 2, not more than 4 alpha
 * Subdomains are permitted.
 * @version 050208 added apostrophe as valid char
 * @version 04/25/07 single letter email address and single
 * letter domain names are permitted.
 */
public static boolean isValidEmailAddress(String address){
    String sRegExp;

    // 050208 using the literal that was actually in place
    // 050719 tweaked 
    // 050907 tweaked, for spaces next to @ sign, two letter email left of @ ok
    // 042507 changed to allow single letter email addresses and single letter domain names
    // 080612 added trap and unit test for two adjacent @signs
    sRegExp =   "[a-z0-9#$%&]"          // don't lead with dot
        +   "[a-z0-9#$%&'\\.\\-_]*"     // more stuff dots OK
        +   "@[^\\.\\s@]"               // no dots or space or another @ sign next to @ sign
        +   "[a-z0-9_\\.\\-_]*"         // may or may  not have more character
        +   "\\.[a-z]{2,4}";            // ending with top level domain: com,. biz, .de, etc.

    boolean bTestOne =  java.util.regex.Pattern.compile( sRegExp,
            java.util.regex.Pattern.CASE_INSENSITIVE).matcher(address).matches();

    // should this work ?
    boolean bTwoDots =  java.util.regex.Pattern.compile("\\.\\.",  // no adjacent dots
                    java.util.regex.Pattern.CASE_INSENSITIVE).matcher(address).find();

    boolean bDotBefore = java.util.regex.Pattern.compile("[\\.\\s]@", //no dots or spaces before @
                         java.util.regex.Pattern.CASE_INSENSITIVE).matcher(address).find();

    return bTestOne && !bTwoDots && !bDotBefore;
}   // end IsValidEmail
-1

This comes from RegexBuddy (definitely a must-buy program!):

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}\b
Keng