7

Here's a regex for validating emails: \S+@\S+\.\S+. I didn't write it. I'm new to regular expressions and do not understand them all that well.

I have a couple of questions:

  1. What's wrong with the above RegEx?
  2. What's a good RegEx for validating emails?
ROMANIA_engineer
Tesseract
  • What evidence do you have that something is wrong with the regex? Does it fail to match some patterns? Which ones? It is extremely difficult for us to answer "What's wrong?" with no indication that anything actually is wrong. – abelenky Jul 02 '09 at 20:43
  • Who wants to be the first to post the three-page email regex? – Michael Myers Jul 02 '09 at 20:43
  • As for what's wrong with it: Well, for one thing, it doesn't allow dots in the first part. That would disqualify two of my three email addresses. Also, it only allows one dot in the second part, which disqualifies domains such as ".co.uk". – Michael Myers Jul 02 '09 at 20:49
  • @mmyers Thank you very much, that's the kind of answer I was looking for. – Tesseract Jul 02 '09 at 20:50
  • It also doesn't allow for the "+" sign in usernames, which is legal. The portion after the "+" sign is ignored, but many people use it for filtering emails. For example, given the address "user@gmail.com", someone might use "user+amazon@gmail.com" as an Amazon email, allowing them to easily filter mail (or track whether a particular service is giving their email address to another service). – mipadi Jul 02 '09 at 20:54
  • That's a good one, mipadi. I love using that and get seriously angry when a website doesn't let me. – Paolo Bergantino Jul 02 '09 at 20:55
  • What do you mean, doesn't allow for? \S matches *any* non-whitespace character. – Alan Moore Jul 03 '09 at 00:45
  • Indeed, Alan is right. `mipadi` and `mmyers`, you're both wrong: `\S` matches both the `.` (dot) and `+`. – Bart Kiers Nov 20 '09 at 22:44
  • possible duplicate of [What is the best regular expression for validating email addresses?](http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses) – Brad Mace Jul 09 '11 at 04:28
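
The disagreement in the comments above about dots and "+" can be checked directly. The following is a minimal sketch in Python (the choice of language and the sample addresses are assumptions for illustration, not part of the question); it applies the question's pattern to whole strings with `re.fullmatch`.

```python
import re

# The pattern from the question, unchanged.
LOOSE = re.compile(r"\S+@\S+\.\S+")

# Illustrative addresses: dots in the local part, a "+" tag,
# and a domain with several dots.
samples = [
    "first.last@example.com",
    "user+amazon@gmail.com",
    "johnny@corporate.example.co.uk",
]

# \S matches any non-whitespace character, so "." and "+" are accepted.
results = {s: LOOSE.fullmatch(s) is not None for s in samples}

# The same permissiveness lets obvious mistakes through as well.
double_at_accepted = LOOSE.fullmatch("user@@yahoo.com") is not None
```

Every sample matches, which supports the later comments: the pattern is too permissive rather than too strict.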

6 Answers

18

"How do I validate an email with regex" is one of the more popular questions that come up about regular expressions, and the only really good answer is "you don't". It has been discussed on this very website on many occasions. What you have to understand is that if you really wanted to follow the spec, your regex would look something like this. Obviously that is a monstrosity, and it is more an exercise in demonstrating how ridiculously difficult it is to accept everything you are supposed to be able to accept. With that in mind, if you absolutely, positively need to know that the email address is valid, the only real way to check is to actually send a message to the address and see whether it bounces. Otherwise, this regex will properly validate most cases, and in a lot of situations most cases is enough. In addition, that page discusses the problems with trying to validate emails with regex.

Community
Paolo Bergantino
7

I'm only going to answer your first question, and from a technical regex point of view.

What is wrong with the regex \S+@\S+\.\S+ is that it has the potential to execute far too slowly. What happens if somebody enters an email string like the one below, and you need to validate it?

a@b.cdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz0123456789

Or even worse (yes, that's 100 @'s after the dot):

@.@@@@@@@@@@@@@@@@@@@@@@@@@ \ @@@@@@@@@@@@@@@@@@@@@@@@@ \ @@@@@@@@@@@@@@@@@@@@@@@@@ \ @@@@@@@@@@@@@@@@@@@@@@@@@

Slowness happens. First the regex greedily matches as many characters as possible for the first \S+, so it initially consumes the whole string. Then it needs the @ character, so it backtracks until it finds one. At that point there is another \S+, so again it consumes everything up to the end of the string. Then it needs to backtrack again until it finds a dot. Can you imagine how much backtracking occurs before the regex finally fails on the second email string?

To kill the backtracking, I suggest using negated character classes with possessive quantifiers in this case, which have the additional benefit of not allowing multiple @'s in one string.

[^@\s]++@[^@\s.]++\.[^@\s]++

I did a quick benchmark for the two regexes against the “100 @'s email”. Mine is about 95 times faster.
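
As a rough illustration of the negated-character-class idea, here is a minimal Python sketch. It makes two assumptions that are not in the answer: the names `LOOSE`, `STRICT` and `looks_like_email` are hypothetical, and the possessive `++` is replaced by a plain `+`, because Python's `re` module only accepts possessive quantifiers from version 3.11 onward. The classes cannot match `@`, so there is still very little for the engine to backtrack over.

```python
import re

# The permissive pattern from the question.
LOOSE = re.compile(r"\S+@\S+\.\S+")

# Negated character classes as in the answer above, but with plain "+"
# (an assumption made so the sketch runs on Python versions before 3.11).
STRICT = re.compile(r"[^@\s]+@[^@\s.]+\.[^@\s]+")


def looks_like_email(candidate: str) -> bool:
    """Return True if the whole string has the shape local@label.rest."""
    return STRICT.fullmatch(candidate) is not None


# The "100 @'s" input from the answer, built programmatically.
hundred_ats = "@." + "@" * 100
```

For example, `looks_like_email("a@b.co.uk")` accepts the address, while `looks_like_email(hundred_ats)` and `looks_like_email("user@@yahoo.com")` reject theirs. On Python 3.11 or later the pattern from the answer, with `++`, should also compile as written.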

Geert
  • Hey Geert, nice to see you around here. About the execution time of the regex: I don't think that matters much in this case, unless you're going to validate thousands of addresses in a very short time or the addresses are thousands of characters long. (But we had that discussion before :)) Regards, Bart (prometheuzz). – Bart Kiers Nov 20 '09 at 22:00
  • I have to wonder how much slower the permissive regex is than the huge RFC regex. I would guess it's still much faster than that, so performance, in this case, doesn't seem like a major concern. Essentially, 95 times slower than 1 millisecond is just 95 milliseconds, which is negligible. So what kind of slowdown are we talking about here? – Kzqai Mar 29 '12 at 16:10
2

What's wrong with the above RegEx?

It only checks for an '@' and a '.'. There are plenty of things that are definitely not legitimate email addresses that would match that combination.

For example, if a person wrote user@www.myWebsite.com it would match, but it is obviously a mistake. A little more sophistication in the regex would catch it and help the user.

Ditto if he put in user@myWebsite.nt - he misspelled 'net'. Or he put in two @@'s (user@@yahoo.com / user@yahoo@yahoo.com - which is actually pretty common), or two dots (user@yahoo..com). A better regex should catch these.

[Though stricter checks often trip on legitimate input, such as legal dots before and after the 'at' that might be dropped or rejected (my.name@gmail.com).]

If you don't want to be picky, you don't even need a regex. indexOf('@') != -1 will catch most of the errors. If you are going to check at all, you should do better.
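
The checks mentioned above (a single '@', no doubled dots in the domain) do not need a regex either. The following Python sketch is one way to write them; the function name `plausible_email` is hypothetical, and the rules are deliberately simple assumptions, so a technically legal quoted local part containing '@' would be rejected.

```python
def plausible_email(candidate: str) -> bool:
    """Cheap shape check without a regex.

    Requires no whitespace, exactly one '@', a non-empty local part,
    and a dotted domain with no empty labels.
    """
    if any(ch.isspace() for ch in candidate):
        return False
    if candidate.count("@") != 1:
        return False  # catches user@@yahoo.com and user@yahoo@yahoo.com
    local, domain = candidate.split("@")
    if not local or "." not in domain:
        return False
    # An empty label means a leading, trailing, or doubled dot (yahoo..com).
    return all(label for label in domain.split("."))
```

With these rules, `plausible_email("my.name@gmail.com")` and `plausible_email("user@www.myWebsite.com")` pass, while `plausible_email("user@yahoo..com")` does not. A typo such as ".nt" for ".net" still passes, since catching it would require a list of known top-level domains.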

What's a good RegEx for validating emails?

http://www.gooli.org/blog/useful-regular-expressions

http://www.regular-expressions.info/email.html

SamGoody
  • user@www.myWebsite.com is a valid email address. – Andrew Moore Jul 02 '09 at 20:53
  • Really?! If I do myName@www.yahoo.com will I get it? Though now that I think about it, something ending with .nt might technically be also. It is still correct to catch and verify both of the above. – SamGoody Jul 02 '09 at 20:56
  • It's quite common to have multiple dots in the second half of an email, such as johnny@corporate.example.com – Funka Jul 02 '09 at 22:23
2

I see that @liam posted a link to the RFC 822 regex. But in keeping with the idea that Stack Overflow is a destination, and in case ex-parrot.com takes down the link, here it is in its entirety:

Mail::RFC822::Address: regexp-based address validation

Mail::RFC822::Address is a Perl module to validate email addresses according to the RFC 822 grammar. It provides the same functionality as RFC::RFC822::Address, but uses Perl regular expressions rather than the Parse::RecDescent parser. This means that the module is much faster to load as it does not need to compile the grammar on startup.

Download Mail::RFC822::Address-0.4.tar.gz or read the documentation.

The grammar described in RFC 822 is surprisingly complex. Implementing validation with regular expressions somewhat pushes the limits of what it is sensible to do with regular expressions, although Perl copes well:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

This regular expression will only validate addresses that have had any comments stripped and replaced with whitespace (this is done by the module).

Community
Aaron Palmer
0

Though I'm a little bit late, this could help someone who still wants to check more.

if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
  // invalid email address
}

This only validates whether the email format is correct. If you want to go further than that, for example to check whether the email address really exists, you need to use a mail validation API.

Here is the code that I'm currently using on my website.

// Example call; replace the placeholder with the address to check.
if (isEmailExist("EMAIL_ADDRESS_THAT")) {
    echo "email exists, real email";
} else {
    echo "email doesn't exist";
}

function isEmailExist($emailAddress)
{
    // Reject syntactically invalid addresses before calling the API.
    if (!filter_var($emailAddress, FILTER_VALIDATE_EMAIL)) {
        return false; // invalid format
    }

    // Now ask the validation service whether the address really exists.
    $postdata = http_build_query(array('api_key' => 'YOUR_API_KEY', 'email' => $emailAddress));
    $url = "https://email-validator.com/api/v1.0/json";
    $opts = array('http' => array(
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => $postdata,
    ));

    $context = stream_context_create($opts);
    $result = file_get_contents($url, false, $context);
    if ($result === false) {
        return false; // the HTTP request itself failed
    }

    // Treat a missing or undecodable response as "does not exist".
    $data = json_decode($result, false);
    return isset($data->is_exists) ? $data->is_exists : false;
}

You can find more details here. https://email-validator.com/tutorial

Deluar Hossen
-1

What's wrong with the above RegEx?
The RFC 822 standard for email addresses is very loose, so it is hard to find a terse regular expression that captures all possible valid emails. The matter is complicated by the fact that not every mail server/client enforces this standard, so the actual data might not conform to it.

While you can certainly guess, or enforce a particular format, writing an ad-hoc expression is pretty much a guarantee that you will have either lots of junk email addresses or deny valid ones.

What's a good RegEx for validating emails?
This is the reference regex for validating against the RFC. It's implemented as a Perl module here, and it is also the final listing in O'Reilly's "Mastering Regular Expressions".

Mail::RFC822::Address

liam