
I have a list of email addresses and I want to check whether each one is a valid Gmail address.

Examples of valid addresses:

"admin@gmail.com"
"john.smith@googlemail.com"
"john5.a.smith@gmail.com"
"jane_doe@googlemail.com"
"patrick.o'reilly@gmail.com" 

However, the following would not be valid email addresses:

".admin@gmail.com"
"postmaster.@gmail.com"

This is what I have so far: a string beginning with a-z or 0-9, followed by zero or more special characters, followed by more a-z or 0-9 characters.

re.search("^[a-z0-9]+[\.'\-]*[a-z0-9]+@(gmail|googlemail)\.com$", s)

but it is failing on

"john5.a.smith@gmail.com"
Eamonn
  • http://emailregex.com/ – tzaman May 12 '15 at 15:10
  • Don't you want the first part to be `[a-z0-9].*[a-z0-9]` (also consider case...)? Note that there are many issues with validating email addresses with regex, though: http://stackoverflow.com/q/201323/3001761 – jonrsharpe May 12 '15 at 15:11
  • Case is not important at the moment. [a-z0-9].*[a-z0-9] fails on ".admin@gmail.com". Gmail link fails on "admin@gmail.com". – Eamonn May 12 '15 at 15:25
  • Got this working using ((^[a-z0-9])+([a-z0-9-.]*[a-z0-9])*)+@(gmail|googlemail).com$ – Eamonn Jul 20 '15 at 14:45

2 Answers


This is tricky, and it is difficult or impossible to do correctly with a regular expression, because the pattern gets out of hand quickly. You will have to weigh the cost of false positives against the cost of false negatives when designing your filter, and decide based on which you prefer. It is incorrect to think that this kind of filter will work 100% of the time.

Based on your requirements, you should decide whether to:

  1. Filter aggressively, and accept that some people will not get emails from you, or
  2. Not filter at all, but remove addresses that bounce from the mailing list.

Again, it depends on your requirements, but I recommend not filtering. Even in cases where email reputation is a concern, unless you're sending emails to equal numbers of good and bad addresses, this is the better option.
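The second option can be sketched in a few lines. This is only an illustration: `prune_bounced` is a hypothetical helper, and it assumes you already collect the bounced addresses somewhere (how you do that depends on your mail provider and is not covered here). It also assumes that comparing addresses case-insensitively is acceptable for your list.

```python
def prune_bounced(mailing_list, bounced):
    """Return mailing_list without any address that appears in bounced.

    Hypothetical helper: addresses are compared after trimming whitespace
    and lower-casing, which is an assumption, not a rule of email in general.
    """
    bounced_keys = {address.strip().lower() for address in bounced}
    return [
        address
        for address in mailing_list
        if address.strip().lower() not in bounced_keys
    ]


if __name__ == "__main__":
    subscribers = ["john.smith@googlemail.com", "Jane_Doe@googlemail.com"]
    bounces = ["jane_doe@googlemail.com"]
    print(prune_bounced(subscribers, bounces))
```

Nothing is rejected up front here; an address only leaves the list after a delivery attempt has actually failed.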


A few points to demonstrate this fact

Contrary to what you posted:

  1. admin@gmail.com is an illegal address
  2. postmaster.@gmail.com will receive mail.

This demonstrates that it is very hard to get things like this right, and that (in my opinion) you shouldn't try. Even "simple" and "obvious" things are often anything but in the Wacky World of Email®.

  1. It's important to note that dots don't matter in gmail addresses.

    Gmail doesn't recognize dots as characters within usernames, so you can add or remove dots from a Gmail address without changing the actual destination address; the messages will all go to your inbox, and only yours. In short:

    homerjsimpson@gmail.com = hom.er.j.sim.ps.on@gmail.com
    homerjsimpson@gmail.com = HOMERJSIMPSON@gmail.com
    homerjsimpson@gmail.com = Homer.J.Simpson@gmail.com
    

    A quick test on my personal email confirmed that addresses with leading or trailing dots follow the same principle and are delivered:

    homerjsimpson@gmail.com = .homerjsimpson@gmail.com
    homerjsimpson@gmail.com = homerjsimpson.@gmail.com
    homerjsimpson@gmail.com = homerjsimpson.....@gmail.com

  2. You must distinguish between a valid Gmail username and a valid Gmail address. They are not the same thing. Just because you cannot register a certain string as a username does not mean that putting that same string in front of @gmail.com won't deliver an email.

    Some other points:

    • Usernames must be at least 6 characters. This means admin@gmail.com is, in fact, an illegal address. bob@gmail.com, etc. are also illegal according to this guideline, although "obviously legal".
    • Usernames can contain letters (a-z), numbers (0-9), dashes (-), underscores (_), apostrophes ('), and periods (.). If you decide on a regex filter, you should allow any combination of these in the username, as well as the plus sign (+) and probably some other characters we haven't considered.
    • There are also constraints on the maximum username length and the total address length, plus other constraints that apply to email addresses in general.
    • Plus signs are not legal parts of Gmail usernames, but can be included in gmail addresses. homerjsimpson+stackoverflow@gmail.com will happily be delivered to homerjsimpson@gmail.com.
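
The delivery rules described above (case is ignored, dots are ignored, and anything after a plus sign is ignored) can be sketched as a normalization step. This is a rough illustration of those three rules only, not a description of how Gmail works internally; `canonical_gmail` and `meets_minimum_length` are hypothetical names, and the six-character minimum is taken from the point above.

```python
def canonical_gmail(address):
    """Map an address to the mailbox it would reach under the rules above.

    Hypothetical helper: lower-cases the address, drops a "+tag" suffix,
    and removes every dot from the part before the "@".
    """
    local, _, domain = address.rpartition("@")
    local = local.lower()
    local = local.split("+", 1)[0]  # "+stackoverflow" style tags are ignored
    local = local.replace(".", "")  # dots are ignored, wherever they appear
    return local + "@" + domain.lower()


def meets_minimum_length(username, minimum=6):
    """Check only the registration length rule (assumed to be 6 characters)."""
    return len(username) >= minimum


if __name__ == "__main__":
    for candidate in [
        "Homer.J.Simpson@gmail.com",
        ".homerjsimpson@gmail.com",
        "homerjsimpson.....@gmail.com",
        "homerjsimpson+stackoverflow@gmail.com",
    ]:
        print(candidate, "->", canonical_gmail(candidate))
    print(meets_minimum_length("admin"))
```

Normalizing like this is mostly useful for spotting duplicates in a list. It says nothing about whether the mailbox exists.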
Ezra
  • I actually never knew this about Gmail and it's pretty interesting, but I was only using gmail/googlemail as an example; it's actually a different site I was trying to write this for. – Eamonn May 13 '15 at 13:36

Use this instead:

^[a-z0-9]+[\.'\-a-z0-9_]*[a-z0-9]+@(gmail|googlemail)\.com$

Tested on Regex101.com.
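
As a rough check in Python, the sketch below runs the pattern from the question and the pattern suggested above against the question's sample addresses. The question's pattern allows a run of special characters in only one place, between two alphanumeric runs, so it cannot match an address with two separate dots. The names `QUESTION_PATTERN`, `ANSWER_PATTERN`, and `misclassified` are made up for this example.

```python
import re

# Pattern from the question: specials may appear in one place only.
QUESTION_PATTERN = re.compile(
    r"^[a-z0-9]+[\.'\-]*[a-z0-9]+@(gmail|googlemail)\.com$"
)
# Pattern suggested above: the middle class also accepts letters, digits
# and underscores, so specials can be spread through the local part.
ANSWER_PATTERN = re.compile(
    r"^[a-z0-9]+[\.'\-a-z0-9_]*[a-z0-9]+@(gmail|googlemail)\.com$"
)

SHOULD_MATCH = [
    "admin@gmail.com",
    "john.smith@googlemail.com",
    "john5.a.smith@gmail.com",
    "jane_doe@googlemail.com",
    "patrick.o'reilly@gmail.com",
]
SHOULD_NOT_MATCH = [
    ".admin@gmail.com",
    "postmaster.@gmail.com",
]


def misclassified(pattern):
    """Return the sample addresses the pattern gets wrong."""
    wrong = [a for a in SHOULD_MATCH if not pattern.search(a)]
    wrong += [a for a in SHOULD_NOT_MATCH if pattern.search(a)]
    return wrong


if __name__ == "__main__":
    print("question pattern gets wrong:", misclassified(QUESTION_PATTERN))
    print("suggested pattern gets wrong:", misclassified(ANSWER_PATTERN))
```

This only checks the pattern against the expectations stated in the question; it does not show which addresses the mail provider will actually accept.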

Rodrigo López