2

Is there a function in PHP that can normalize an email address?

E.g., if case isn't significant, then FOO@example.com -> foo@example.com.

I don't know the rules for when email addresses should be considered "the same", so I don't want to implement this myself.

allyourcode
  • Is there a reason to do it? It seems to me like you're not gaining anything from doing this... while at the same time risking users not receiving their emails because their mailhost is quirky. –  Dec 14 '11 at 03:36

7 Answers

10

Wikipedia has a roundup of what the various RFCs say about how an email address should be formed.

Despite what others have said, email can be case sensitive

The local-part is case sensitive, so "jsmith@example.com" and "JSmith@example.com" may be delivered to different people. This practice is discouraged by RFC 5321. However, only the authoritative mail servers for a domain may make that decision. The only exception is for a local-part value of "postmaster" which is case insensitive, and should be forwarded to the server's administrator.

The local part is referring to the part of the address to the left of the @ sign.

So, as far as your specific concern (case normalization) goes, you could lowercase the server portion (to the right of the @) however you best see fit: split on the @, strtolower() the server component, and recombine.

Alana Storm
  • 2
    +1 for referencing the RFC. Store and use email addresses in their original form; compare them in lowercase -- because while you don't want to accidentally fail to send someone email because you screwed up the local part, you also don't want to allow registration of two separate accounts for JIM@example.com and jim@example.com. – Frank Farmer Jul 23 '09 at 00:41
  • 2
    I would like to know what percentage of mail servers actually allow 2 users with the same user name but different case. My guess is close to 0. So do you account for minute amount of the mail servers and leave your system open to multiple accounts with the same email or do you make a stand and shut off those 2 (probably less) people who are stupid enough to create multiple users with the same email address? I know what I'll be doing. – Kane Wallmann Jul 23 '09 at 00:51
  • Your edge case is someone's existence. – Alana Storm Jul 23 '09 at 05:59
  • "I would like to know what percentage of mail servers actually allow 2 users with the same user name but different case" The more likely issue, is a server that accepts ONLY uppercase 'local parts'. I'm pretty sure there was at least one service that did this in the 90s -- for example, TOM@compuserve.com might have worked, but tom@compuserve.com might not have. As I outlined in my first comment, it's simple to account for this, while simultaneously not allowing multiple registrations with the "same" email address. Why risk cutting off a few users, other than out of sheer laziness? – Frank Farmer Jul 29 '09 at 00:08
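The "store the original, compare in lowercase" approach from the comments above could be sketched roughly as follows. The function names `emailComparisonKey` and `isAlreadyRegistered`, and the plain array standing in for a user table, are hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: builds a key used ONLY for duplicate detection.
// The address as typed is what gets stored and used for sending mail.
function emailComparisonKey(string $email): string
{
    return strtolower(trim($email));
}

// Hypothetical check against addresses that were stored in their original form.
function isAlreadyRegistered(string $email, array $registered): bool
{
    $key = emailComparisonKey($email);
    foreach ($registered as $existing) {
        if (emailComparisonKey($existing) === $key) {
            return true;
        }
    }
    return false;
}

$registered = ['JIM@example.com']; // stored exactly as the user entered it
var_dump(isAlreadyRegistered('jim@example.com', $registered)); // bool(true)
var_dump(isAlreadyRegistered('tom@example.com', $registered)); // bool(false)
```

Mail is still sent to the stored, unmodified address, so a server that treats the local part as case sensitive keeps working.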
4

If you want, you can use strtolower(), which could cover most of your emails correctly. But here is some additional info, if you want to do it correctly:

An email address consists of two parts: a local-part (anything before @), and a domain (anything after @). The local-part is meant to be interpreted by the mail server of the domain given in the domain part, so you actually cannot make any assumptions on that (case matters, for example!).

Many mail servers provide the option of adding arbitrary comments to your user name with a plus sign, like the following:

soulmerge+this_mail_is_delivered_to_the_user_soulmerge@example.com

For one mail server soulmerge@example.com, soulmerge+friends@example.com and SOULMERGE@example.com might be the same mail box, whereas in another it might point to two or three distinct mailboxes, but fact is: you cannot know. Any translation you make on the whole address might lead to an invalid address.
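To make the local-part/domain split concrete, here is a minimal sketch that separates the two parts for inspection without rewriting either of them. The name `splitEmail` is hypothetical, and splitting at the last @ is an assumption made so that an @ inside the local part does not confuse the split.

```php
<?php
// Hypothetical helper: splits an address into its two parts, for inspection only.
// Splits at the LAST "@" and returns null when the input contains no "@".
function splitEmail(string $email): ?array
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return null;
    }
    return [
        'local'  => substr($email, 0, $at),
        'domain' => substr($email, $at + 1),
    ];
}

$parts = splitEmail('soulmerge+friends@example.com');
// The local part is left untouched: whether "+friends" is ignored
// is decided by the receiving mail server, not by this code.
echo $parts['local'], "\n";  // soulmerge+friends
echo $parts['domain'], "\n"; // example.com
```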

soulmerge
  • I like the comment feature you mentioned. I did not know that! – allyourcode Jul 23 '09 at 23:33
  • 1
    These are not comments within the RFC definition of email addresses, they are just part of the local-part that some systems treat specially. Different systems use different characters; `+`, `-` and `=` are common. Real comments in email addresses look like this: `(comment)john.smith@example.com`, and in theory you can even do crazy things like this: `john.smith@exam(comment)ple.com`, however, many systems do not support RFC822 comments. – Synchro Dec 12 '14 at 16:07
3

Use strtolower() to make the server portion lowercase. (Updated due to previous answer)

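$at = strrpos($email, "@"); // split at the last "@" so an "@" in the local part is left alone
if ($at !== false) {
    // keep the local part as typed, lowercase only the host
    $email = substr($email, 0, $at) . "@" . strtolower(substr($email, $at + 1));
}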

Also, if you want to standardize the format as well, you probably want to look into filter_var(), which can sanitize and validate email addresses, along with several other formats.

First, FILTER_SANITIZE_EMAIL strips any characters that are not allowed in an email address.

$email_sanitized = filter_var('bob@example.com', FILTER_SANITIZE_EMAIL);

Then, FILTER_VALIDATE_EMAIL checks that the result is in a valid email format; filter_var() returns false if it is not.

$email = filter_var($email_sanitized, FILTER_VALIDATE_EMAIL);
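The two filter steps can be wrapped so that the false return value is handled in one place. This is only a sketch; the function name `cleanEmail` is hypothetical. Note that sanitizing silently drops characters, so the result can differ from what the user typed.

```php
<?php
// Hypothetical wrapper: returns the cleaned address, or null if it does not validate.
function cleanEmail(string $input): ?string
{
    // Step 1: strip characters that are not allowed in an email address.
    $sanitized = filter_var($input, FILTER_SANITIZE_EMAIL);
    // Step 2: validate; filter_var() returns false when the format is invalid.
    $valid = filter_var($sanitized, FILTER_VALIDATE_EMAIL);
    return $valid === false ? null : $valid;
}

var_dump(cleanEmail('bob@example.com')); // string(15) "bob@example.com"
var_dump(cleanEmail('not an email'));    // NULL
```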
Synchro
Tyler Carter
  • 2
    PHP's builtin email validation filter just uses a regular expression (a very simple one, too), I really wouldn't trust it. See here for the reasons why e-mails cannot be verified with regex: http://stackoverflow.com/questions/201323/what-is-the-best-regular-expression-for-validating-email-addresses – soulmerge Jul 24 '09 at 01:13
2

This just complements the other answers.

In the case of Gmail, I would remove the dots on the left side.

Gmail allows only one registration for any given username. Once you sign up for a username, nobody else can sign up for the same username, regardless of whether it contains extra periods or capital letters; those usernames belong to you. If you created yourusername@gmail.com, no one can ever register your.username@gmail.com, or Your.user.name@gmail.com. Because Gmail doesn't recognize dots as characters within usernames, you can add or remove the dots from a Gmail address without changing the actual destination address; they'll all go to your inbox, and only yours.

So you can be sure you always have the same Gmail address.

Gabriel Sosa
  • 1
    You shouldn't trust that... I know for a fact that another guy has the exact same gmail username as me except for a dot before the last character (he has it, I don't). During the years it has triggered a few bugs (like me receiving his mail, and probably the other way around) but both our accounts are still alive and kicking. I don't know how it happened but I think I registered first. – Fredrik Jul 23 '09 at 07:48
  • I know what you are referring to. many gmail users have reported this problem. regards – Gabriel Sosa Jul 23 '09 at 13:33
  • 1
    And I don't know if it is a good idea to start adjusting your code for every mail provider out there. Besides, gmail might remove or change that 'feature' in the future. – soulmerge Jul 24 '09 at 01:00
  • I think your implementation can be smart and tell the user if the system "detects" the gmail address is the same. Anyway all depends how big you think your db will be in terms of how you will store those emails – Gabriel Sosa Jul 24 '09 at 01:52
  • 1
    You could also account for gmail's "+" feature. myname+spamtastic@gmail.com is delivered to myname@gmail.com. But as mentioned above, the wisdom of trying to compensate for gmail's features in your code is questionable. – Frank Farmer Jul 29 '09 at 00:11
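For anyone who decides to go this way despite the caveats in the comments, a rough sketch of a Gmail-aware comparison key is shown below. The function name `gmailAwareKey` is hypothetical, only the literal domain gmail.com is handled (an assumption), and the key is meant for duplicate detection only, never for sending mail.

```php
<?php
// Hypothetical helper: duplicate-detection key that applies the Gmail rules
// discussed above (dots ignored, "+tag" suffix ignored).
// Assumption: only the literal domain "gmail.com" gets this treatment.
function gmailAwareKey(string $email): string
{
    $email = strtolower(trim($email));
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // no "@": return the lowercased input unchanged
    }
    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);
    if ($domain === 'gmail.com') {
        $plus = strpos($local, '+');
        if ($plus !== false) {
            $local = substr($local, 0, $plus); // drop the "+tag" suffix
        }
        $local = str_replace('.', '', $local); // drop the dots
    }
    return $local . '@' . $domain;
}

echo gmailAwareKey('Your.user.name+spam@gmail.com'), "\n"; // yourusername@gmail.com
echo gmailAwareKey('a.b@example.com'), "\n";               // a.b@example.com
```

Keeping the address as typed for delivery and using the key only for comparison means nothing breaks if Gmail ever changes this behaviour.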
1

Trim out all whitespace, then compare with strtolower. That should be perfectly fine.

Matt Grande
  • Technically, no, that is not "perfectly fine". Whitespace can occur in valid email addresses, although in practice probably almost no one actually does that. – nobody Jul 23 '09 at 01:11
  • I should have specified... When I said trim, I was referring to whitespace at the start or end of the email, which isn't valid. That being said, is the quoted-string-email-address still used by any hosts? I just tried signing up for one at a few places, and it wasn't accepted as valid, and none of the form validations I found take that into account. So, what I said stands true, that *should* be fine for all but the most extreme cases. – Matt Grande Jul 23 '09 at 12:35
-1

If lowercasing is all you're looking for: strtolower().

deceze
  • I mentioned case just as an example. As others have pointed out, case can be significant in the local part. Whether it is for a particular address varies... – allyourcode Jul 23 '09 at 23:31
-1

EDIT: Based on another answer stating that only the domain is case insensitive, I've updated the function to lowercase only the domain, not the user.

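function NormalizeEmail( $email )
{
    $email = trim( $email );
    $at = strrpos( $email, '@' ); // split at the last "@"
    if ( $at === false ) {
        return $email; // no "@" present: nothing to normalize
    }
    return substr( $email, 0, $at ) . '@' . strtolower( substr( $email, $at + 1 ) );
}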
Kane Wallmann