I am working on email-based authentication that checks the database for existing users based on their email address and decides whether to create a new account or use the existing one.

The issue I came across is that users sometimes use different capitalisation in their email addresses, add things like +1 in the middle, etc.

To combat some of these, I am now (1) stripping whitespace from the emails and (2) always lowercasing them.
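
For reference, the two normalization steps already in place could be sketched in Python like this. The function names and the in-memory `users` dict are hypothetical stand-ins for your own code and database, and lowercasing the whole address assumes your users' providers treat the part before the @ case-insensitively, which is the common case.

```python
def normalize_email(raw: str) -> str:
    """Apply steps (1) and (2): trim surrounding whitespace, then lowercase."""
    return raw.strip().lower()


# Hypothetical stand-in for the users table, keyed by normalized email.
users = {}


def find_or_create_user(raw_email: str) -> dict:
    """Return the existing user for this address, or create a new one."""
    key = normalize_email(raw_email)
    if key not in users:
        users[key] = {"email": key}
    return users[key]
```

With this, `find_or_create_user(" Jack@Example.com")` and `find_or_create_user("jack@example.com")` return the same record. Note that `strip()` only trims the ends; whitespace inside an unquoted address makes it invalid anyway, so rejecting such input is safer than silently removing it.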

I would like to take this further, but am not sure what else I can do without breaking some addresses, e.g.:

(3) Can I remove everything after the + and before the @ sign? (4) Can I remove other symbols like . from the emails?

Ilja
  • I think this may be more complicated than the rules you're mentioning; see [this thread](https://stackoverflow.com/questions/2049502/what-characters-are-allowed-in-an-email-address) and [this Wikipedia page](https://en.wikipedia.org/wiki/Email_address#Common_local-part_semantics). For example, the dot '.' seems to be removed sometimes for security reasons, e.g. by [Gmail](https://support.google.com/mail/answer/7436150?hl=en&ref_topic=3394657), and the handling of the plus '+' sign seems to depend on the email provider. – evilmandarine Oct 10 '22 at 15:10
  • Please see [Are email addresses case sensitive?](https://stackoverflow.com/questions/9807909/are-email-addresses-case-sensitive) – Andrew Morton Oct 13 '22 at 17:15

4 Answers


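In practice, email addresses are treated as case-insensitive (A and a are the same), so changing all upper case to lower case is fine. Strictly speaking, only the domain is case-insensitive by definition: RFC 5321 allows the local part (before the @) to be case-sensitive, but relying on that is discouraged and rare. Digits (0-9) are also valid in email addresses.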

However, you should not remove any of the following characters from an email address:

!#$%&'*+-/=?^_`{|}~.

Control characters, white space and other special characters are invalid in an unquoted local part.

If you find characters other than letters, digits and the 20 characters above, the address is invalid. How such addresses should be handled is not defined by the standard.
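
A minimal Python sketch of checking (not rewriting) the part before the @ against those characters. It assumes you only accept unquoted local parts; quoted forms such as `"john doe"@example.com` are legal but rare and are not handled here. The function names are hypothetical.

```python
import string

# Characters allowed in an unquoted (dot-atom) local part, besides the dot:
# ASCII letters, digits and the other 19 specials from the list above.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_valid_unquoted_local_part(local: str) -> bool:
    """Check the part before @ without modifying it."""
    # RFC 5321 caps the local part at 64 octets.
    if not local or len(local.encode()) > 64:
        return False
    # Dots separate non-empty atoms, so a leading, trailing or doubled dot
    # produces an empty atom and fails the check.
    return all(atom and set(atom) <= ATEXT for atom in local.split("."))


def check_address(addr: str) -> bool:
    # Split on the last @ (quoted local parts, not handled here, may contain @).
    local, sep, domain = addr.rpartition("@")
    return bool(sep) and bool(domain) and is_valid_unquoted_local_part(local)
```

Both `jack+bauer@email.com` and `jack+sparrow@email.com` pass unchanged, which is the point: validation can reject bad input without ever altering a valid address.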

Why removing the + is an issue: some mail providers use it to separate (file) inbound email into folders for a user. So jack+finance@email.com would go to a finance folder in Jack's mailbox. Other mail providers consider it part of the email address, so jack+bauer@email.com can be a different account from jack+sparrow@email.com.

So removing the + (along with the characters after it) could merge different email accounts into a single address, which may not even exist.

James Risner

> Can I remove everything after + and before @ signs? Can I remove other symbols like . from the emails?

Sure, you can - but should you?

If you don't care about standards and want to block valid email addresses, then block any characters you like.

RFC 822 (Standard for the Format of ARPA Internet Text Messages) and RFC 2822 (Internet Message Format, since superseded by RFC 5322) clearly specify the valid characters for email addresses.

As far as these standards are concerned, + is no different from x, ! or $.

The local-part (before @) can contain:

  • uppercase and lowercase Latin letters (A-Z, a-z)
  • numeric values (0-9)
  • special characters: ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • the dot (.), provided it is not the first or last character and does not appear twice in a row

...and you can block x, ! or $ or indeed any of them - but again - should you?

See: https://mozilla.fandom.com/wiki/User:Me_at_work/plushaters

Fraser

No. Any manipulation along these lines is speculative at best, and harmful at worst. Some providers regard some characters as insignificant (for example, Gmail famously ignores any dots in the local part), but there is no safe generalization.

The only sane and safe way to validate an email address remains to send a message to it, and discard the address if the recipient does not respond (e.g. by clicking a link in the message or replying to it) within a reasonable time frame (say, 48 hours). And if you don't have any previous relationship with the owner of the mailbox, don't send to it at all; otherwise you're a spammer.
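
The confirmation-link approach can be sketched in Python with a signed, expiring token. This is one possible design under stated assumptions, not a specific library's API: `SECRET_KEY`, `make_token` and `verify_token` are hypothetical names, and the 48-hour window is taken from the answer's suggestion.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server secret; a real app would load this from configuration.
SECRET_KEY = secrets.token_bytes(32)
MAX_AGE_SECONDS = 48 * 3600  # the "say, 48 hours" window


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    # Restore the padding stripped by _b64 before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(payload: bytes) -> bytes:
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def make_token(email: str, now: Optional[float] = None) -> str:
    """Build a URL-safe token to embed in the confirmation link."""
    issued = int(time.time() if now is None else now)
    payload = f"{issued}:{email}".encode()
    return f"{_b64(payload)}.{_b64(_mac(payload))}"


def verify_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the confirmed address, or None if forged, malformed or expired."""
    try:
        payload_part, mac_part = token.split(".")
        payload, mac = _unb64(payload_part), _unb64(mac_part)
    except ValueError:  # wrong shape or bad base64 (binascii.Error)
        return None
    if not hmac.compare_digest(mac, _mac(payload)):
        return None
    issued, _, email = payload.decode().partition(":")
    current = time.time() if now is None else now
    if current - int(issued) > MAX_AGE_SECONDS:
        return None
    return email
```

The address is only marked as confirmed once `verify_token` returns it, and it comes back exactly as the user typed it, `+` and dots included. A design trade-off of this stateless token is that nothing needs to be stored per pending sign-up, but a link cannot be revoked before it expires.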

tripleee

You can treat Gmail separately (this is what some banks do today).

If the address is a Gmail address, apply your items (3) and (4): remove the plus part and ignore the dots before the @ sign. It is a good idea to warn the user at registration before removing anything.

For other email providers, since it is impossible to keep track of how each one behaves, it is better to accept both the dot and the plus as they are.

Since Gmail addresses are the most frequently used ones for subscriptions, you should be OK in most cases.
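
The Gmail-only rule could be sketched in Python as a separate lookup key. This is an assumption-laden illustration: `dedup_key` is a hypothetical name, only addresses ending in `gmail.com` are recognized, and the key is meant for spotting duplicate sign-ups (so you can warn the user), while mail is still sent to the address as typed.

```python
# Assumption: only plain gmail.com addresses get the special treatment.
GMAIL_DOMAINS = {"gmail.com"}


def dedup_key(email: str) -> str:
    """Key for spotting duplicate sign-ups; send mail to the address as typed."""
    email = email.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep or domain not in GMAIL_DOMAINS:
        # Other providers: keep dots and +tags exactly as entered.
        return email
    # Gmail: drop everything from the first + and ignore dots (items 3 and 4).
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"
```

Keeping the key separate from the stored address means a wrong assumption about a provider only affects duplicate detection, never where mail is delivered.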

Mehmet Kaplan