69

I'm building a website using Django. The website could have a significant number of users from non-English speaking countries.

I just want to know if there are any technical restrictions on what types of characters an email address could contain.

Are email addresses only allowed to contain English letters, numbers, _, @ and .?

Are they allowed to contain non-English alphabets like é or ü?

Are they allowed to contain Chinese or Japanese or other Unicode characters?

Daniel Böhmer
Continuation

8 Answers

53

An email address consists of two parts: the local part, before the @, and the domain, after it.

The rules for the two parts differ.

For the local part you can use these ASCII characters:

  • Latin letters A - Z a - z
  • digits 0 - 9
  • special characters !#$%&'*+-/=?^_`{|}~
  • dot ., provided it is not the first or last character and does not appear twice in a row
  • space and "(),:;<>@[] characters are allowed with restrictions (they are only allowed inside a quoted string, and within it a backslash or double quote must be preceded by a backslash)

In addition, since 2012 (RFC 6531) you can use international characters above U+007F, encoded as UTF-8.
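As a rough illustration of the unquoted local-part rules listed above, here is a minimal Python sketch. The function name `is_plain_local_part` and the `allow_non_ascii` flag are made up for this example, and quoted strings are deliberately not handled.

```python
import string

# Characters allowed in an unquoted local part, per the list above.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_plain_local_part(local: str, allow_non_ascii: bool = False) -> bool:
    """Check an unquoted local part (hypothetical helper, not a full parser)."""
    # A dot may not be first or last, and may not appear twice in a row.
    if not local or local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    for ch in local:
        if ch == "." or ch in ATEXT:
            continue
        # Optionally accept characters above U+007F (internationalized mail).
        if allow_non_ascii and ord(ch) > 0x7F:
            continue
        return False
    return True
```

With `allow_non_ascii=True` a local part such as ñoñó1234 passes; with the default it is rejected.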

The domain part is more restricted:

  • Latin letters A - Z a - z
  • digits 0 - 9
  • hyphen -, provided it is not the first or last character; multiple hyphens in a row are allowed.
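A small sketch of the domain rules above, in Python. It assumes the domain is split into dot-separated labels and applies the hyphen rule to each label; that splitting, and the name `is_ldh_domain`, are assumptions of this example rather than something the list spells out.

```python
import string

# Letters, digits and hyphen: the only characters the list above allows.
LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh_domain(domain: str) -> bool:
    """Check each dot-separated label against the rules above (a sketch)."""
    for label in domain.split("."):
        if not label:
            return False  # empty label, e.g. "example..com"
        if label[0] == "-" or label[-1] == "-":
            return False  # hyphen may not be first or last
        if any(ch not in LDH for ch in label):
            return False
    return True
```

Note that this rejects a Unicode domain such as bücher.de in its raw form; such names would have to be converted to their ASCII form first.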

Regex to validate

^(([^<>()\[\]\.,;:\s@\"]+(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})
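For Python users, the regex above can be tried out roughly like this. The pattern is copied unchanged; `fullmatch` is used here because the pattern has no end anchor, and the function name `matches_email_regex` is made up for the example.

```python
import re

# The regex from this answer, unchanged. A single-quoted raw string is used
# because the pattern itself contains double quotes.
EMAIL_RE = re.compile(
    r'^(([^<>()\[\]\.,;:\s@\"]+(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))'
    r'@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})'
)


def matches_email_regex(address: str) -> bool:
    # fullmatch requires the whole string to match, not just a prefix.
    return EMAIL_RE.fullmatch(address) is not None
```

Because the character classes only exclude a few characters, non-ASCII local parts such as ñoñó1234@server.com pass this check too.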

Hope this saves you some time.

Matas Vaitkevicius
  • 1
    Where is the application of these `domain part` restrictions? `Latin letters A - Z a - z` `digits 0 - 9` – androidguy May 10 '17 at 01:21
  • Just going to add in here @matas-vaitkevicius, RFC 6531 is a **proposed** standard. It is not a complete standard just yet. – Stewart Polley May 21 '17 at 22:32
  • 1
    Regex not working in JAVA; pattern = Pattern.compile("^(([^<>()\[\]\.,;:\s@\"]+(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})", Pattern.CASE_INSENSITIVE); – Furkan Jul 04 '17 at 13:39
  • 2
    You're right about the domain, but you could encounter unicode characters that need to be encoded with punycode – David Ehrmann Apr 21 '18 at 16:31
  • @StewartPolley Many Proposed Standards are actually deployed on the Internet and used extensively, as stable protocols. Actual practice has been that full progression through the sequence of standards levels is typically quite rare, and most popular IETF protocols remain at Proposed Standard. (https://en.wikipedia.org/wiki/Internet_Standard) – Randall Flagg Jan 04 '21 at 22:05
39

Well, yes. Read (at least) this article from Wikipedia.

I live in Argentina, and addresses like ñoñó1234@server.com are allowed here.

eKek0
18

The allowed syntax in an email address is described in [RFC 3696][1], and is pretty involved.

The exact rule [for local part; the part before the '@'] is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character
[...]
Without quotes, local-parts may consist of any combination of alphabetic characters, digits, or any of the special characters ! # $ % & ' * + - / = ? ^ _ ` . { | } ~
[...]
Any characters, or combination of bits (as octets), are permitted in DNS names. However, there is a preferred form that is required by most applications...

...and so on, in some depth.

[1]: https://www.rfc-editor.org/rfc/rfc3696

Michael Petrotta
8

Instead of worrying about what email addresses can and can't contain, which you really don't care about, test whether your setup can send them email or not—this is what you really care about! This means actually sending a verification email.

Otherwise, you can't catch a much more common case of accidental typos that stay within any character set you devise. (Quick: is random@mydomain.com a valid address for me to use at your site, or not?) It also avoids unnecessarily and gratuitously alienating any users when you tell them their perfectly valid and correct address is wrong. You still may not be able to process some addresses (this is necessary alienation), as the other answers say: email address processing isn't trivial; but that's something they need to find out if they want to provide you with an email address!

All you should check is that the user supplies some text before an @, some text after it, and the address isn't outrageously long (say 1000 characters). If you want to provide a warning ("this looks like trouble! is there a typo? double-check before continuing"), that's fine, but it shouldn't block the add-email-address process.
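The minimal check described above might look something like this in Python. The function name `looks_like_email` is made up for the example, and the 1000-character limit is just the figure suggested in the text.

```python
def looks_like_email(value: str, max_length: int = 1000) -> bool:
    """Very loose sanity check: some text, an @, some more text."""
    if len(value) > max_length:
        return False
    # Split on the last @, so a quoted local part containing @ still passes.
    local, sep, domain = value.rpartition("@")
    return bool(sep and local and domain)
```

Anything that passes would still need a verification email before you can rely on it.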

Of course, if you don't care to ever send email to them, then just take whatever they enter. For example, the address might solely be used for Gravatar, but Gravatar verifies all email addresses anyway.

  • 30
    It's presumptuous to tell people what they do and don't care about. (For example, since email addresses are typically case-insensitive, it's important to know whether you need to deal with Unicode or just ASCII.) – Glenn Maynard Jul 24 '13 at 14:25
5

Non-ASCII email addresses are possible, as shown by this RFC on internationalized domain names (IDNA): https://www.rfc-editor.org/rfc/rfc3490. I think this has not been set up for all countries, and from what I understand only one language code will be allowed for each country. There is also a way to convert such names into ASCII, but that is not a trivial issue.
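To illustrate the conversion to ASCII for the domain part: Python's standard library ships an `idna` codec that implements RFC 3490. The sketch below uses a made-up example domain and only covers the domain, not the local part.

```python
# Convert a Unicode domain name to its ASCII ("punycode") form and back,
# using the built-in "idna" codec (RFC 3490).
domain = "bücher.example"

ascii_domain = domain.encode("idna").decode("ascii")
print(ascii_domain)  # xn--bcher-kva.example

round_trip = ascii_domain.encode("ascii").decode("idna")
print(round_trip)  # bücher.example
```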

James Black
3

I have encountered email addresses with single quotes, and not infrequently either. We reject whitespace (though strictly speaking it is allowed), more than one '@' sign and address strings shorter than five characters in total. I believe this solves more problems than it creates, and so far over ten years and several hundred thousand addresses it's worked to reject many garbage addresses. Also there is a trigger to downcase all email addresses on insert or update.
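The rejection rules described above could be sketched in Python as follows. The function name `is_suspect` is invented for the example, and the lowercasing trigger is left out because it lives in the database.

```python
def is_suspect(address: str) -> bool:
    """Return True for addresses the rules above would reject."""
    if any(ch.isspace() for ch in address):
        return True  # whitespace (strictly allowed, but rejected here)
    if address.count("@") > 1:
        return True  # more than one @ sign
    if len(address) < 5:
        return True  # shorter than five characters in total
    return False
```

Single quotes are not rejected, so an address like o'brien@example.com gets through.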

That being said, it is impossible to validate an email without a round trip to the owner, but at least we can reject data that is extremely suspect.

Allan Peda
  • 1
    Email addresses (the user part) can be case sensitive. (It is recommended that they are not; see [RFC5321](https://tools.ietf.org/html/rfc5321) section 2.4.) You should not alter the case of addresses you receive, although when the address is used as a username it might be reasonable to ignore case. Technically John@domain.com and john@domain.com can be different users. I know of a case years ago where a mail system required the case to match (e.g. JohnS@domain.com worked, johns@domain.com did not) for emails to reach end users. – Gert van den Berg Jul 18 '17 at 08:03
2

I took a look at the regex in pooh17's answer and noticed that it allows the local part to be longer than 64 characters if it is separated by periods (it only checks that the bit before the first period is at most 64 characters). You can use a positive lookahead to improve this. Here's my suggestion, if you really want a regex for this:

^(((?=.{1,64}@)[^<>()[\].,;:\s@"]+(\.[^<>()[\].,;:\s@"]+)*)|((?=.{1,66}@)".+"))@(?=.{1,255}$)(\[(IPv6:)?[\dA-Fa-f:.]+]|(?!.*?\.\.)(([^\s!"#$%&'()*+,./:;<=>?@[\]^_`{|}~]+\.?)+[^\s!"#$%&'()*+,./:;<=>?@[\]^_`{|}~]{2,}))$
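If you only want the length limits that the lookaheads above express, they can also be checked without a regex. This is a simplified Python sketch: the name `within_length_limits` is made up, the 64 and 255 limits are taken from the regex, and the extra two characters the regex allows for a quoted local part are ignored.

```python
def within_length_limits(address: str) -> bool:
    """Check local part <= 64 and domain <= 255 characters (a sketch)."""
    # Split on the last @; everything before it counts as the local part.
    local, sep, domain = address.rpartition("@")
    return bool(sep) and 1 <= len(local) <= 64 and 1 <= len(domain) <= 255
```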
mickmackusa
taylor8294
1

Building on @Matas Vaitkevicius' answer: I've fixed up the regex some more in Python so that it matches valid email addresses as defined on this page and this page of Wikipedia, using that awesome regex101 website: https://regex101.com/r/uP2oL7/26

^(([^<>()\[\]\.,;:\s@\"]{1,64}(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))@\[*(?!.*?\.\.)(([^<>()[\]\.,;\s@\"]+\.?)+[^<>()[\]\.,;\s@\"]{2,})\]?

Hope this helps someone!:)

HoldOffHunger
pooh17