838

I'm not asking about full email validation.

I just want to know which characters are allowed in the user-name and server parts of an email address. This may be oversimplified, and maybe email addresses can take other forms, but I don't care. I'm asking only about this simple form: user-name@server (e.g. wild.wezyr@best-server-ever.com) and the characters allowed in both parts.

kevinarpe
WildWezyr
  • 259
    The `+` is allowed. It drives me nuts when web sites don't allow it because my email has a `+` in it and so many sites don't allow it. – Dan Herbert Jan 12 '10 at 14:16
  • I've just started a bounty. There are already good answers, but they do not explain the characters allowed in the server part of an email address. I will accept a full answer to my question (username and server parts explained). – WildWezyr Jan 15 '10 at 08:54
  • 12
    Earlier question covering the same material: [stackoverflow.com/questions/760150/](http://stackoverflow.com/questions/760150/can-an-email-address-contain-international-non-english-characters). The sad thing is, even though that question is almost 8 months older than this one, the older question has much better answers. Almost all the answers below were already out of date when they were originally posted. See [Wikipedia entry](http://en.wikipedia.org/wiki/E-mail_address#Internationalization) (and don't worry, it has relevant [official references](http://tools.ietf.org/html/rfc6530)). – John Y Jul 20 '12 at 20:56
  • Maybe also [RFC2821 and RFC2822](http://www.remote.org/jochen/mail/info/chars.html). –  Jan 18 '12 at 09:24
  • 1
    According to PHP's `filter_var()` validation this email would be correct: `_.-+~^*'\`{GEO}\`'*^~+-._@example.com` – Geo Mar 29 '13 at 15:11
  • 27
    Contrary to several answers, spaces *are* allowed in the local part of email addresses, if quoted. `"hello world"@example.com` is valid. – user253751 Jul 08 '14 at 06:34
  • 1
    Currently setting up a Google Dev Console email group, Google doesn't allow the + even though the email address must have been allowed when the person created the Gmail account. !!!!! – Lara Ruffle Coles Aug 15 '16 at 13:35
  • 10
    @LaraRuffleColes - For Gmail, when you create an email account, it doesn't allow you to create addresses containing a "+" sign. The "+" sign ("Plus-addressing") allows anyone with a Gmail address to add a "+" sign followed by a "string" to the end of their username to create an "alternate" ("alias") email address to use for their account. Example: "example@gmail.com", "example+tag@gmail.com". A typical (and probably "Primary") use of this is to be able to create alias email addresses for your account which allow you to tag and filter incoming email messages, theoretically filtered by sender. – Kevin Fegan Sep 26 '16 at 17:49
  • 7
    Think the '+' drives you nuts? My last name has an apostrophe in it. Know how many websites I can still crash by entering my last name? Way too many, but on topic I gave up the email address Patrick.o'hara because almost no one allows it, though it is valid. – Patrick O'Hara Nov 15 '17 at 18:03
  • 3
    @DanHerbert Maybe they don't want people *easily* abusing the system by using a single real email address to create multiple accounts. – Andrew Grimm Apr 06 '18 at 05:10
  • 10
    @Andrew The reverse is much more common. If a site can't be trusted to allow proper email addresses, I don't trust them to handle my personal information. – Dan Herbert Apr 06 '18 at 16:15
  • @DanHerbert because websites don't want 2 different users with the same email. Imagine they provide discounts for the first-time buyers and every time you shop you could claim you're a new customer just by adding a + gibberish after your email. Would you want that? – Amir Hassan Azimi Sep 22 '21 at 20:34
  • 3
    @HassanAzimi Trying to prevent abuse by blocking valid email address formats is not a great strategy and would stop an incredibly small number of bad actors who can get around that limitation quite easily. Plus, it isn't a universal rule that all email providers ignore everything after a `+` At the time of my original comment, it was something that only worked that way with Gmail. A lot of the larger providers now behave that way, but it's still not an effective way to stop bad behavior and is going to annoy more honest users than dishonest ones. – Dan Herbert Sep 23 '21 at 21:29
  • @DanHerbert I tried MSN, AOL, YAHOO and none of them let you add a plus anywhere in your email when creating a new email so yes it is not a valid email address. – Amir Hassan Azimi Sep 24 '21 at 12:39
  • @DanHerbert a related pet peeve of mine is sites that don't allow a single character local part of the email, e.g. a@gmail.com is completely valid but many sites don't allow it including some airline sites. – Luther Jan 13 '22 at 16:49
  • 1
    @DanHerbert yikes, I have been using the + trick since before Gmail even existed, on my own mail servers. – Hakanai Feb 21 '22 at 23:22
  • 5
    @AmirHassanAzimi - That's incorrect and flawed logic. The fact that some websites don't let you create an account with a `+` character in it does not in any way mean that `+` "is not a valid e-mail address". They still accept, process, and work with e-mail that have `+` in it because it *is* valid. Sites that disallow it are adding a restriction in order to make special use of (valid) e-mail addresses containing `+`. – Christopher Cashell Mar 03 '22 at 23:35
  • @ChristopherCashell try to look at the company's perspective meaning having 1 email means you can forge multiple emails. It is up to you/company to accept that or not but I already explained why it's bad practice. – Amir Hassan Azimi Mar 04 '22 at 11:42
  • 1
    @AmirHassanAzimi I have my own domain and my own email server. I could easily create hundreds, thousands, *millions* of fake addresses and a company would see nothing. Domains are cheap. Servers are cheap. The "plus is FORBIDDEN" only annoys. – Jürgen A. Erhard Jan 21 '23 at 01:22
  • @JürgenA.Erhard it’s not about your domain and hundreds of emails, it’s about having a plus in your email address. Second of all, you wouldn't want to have a fake server, because there are disposable domain addresses everywhere and they will put your domain in their list. So it’s up to you. You can have a subdomain and start creating, e.g., bank.yourdomain.com and see how other companies block you! – Amir Hassan Azimi Jan 22 '23 at 02:45
  • @Luther also sites which don't allow a single-character subdomain, E.g. my.name@a.domain.com – zsalya Mar 23 '23 at 12:48
  • The standards people have let this get way out of hand. They should have only used 128 characters. Instead we've had to spend a load of money trying to track spammers with @Microsoft.com where an 'o' is some international character that isn't 'o' but looks almost exactly like it. – AriesConnolly Jul 07 '23 at 09:29

18 Answers

985

See RFC 5322: Internet Message Format and, to a lesser extent, RFC 5321: Simple Mail Transfer Protocol.

RFC 822 also covers email addresses, but it deals mostly with their structure:

 addr-spec   =  local-part "@" domain        ; global address     
 local-part  =  word *("." word)             ; uninterpreted
                                             ; case-preserved
 
 domain      =  sub-domain *("." sub-domain)     
 sub-domain  =  domain-ref / domain-literal     
 domain-ref  =  atom                         ; symbolic reference

And as usual, Wikipedia has a decent article on email addresses:

The local-part of the email address may use any of these ASCII characters:

  • uppercase and lowercase Latin letters A to Z and a to z;
  • digits 0 to 9;
  • special characters !#$%&'*+-/=?^_`{|}~;
  • dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);
  • space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);
  • comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.
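
As a rough illustration of the rules listed above, here is a minimal Python sketch that checks only the simplest case: an unquoted, comment-free ASCII local part. The helper name `is_plain_local_part` is made up for this example, and quoted strings, comments, and international characters are deliberately out of scope, so treat it as a sketch rather than a validator.

```python
import string

# Characters allowed in an unquoted local part, per the list above
# (letters, digits, and the printable specials; the dot is handled separately).
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_plain_local_part(local: str) -> bool:
    """Check an unquoted, comment-free ASCII local part (hypothetical helper)."""
    # Splitting on "." yields an empty piece for a leading, trailing,
    # or doubled dot, all of which are disallowed outside quotes.
    return all(piece and set(piece) <= ATEXT for piece in local.split("."))


print(is_plain_local_part("wild.wezyr"))  # plain dotted name
print(is_plain_local_part("John..Doe"))   # doubled dot, only valid when quoted
```

Anything this check rejects may still be valid in quoted form, so it is better suited to explaining the rules than to rejecting user input.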

In addition to ASCII characters, as of 2012 you can use international characters above U+007F, encoded as UTF-8 as described in the RFC 6532 spec and explained on Wikipedia. Note that as of 2019, these standards are still marked as Proposed, but are being rolled out slowly. The changes in this spec essentially added international characters as valid alphanumeric characters (atext) without affecting the rules on allowed & restricted special characters like !# and @:.

For validation, see Using a regular expression to validate an email address.

The domain part is defined as follows:

The Internet standards (Request for Comments) for protocols mandate that component hostname labels may contain only the ASCII letters a through z (in a case-insensitive manner), the digits 0 through 9, and the hyphen (-). The original specification of hostnames in RFC 952 mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits. No other symbols, punctuation characters, or blank spaces are permitted.
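
The label rules in that paragraph can be sketched in a few lines of Python. The names `LABEL_RE` and `is_ascii_hostname` are hypothetical, and the 63-character label limit is not stated in the quoted passage; it is the usual DNS limit (RFC 1035) and is included here as an assumption. Internationalized names would need converting to their ASCII form before a check like this.

```python
import re

# One label: ASCII letters, digits, and hyphens; may start with a digit
# (RFC 1123) but must not start or end with a hyphen.
# The 1-63 character length is the DNS label limit, assumed here.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ascii_hostname(domain: str) -> bool:
    """Check that every dot-separated label follows the hostname rules."""
    # An empty string or a doubled dot produces an empty label,
    # which fullmatch() rejects.
    return all(LABEL_RE.fullmatch(label) for label in domain.split("."))


print(is_ascii_hostname("best-server-ever.com"))
print(is_ascii_hostname("-bad.example"))
```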

Anton Gogolev
  • 24
    @WildWzyr, It's not that simple. Email addresses have a lot of rules for what is allowed. It's simpler to refer to the spec than to list out all of them. If you want the complete Regex, check here to get an idea of why it's not so simple: http://www.regular-expressions.info/email.html – Dan Herbert Jan 12 '10 at 14:20
  • 6
    there is no simple list, just because you want something simple doesn't mean it will be so. some characters can only be in certain locations and not in others. you can't have what you want all the time. –  Jan 12 '10 at 14:28
  • 16
    @WildWezyr Well, the full-stop character is allowed in the local-part. But not at the start or end. Or with another full-stop. So the answer IS NOT as simple as just a list of allowed characters, there are rules as to how those characters may be used - `.ann..other.@example.com` is not a valid email address, but `ann.other@example.com` is, even though both use the same characters. – Mark Pim Jan 12 '10 at 14:30
  • 14
    Also, remember that with internationalized domain names coming in, the list of allowed characters will explode. – Chinmay Kanchi Jan 12 '10 at 15:18
  • 53
    This is no longer the valid answer, due to internationalized addresses. See Mason's answer. – ZacharyP Dec 06 '11 at 17:59
  • 5
    Real-world anecdote: there are addresses out there that use consecutive dots (in violation of the RFC, I think). This just came up recently when I assisted with a technical audit of a corporate emergency notification system; in an annual drill, the system had silently failed to notify one employee. It turns out that NTT Docomo, Japan's largest cellular carrier, allows email address like "burgers...mmm@docomo.ne.jp". The system was choking on that address. (Docomo has more than 40 million customers.) – Mason Jan 19 '12 at 00:46
  • @ZacharyP: Actually Mason's answer doesn't go far enough either. UTF-8 is now officially allowed anywhere in the address. (See my comment on the main question.) – John Y Jul 20 '12 at 20:58
  • 1
    @AntonGogolev Don't the special characters have to appear within quotes in the local part to be valid? So `john'doe@place.com` is INVALID but `"john'doe"@place.com` is VALID. – Don Rhummy Feb 08 '13 at 18:41
  • Those are very annoying characters to read in an email address indeed. – Fabián Aug 12 '13 at 15:57
  • RFC6530 http://tools.ietf.org/html/rfc6530 does support international characters. So allowed ones go beyond just standard ASCII. – hardywang Mar 12 '14 at 16:08
  • Not according to Google Mail. http://imgur.com/hX5W2T7 - I bet Gmail won't even accept emails with apostrophes in them and, to be honest, in 25 years, since the days of dial-up bulletin boards, I have not once seen an email with an apostrophe. – Piotr Kula Jul 15 '14 at 14:08
  • @ppumkin A given email provider can be as restrictive as they want, but that has no bearing on providers generally. – Nolan Amy Nov 19 '14 at 23:35
  • Python's smtplib is not allowing a "!" in the local-part of the address; anything!@something.com will throw SMTPRecipientsRefused (550, 'restricted characters in address') – radtek Dec 17 '14 at 16:29
  • 1
    Here's the newest definition, from the HTML5 spec (not an RFC): http://www.w3.org/TR/html5/forms.html#valid-e-mail-address . – Noyo Oct 05 '15 at 17:32
  • Gmail doesn't like the comments `john.smith(comment)@example.com` – Anentropic Jul 25 '17 at 09:11
  • 1
    need to read [Mason's answer](https://stackoverflow.com/a/2071250/491243) first before implementing validation otherwise non-english email address will always be rejected. ex `夏明@域通联达。在线` – John Woo Apr 05 '18 at 14:10
  • @Anton Gogolev: you have a mistake in: "!#$%&'*+-/=?^_`{|}~;" the last character is forbidden in atext. – John Boe Oct 18 '19 at 13:32
  • Can anyone change the text to remove the ";" from the end of !#$%&'*+-/=?^_`{|}~; – John Boe Oct 18 '19 at 13:36
  • I am looking at an XML-based content management system that uses DITA. If you try to use "/" in even the first part of an email address (before the @), this creates DITA errors that can interfere with publishing the hyperlink to the email address. For compatibility with systems that may need to use the address, you might want to eliminate "/" as an allowed character. – TMWP Dec 03 '19 at 22:09
  • Wait, is `somename@[123.123.123.ZZ:25]` a legal address? According to the regex and FSM it is, even though the final part of the IP is not numeric. – paul23 May 11 '21 at 11:36
390

Watch out! There is a bunch of knowledge rot in this thread (stuff that used to be true and now isn't).

To avoid false-positive rejections of actual email addresses in the current and future world, and from anywhere in the world, you need to know at least the high-level concept of RFC 3490, "Internationalizing Domain Names in Applications (IDNA)". I know folks in US and A often aren't up on this, but it's already in widespread and rapidly increasing use around the world (mainly the non-English dominated parts).

The gist is that you can now use addresses like mason@日本.com and wildwezyr@fahrvergnügen.net. No, this isn't yet compatible with everything out there (as many have lamented above, even simple qmail-style +ident addresses are often wrongly rejected). But there is an RFC, there's a spec, it's now backed by the IETF and ICANN, and--more importantly--there's a large and growing number of implementations supporting this improvement that are currently in service.

I didn't know much about this development myself until I moved back to Japan and started seeing email addresses like hei@やる.ca and Amazon URLs like this:

http://www.amazon.co.jp/エレクトロニクス-デジタルカメラ-ポータブルオーディオ/b/ref=topnav_storetab_e?ie=UTF8&node=3210981

I know you don't want links to specs, but if you rely solely on the outdated knowledge of hackers on Internet forums, your email validator will end up rejecting email addresses that non-English-speaking users increasingly expect to work. For those users, such validation will be just as annoying as the commonplace brain-dead form that we all hate, the one that can't handle a + or a three-part domain name or whatever.

So I'm not saying it's not a hassle, but the full list of characters "allowed under some/any/none conditions" is (nearly) all characters in all languages. If you want to "accept all valid email addresses (and many invalid too)" then you have to take IDN into account, which basically makes a character-based approach useless (sorry), unless you first convert the internationalized email addresses to Punycode (the converter originally linked here has been dead since September 2015).

After doing that you can follow (most of) the advice above.
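
For readers who want to try the conversion step without an online converter, here is a minimal Python sketch. It assumes Python's built-in `idna` codec, which implements the older RFC 3490 rules discussed in this answer, and the helper name `to_ascii_address` is made up for the example. Only the domain part is converted; the local part is left untouched.

```python
def to_ascii_address(address: str) -> str:
    """Convert the domain part to its ASCII (Punycode) form (hypothetical helper)."""
    # Split at the last "@" so a quoted local part containing "@" stays intact.
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    # The built-in "idna" codec leaves plain ASCII labels unchanged and
    # encodes non-ASCII labels with an "xn--" prefix.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_ascii_address("mason@日本.com"))
print(to_ascii_address("wild.wezyr@best-server-ever.com"))
```

After this step the domain contains only ASCII letters, digits, hyphens, and dots, so the character rules from the other answers can be applied to it.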

David Veszelovszki
Mason
  • Are you sure that this extra characters are sent to and handled by servers? As far as I know internationalized domain names are handled by browsers (protocol clients not servers). – WildWezyr Jan 15 '10 at 13:05
  • 20
    Right; behind the scenes, the domain names are still just ASCII. But, if your web app or form accepts user-entered input, then it needs to perform the same job that the web browser or mail client does when the user inputs an IDN hostname: to convert the user input into DNS-compatible form. Then validate. Otherwise, these internationalized email addresses will not pass your validation. (Converters like the one I linked to only modify the non-ASCII characters they are given, so it is safe to use them on non-internationalized email addresses (those are just returned unmodified).) – Mason Jan 15 '10 at 13:55
  • 1
    You're right that the other answers here have outdated information. And it's not only the domain, the whole address can be UTF-8. (See my comment on the main question for further references.) – John Y Jul 20 '12 at 20:58
  • 3
    **For Javascript devs**, I'm now researching methods of doing this, and [**Punycode.js**](http://bram.us/2011/11/29/punycode-js) seems to be the most complete and polished solution. – wwaawaw Oct 07 '12 at 07:41
  • 5
    Note that Internationalized Email (as currently defined) *does not* convert non-ASCII addresses using punycode or similar, instead extending large portions of the SMTP protocol itself to use UTF8. – IMSoP May 26 '14 at 21:30
  • 4
    Am I missing something or does this fail to answer the question? I am reading 'the other answer is wrong, you need to accept more characters' but then fails to state which extra characters. I also couldn't (easily) see in that RFC whether it means all Unicode code points or just the BMP. – Samuel Harmer Feb 05 '17 at 18:32
  • 3
    This seems to be on the right track to being the correct answer. I bet it would get a lot more votes if you included specifics about reserved and allowed characters. – Sean Mar 17 '17 at 18:20
  • This post, while useful, and probably old enough that it shouldn't be deleted, feels more like a comment than an answer. – Andrew Grimm Mar 23 '18 at 00:35
  • @wwaawaw (this might or might not be useful) in Chrome 77 and Nodejs v10 creating a URL instance will convert to Punycode (no library required) automatically. For example `new URL("https://はじめよう.みんな")` outputs "xn--p8j9a0d9c9a.xn--q9jyb4c". I just started digging around and haven't figured out the reverse yet, if possible...simply noticed location.href prints out the correct punycode. – jimmont Sep 17 '19 at 00:34
104

The format of an e-mail address is: local-part@domain-part (max. 64@255 characters, no more than 256 in total).
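
A minimal Python sketch of just these length limits, using the figures quoted in this answer (64, 255, and 256); the function name `within_length_limits` is hypothetical and no character checks are performed:

```python
def within_length_limits(address: str) -> bool:
    """Check only the length limits quoted above (hypothetical helper)."""
    # Split at the last "@": a quoted local part may itself contain "@".
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False
    return (
        1 <= len(local) <= 64
        and 1 <= len(domain) <= 255
        and len(address) <= 256
    )


print(within_length_limits("x@example.com"))
print(within_length_limits("a" * 65 + "@example.com"))
```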

The local-part and domain-part can have different sets of permitted characters, but that's not all: there are further rules.

In general, the local part can have these ASCII characters:

  • lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
  • uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
  • digits: 0123456789,
  • special characters: !#$%&'*+-/=?^_`{|}~,
  • dot: . (not first or last character or repeated unless quoted),
  • space and punctuation such as: "(),:;<>@[\] (with some restrictions),
  • comments: () (are allowed within parentheses, e.g. (comment)john.smith@example.com).

Domain part:

  • lowercase Latin letters: abcdefghijklmnopqrstuvwxyz,
  • uppercase Latin letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ,
  • digits: 0123456789,
  • hyphen: - (not first or last character),
  • can contain IP address surrounded by square brackets: jsmith@[192.168.2.1] or jsmith@[IPv6:2001:db8::1].
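
The bracketed IP form in the last bullet can be checked with Python's standard `ipaddress` module. This is only a sketch for the two forms shown above; the helper name `is_address_literal` is made up, and other tagged literal forms are not handled.

```python
import ipaddress


def is_address_literal(domain: str) -> bool:
    """Check a bracketed literal such as [192.168.2.1] or [IPv6:2001:db8::1]."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    inner = domain[1:-1]
    try:
        if inner[:5].lower() == "ipv6:":
            # IPv6 literals carry an "IPv6:" tag inside the brackets.
            ipaddress.IPv6Address(inner[5:])
        else:
            ipaddress.IPv4Address(inner)
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False
    return True


print(is_address_literal("[192.168.2.1]"))
print(is_address_literal("[123.123.123.ZZ]"))
```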

These e-mail addresses are valid:

  • prettyandsimple@example.com
  • very.common@example.com
  • disposable.style.email.with+symbol@example.com
  • other.email-with-dash@example.com
  • x@example.com (one-letter local part)
  • "much.more unusual"@example.com
  • "very.unusual.@.unusual.com"@example.com
  • "very.(),:;<>[]\".VERY.\"very@\ \"very\".unusual"@strange.example.com
  • example-indeed@strange-example.com
  • admin@mailserver1 (local domain name with no top-level domain)
  • #!$%&'*+-/=?^_`{}|~@example.org
  • "()<>[]:,;@\\"!#$%&'-/=?^_`{}| ~.a"@example.org
  • " "@example.org (space between the quotes)
  • example@localhost (sent from localhost)
  • example@s.solutions (see the List of Internet top-level domains)
  • user@com
  • user@localserver
  • user@[IPv6:2001:db8::1]

And these are examples of invalid addresses:

  • Abc.example.com (no @ character)
  • A@b@c@example.com (only one @ is allowed outside quotation marks)
  • a"b(c)d,e:f;g<h>i[j\k]l@example.com (none of the special characters in this local part are allowed outside quotation marks)
  • just"not"right@example.com (quoted strings must be dot separated or the only element making up the local part)
  • this is"not\allowed@example.com (spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash)
  • this\ still\"not\allowed@example.com (even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes)
  • john..doe@example.com (double dot before @); (with caveat: Gmail lets this through)
  • john.doe@example..com (double dot after @)
  • a valid address with a leading space
  • a valid address with a trailing space

Source: Email address at Wikipedia


Perl's RFC2822 regex for validating emails:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ 
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
31]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\
](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+
(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:
(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)
?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\
r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[
 \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)
?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t]
)*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[
 \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*
)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
)+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)
*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+
|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r
\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:
\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t
]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031
]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](
?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?
:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?
:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?
:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?
[ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|
\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>
@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"
(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?
:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[
\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-
\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(
?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;
:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([
^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\"
.\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\
]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\
[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\
r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] 
\000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]
|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \0
00-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\
.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,
;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?
:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[
^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]
]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(
?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(
?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[
\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t
])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t
])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?
:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|
\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:
[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\
]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)
?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["
()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)
?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>
@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[
 \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,
;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t]
)*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\
".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?
(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".
\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:
\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\[
"()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])
*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])
+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\
.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z
|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(
?:\r\n)?[ \t])*))*)?;\s*)

The full regexp for RFC2822 addresses was a mere 3.7k.

See also: RFC 822 Email Address Parser in PHP.


The formal definitions of e-mail addresses are in:

  • RFC 5322 (sections 3.2.3 and 3.4.1, obsoletes RFC 2822), RFC 5321, RFC 3696,
  • RFC 6531 (permitted characters).

kenorb
  • 21
    As an extra caution to would-be implementers of this regex: Don't. Just verify that it follows the format `something@something.something` and call it a day. – Chris Sobolewski Sep 07 '17 at 14:41
  • 1
    While something like this is not maintainable, it is a nice exercise to decode and actually figure out what it does – unjankify Feb 27 '18 at 16:53
  • @ChrisSobolewski allow multiple somethings both sides of the '@' – Jasen May 23 '18 at 05:09
  • I've tried to implement this in postfix via pcre access table under a check_recipient_access restriction, first turning the 3 long pcres (from the linked page) into one line each and topping and tailing thus: /^[...pcre..]$/ DUNNO, then adding a final line /.*/ REJECT, but it still allows through invalid email addresses. Postfix 3.3.0; perl 5, version 26, subversion 1 (v5.26.1). – scoobydoo Aug 04 '18 at 06:11
  • "(double dot before @); (with caveat: Gmail lets this through)" - no longer true, Gmail now rejects double dot addresses :( – Laszlo Valko May 27 '19 at 14:10
  • 9
    Madness I say. Who would ever use it in production. There is a point where regular expression should no longer be used. It is far beyond that point. – tomuxmon Jun 14 '19 at 07:18
  • 3
    Something I see a lot is "validate according to RFC822". This isn't actually what's usually needed. RFC822 doesn't define addresses that can be *sent to*; it defines addresses that can *appear in messages*, which is not the same thing. Addresses that can be sent to are determined in RFC821 (SMTP) and follow-on standards. In particular this spec does not allow comments, excluding addresses like `a@abc(bananas)def.com` that are valid RFC822 addresses but can't be sent to. For this reason, many email validators are validating against the wrong thing. – Synchro Sep 18 '20 at 09:02
  • Is the symbol ª valid for an email address? – rasputino Dec 02 '21 at 19:45
  • The list of characters that this answer gives for the domain part is actually the list of characters allowed in each DNS label (and the constraint about hyphens `-` not being first or last applies to each label). The domain part is made of one or several DNS labels separated with dots `.` . – Maëlan Jun 08 '22 at 16:26
26

Wikipedia has a good article on this, and the official spec is here. From Wikipedia:

The local-part of the e-mail address may use any of these ASCII characters:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.

Additionally, quoted-strings (e.g. "John Doe"@example.com) are permitted, allowing characters that would otherwise be prohibited; however, they rarely appear in practice. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

Mike Weller
17

You can start from the Wikipedia article:

  • Uppercase and lowercase English letters (a-z, A-Z)
  • Digits 0 to 9
  • Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
  • Character . (dot, period, full stop) provided that it is not the first or last character, and provided also that it does not appear two or more times consecutively.
Vladimir
  • 170,431
  • 36
  • 387
  • 313
14

Google does an interesting thing with its gmail.com addresses. gmail.com addresses allow only letters (a-z), numbers, and periods (which are ignored).

e.g., pikachu@gmail.com is the same as pi.kachu@gmail.com, and both email addresses will be sent to the same mailbox. PIKACHU@gmail.com is also delivered to the same mailbox.

So to answer the question: how closely the RFC standards are followed sometimes depends on the implementer. Google's gmail.com address style is compatible with the standards. They do it that way to avoid confusion where different people would take similar email addresses, e.g.

*** gmail.com accepting rules ***
d.oy.smith@gmail.com   (accepted)
d_oy_smith@gmail.com   (bounce and account can never be created)
doysmith@gmail.com     (accepted)
D.Oy'Smith@gmail.com   (bounce and account can never be created)
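
The dot and case handling described above can be sketched as a canonicalization step. This is a guess at the observable behaviour, not Google's actual implementation; `canonical_gmail` is a hypothetical helper and assumes it is only called with gmail.com addresses.

```python
def canonical_gmail(address: str) -> str:
    """Map a gmail.com address to the mailbox it is delivered to (illustrative only)."""
    local, _, domain = address.rpartition("@")
    # Periods are ignored and letter case does not matter, as described above.
    local = local.replace(".", "").lower()
    return f"{local}@{domain.lower()}"
```

With this sketch, `pi.kachu@gmail.com` and `PIKACHU@gmail.com` both map to `pikachu@gmail.com`.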

The wikipedia link is a good reference on what email addresses generally allow. http://en.wikipedia.org/wiki/Email_address

gaoithe
  • 4,218
  • 3
  • 30
  • 38
Angel Koh
  • 12,479
  • 7
  • 64
  • 91
  • 3
    Yes, this is a great answer about why Gmail does not allow you to CREATE addresses like this. But you can send and receive emails from `{john'doe}@my.server` with no problem. Tested with hMail server too. – Piotr Kula Jul 15 '14 at 14:25
  • You can test your client by sending an email to `{piotr'kula}@kula.solutions`. If it works, you will get a nice auto-reply from it. Otherwise nothing will happen. – Piotr Kula Jul 15 '14 at 14:29
  • 3
    Gmail does follow RFC 6530 in the sense that every possible e-mail address allowed by Gmail is valid according to the RFC. Gmail just chooses to further restrict the set of allowable addresses with additional rules, and to make otherwise similar addresses with dots in the local part, optionally followed by "+" and alphanumeric characters, synonymous. – Teemu Leisti Jan 20 '15 at 13:03
  • Google limits the account creation criteria... I imagine they scrub the incoming email account string of the extra "punctuation" and trailing plus prepended alias string sign so that the mail can be routed to the proper account. Easy peasy. In doing so, they effectively don't allow people to create just-bein-a-jerk email addresses so that valid addresses created will often pass simple and most complex validations. – BradChesney79 Feb 08 '18 at 20:41
  • It's not just gmail, Some providers have "relaying filters" that reject certain quoted strings, particularly containing "=" as if they were delimiters. This is to block users from setting up gateways and nesting spam addresses in the private quoted string. "@" is valid but "=@=" is not (considered) valid. – mckenzm Dec 06 '18 at 02:21
13

The accepted answer refers to a Wikipedia article when discussing the valid local-part of an email address, but Wikipedia is not an authority on this.

IETF RFC 3696 is an authority on this matter, and should be consulted at section 3. Restrictions on email addresses on page 5:

Contemporary email addresses consist of a "local part" separated from a "domain part" (a fully-qualified domain name) by an at-sign ("@"). The syntax of the domain part corresponds to that in the previous section. The concerns identified in that section about filtering and lists of names apply to the domain names used in an email context as well. The domain name can also be replaced by an IP address in square brackets, but that form is strongly discouraged except for testing and troubleshooting purposes.

The local part may appear using the quoting conventions described below. The quoted forms are rarely used in practice, but are required for some legitimate purposes. Hence, they should not be rejected in filtering routines but, should instead be passed to the email system for evaluation by the destination host.

The exact rule is that any ASCII character, including control characters, may appear quoted, or in a quoted string. When quoting is needed, the backslash character is used to quote the following character. For example

  Abc\@def@example.com

is a valid form of an email address. Blank spaces may also appear, as in

  Fred\ Bloggs@example.com

The backslash character may also be used to quote itself, e.g.,

  Joe.\\Blow@example.com

In addition to quoting using the backslash character, conventional double-quote characters may be used to surround strings. For example

  "Abc@def"@example.com

  "Fred Bloggs"@example.com

are alternate forms of the first two examples above. These quoted forms are rarely recommended, and are uncommon in practice, but, as discussed above, must be supported by applications that are processing email addresses. In particular, the quoted forms often appear in the context of addresses associated with transitions from other systems and contexts; those transitional requirements do still arise and, since a system that accepts a user-provided email address cannot "know" whether that address is associated with a legacy system, the address forms must be accepted and passed into the email environment.

Without quotes, local-parts may consist of any combination of alphabetic characters, digits, or any of the special characters

  ! # $ % & ' * + - / = ?  ^ _ ` . { | } ~

period (".") may also appear, but may not be used to start or end the local part, nor may two or more consecutive periods appear. Stated differently, any ASCII graphic (printing) character other than the at-sign ("@"), backslash, double quote, comma, or square brackets may appear without quoting. If any of that list of excluded characters are to appear, they must be quoted. Forms such as

  user+mailbox@example.com

  customer/department=shipping@example.com

  $A12345@example.com

  !def!xyz%abc@example.com

  _somename@example.com

are valid and are seen fairly regularly, but any of the characters listed above are permitted.

As others have done, I submit a regex that works for both PHP and JavaScript to validate email addresses:

/^[a-z0-9!'#$%&*+\/=?^_`{|}~-]+(?:\.[a-z0-9!'#$%&*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-zA-Z]{2,}$/i
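
For readers who want to try the pattern outside PHP or JavaScript, here is a rough Python port. It is a sketch: the names `EMAIL_RE` and `matches_simple_email` are invented for this example, the `/i` flag becomes `re.IGNORECASE`, and `fullmatch` replaces the `^...$` anchors. Note that the pattern only covers the unquoted form, so the quoted addresses from the RFC excerpt are rejected.

```python
import re

# Same character classes as the regex above; case-insensitive like the /i flag.
EMAIL_RE = re.compile(
    r"[a-z0-9!'#$%&*+/=?^_`{|}~-]+(?:\.[a-z0-9!'#$%&*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}",
    re.IGNORECASE,
)


def matches_simple_email(address: str) -> bool:
    """Return True if the whole string matches the unquoted-address pattern."""
    return EMAIL_RE.fullmatch(address) is not None
```

Addresses such as `user+mailbox@example.com` and `customer/department=shipping@example.com` match, while `.user@example.com` and `user@example` do not.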
Mac
  • 1,432
  • 21
  • 27
  • 1
    Technically, e-mail addresses directly at a TLD are also allowed, so you may consider changing the last group's `+` to `*`. OTOH, a technically valid address at a TLD may well be a typo, since such addresses are far less common. Spec vs. life… ;-) – Cromax Mar 07 '23 at 17:06
  • @Cromax - exactly. There has to be a happy medium between what we CAN do vs. what is USUALLY done. – Mac Mar 10 '23 at 22:40
10

Check for @ and . and then send an email for them to verify.

I still can't use my .name email address on 20% of the sites on the internet because someone screwed up their email validation, or because it predates the new addresses being valid.
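
A minimal sketch of the "check for @ and ." step might look like this. The function name `looks_like_email` is hypothetical, and requiring the dot to be in the part after the last `@` is my reading of the advice, not something the answer spells out. The confirmation email remains the real validation.

```python
def looks_like_email(address: str) -> bool:
    """Plausibility check only: a non-empty local part, an '@', and a dot in the domain."""
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and "." in domain)
```

This accepts `me@example.name` and rejects strings with no `@` at all; as the comments below point out, it would also reject a dotless address at a TLD.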

Richard Maxwell
  • 508
  • 5
  • 7
  • 9
    Even . isn't strictly necessary; I've heard of at least one case of an email address at a top level domain (specifically ua). The address was @ua -- no dot! –  Nov 28 '13 at 01:37
  • This is pretty much the easiest way not to mess up your validation, because almost everything is allowed, and if something isn't allowed, the recipient's server will let you know. – Avamander Jan 13 '18 at 11:16
9

Name:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!#$%&'*+-/=?^_`{|}~.

Server:

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-.
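
For the server part, a comment earlier on this page notes that these characters apply per DNS label and that a hyphen may not start or end a label. A sketch of that check follows; `is_valid_server_part` is a name made up here, and the 63-character label limit is the usual DNS limit rather than something stated in this answer.

```python
import string

# Letters, digits and hyphen are allowed inside each dot-separated label.
LABEL_CHARS = set(string.ascii_letters + string.digits + "-")


def is_valid_server_part(server: str) -> bool:
    """Check each dot-separated label of an ASCII domain name."""
    for label in server.split("."):
        if not 1 <= len(label) <= 63:
            return False  # empty label (e.g. "a..b") or over-long label
        if not set(label) <= LABEL_CHARS:
            return False
        if label.startswith("-") or label.endswith("-"):
            return False
    return True
```

So `best-server-ever.com` passes, while `-bad.example.com`, `a..b` and `exa_mple.com` fail.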
ThinkingStiff
  • 64,767
  • 30
  • 146
  • 239
8

The short answer is that there are two answers. There is one standard for what you should do, i.e. behaviour that is wise and will keep you out of trouble. There is another (much broader) standard for the behaviour you should accept without making trouble. This duality works for sending and accepting email but has broad application in life.

For a good guide to the addresses you create, see: https://www.jochentopf.com/email/chars.html

To filter valid emails, just pass on anything comprehensible enough to see a next step. Or start reading a bunch of RFCs; caution, here be dragons.

ygoe
  • 18,655
  • 23
  • 113
  • 210
Michael JAMES
  • 89
  • 1
  • 1
  • 1
    The link is gone. What content was there? – ygoe May 26 '19 at 13:49
  • @ygoe yeah site is down, Here is the archive version from ~2012 : http://web.archive.org/web/20120807105804/https://www.remote.org/jochen/mail/info/chars.html – MilMike Jun 18 '21 at 11:11
  • @MilMike Thank you, from there I found the new URL of that page and edited the answer. – ygoe Jun 23 '21 at 08:50
5

A good read on the matter.

Excerpt:

These are all valid email addresses!

"Abc\@def"@example.com
"Fred Bloggs"@example.com
"Joe\\Blow"@example.com
"Abc@def"@example.com
customer/department=shipping@example.com
\$A12345@example.com
!def!xyz%abc@example.com
_somename@example.com
Billal Begueradj
  • 20,717
  • 43
  • 112
  • 130
Luke Madhanga
  • 6,871
  • 2
  • 43
  • 47
  • 1
    I was wondering about the '@' before the domain part. Can that be used? – Saiyaff Farouk Mar 16 '17 at 12:09
  • @SaiyaffFarouk according to the specification, yes. However, most mail providers likely won't allow it as part of their own validation – Luke Madhanga Mar 16 '17 at 20:04
  • that blog lists `Joe.\\Blow@example.com` without quotes. Is this actually valid ? It doesn't seem clear given the answers here, but I'm asking because I have seen (very rare) cases of DNS SoA rname email strings that contain backslashes. – wesinat0r Apr 22 '20 at 15:35
1

Many have already attempted to answer this question, and many have also said that a lot of the answers are outdated. Here is my answer, as things stand in 2022.

The answer to the question is obviously not as simple as the question suggests. The proposed standards for naming a mailbox (specifically, the <user-name> in this context), along with the interpretations of those RFCs, are many and varied.

For the <user-name> part, the Universal Acceptance Steering Group has published a detailed guideline on what constitutes an e-mail local part, in a document titled UASG-028 here.

For the <server> part, all the characters listed in "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)" with the status "PVALID" are valid. Characters with the status "CONTEXTJ" or "CONTEXTO" are also valid under certain contextual conditions.

ThinkTrans
  • 21
  • 3
0

As can be found in this Wikipedia link

The local-part of the email address may use any of these ASCII characters:

  • uppercase and lowercase Latin letters A to Z and a to z;

  • digits 0 to 9;

  • special characters !#$%&'*+-/=?^_`{|}~;

  • dot ., provided that it is not the first or last character unless quoted, and provided also that it does not appear consecutively unless quoted (e.g. John..Doe@example.com is not allowed but "John..Doe"@example.com is allowed);

  • space and "(),:;<>@[\] characters are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph below, and in addition, a backslash or double-quote must be preceded by a backslash);

  • comments are allowed with parentheses at either end of the local-part; e.g. john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.

In addition to the above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531, though mail systems may restrict which characters to use when assigning local-parts.

A quoted string may exist as a dot separated entity within the local-part, or it may exist when the outermost quotes are the outermost characters of the local-part (e.g., abc."defghi".xyz@example.com or "abcdefghixyz"@example.com are allowed. Conversely, abc"defghi"xyz@example.com is not; neither is abc\"def\"ghi@example.com). Quoted strings and characters however, are not commonly used. RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

The local-part postmaster is treated specially—it is case-insensitive, and should be forwarded to the domain email administrator. Technically all other local-parts are case-sensitive, therefore jsmith@example.com and JSmith@example.com specify different mailboxes; however, many organizations treat uppercase and lowercase letters as equivalent.

Despite the wide range of special characters which are technically valid; organisations, mail services, mail servers and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-). Common advice is to avoid using some special characters to avoid the risk of rejected emails.

Community
  • 1
  • 1
Yash Patel
  • 56
  • 4
0

The answer is (almost) ALL (7-bit ASCII).
If the inclusion rule is "...allowed under some/any/none conditions..."

Just by looking at one of several possible inclusion rules for allowed text in the "domain text" part in RFC 5322 at the top of page 17 we find:

dtext          =   %d33-90 /          ; Printable US-ASCII
                   %d94-126 /         ;  characters not including
                   obs-dtext          ;  "[", "]", or "\"

The only characters missing from this description are the brackets used in a domain-literal ([ and ]), the backslash used to form a quoted-pair (\), and the white space character (%d32). With those, the whole range 32-126 (decimal) is used. A similar requirement appears as "qtext" and "ctext". Many control characters are also allowed/used. One list of such control chars appears on page 31, section 4.1 of RFC 5322, as obs-NO-WS-CTL.
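
The gap in the dtext ranges can be checked with a few lines of Python. This sketch uses only the two numeric ranges from the ABNF quoted above and ignores obs-dtext; the variable names are arbitrary.

```python
# dtext per the ABNF above: %d33-90 and %d94-126 (obs-dtext is ignored here).
DTEXT = set(range(33, 91)) | set(range(94, 127))

# Code points in the printable range 32-126 that dtext leaves out.
missing = [chr(cp) for cp in range(32, 127) if cp not in DTEXT]
print(missing)
```

The list contains the space, the two square brackets and the backslash.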

obs-NO-WS-CTL  =   %d1-8 /            ; US-ASCII control
                   %d11 /             ;  characters that do not
                   %d12 /             ;  include the carriage
                   %d14-31 /          ;  return, line feed, and
                   %d127              ;  white space characters

All these control characters are allowed, as stated at the start of section 3.5:

.... MAY be used, the use of US-ASCII control characters (values
     1 through 8, 11, 12, and 14 through 31) is discouraged ....

And such an inclusion rule is therefore "just too wide". Or, in other sense, the expected rule is "too simplistic".

Community
  • 1
  • 1
-1

In my PHP I use this check

<?php
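// Check a sample address against a pattern that accepts a dot-separated
// local part followed by either a domain name or a bracketed IPv4 literal.
if (preg_match(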
'/^(?:[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+\.)*[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+@(?:(?:(?:[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!\.)){0,61}[a-zA-Z0-9_-]?\.)+[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!$)){0,61}[a-zA-Z0-9_]?)|(?:\[(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\]))$/',
"tim'qqq@gmail.com"        
)){
    echo "legit email";
} else {
    echo "NOT legit email";
}
?>

Try it yourself: http://phpfiddle.org/main/code/9av6-d10r

Yevgeniy Afanasyev
  • 37,872
  • 26
  • 173
  • 191
-1

For simplicity's sake, I sanitize the submission by removing all text within double quotes and those associated surrounding double quotes before validation, putting the kibosh on email address submissions based on what is disallowed. Just because someone can have the John.."The*$hizzle*Bizzle"..Doe@whatever.com address doesn't mean I have to allow it in my system. We are living in the future where it maybe takes less time to get a free email address than to do a good job wiping your butt. And it isn't as if the email criteria are not plastered right next to the input saying what is and isn't allowed.

I also sanitize what is specifically not allowed by various RFCs after the quoted material is removed. The list of specifically disallowed characters and patterns seems to be a much shorter list to test for.

Disallowed:

    local part starts with a period ( .account@host.com )
    local part ends with a period   ( account.@host.com )
    two or more periods in series   ( lots..of...dots@host.com )
    &’`*|/                          ( some&thing`bad@host.com )
    more than one @                 ( which@one@host.com )
    :%                              ( mo:characters%mo:problems@host.com )

In the example given:

John.."The*$hizzle*Bizzle"..Doe@whatever.com --> John..Doe@whatever.com

John..Doe@whatever.com --> John.Doe@whatever.com
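
One possible reading of the two sanitization steps shown above, as a Python sketch. The function name `sanitize` and the exact regular expressions are my own assumptions; the answer does not show its code. Both patterns are simple enough to run in linear time.

```python
import re


def sanitize(address: str) -> str:
    """Strip quoted strings, then collapse runs of periods (illustrative only)."""
    # Drop quoted material together with its surrounding double quotes.
    without_quotes = re.sub(r'"[^"]*"', "", address)
    # Collapse two or more consecutive periods into one.
    return re.sub(r"\.{2,}", ".", without_quotes)
```

Applied to the example address, this yields `John.Doe@whatever.com`; the result would then still go through the disallowed-character checks and the confirmation email.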

Sending a confirmation email to the leftover result upon an attempt to add or change the email address is a good way to see if your code can handle the email address submitted. If the email passes validation after as many rounds of sanitization as needed, then fire off that confirmation. If a request comes back from the confirmation link, then the new email can be moved from the holding||temporary||purgatory status or storage to become a real, bona fide first-class stored email.

A notification of email address change failure or success can be sent to the old email address if you want to be considerate. Unconfirmed account setups might fall out of the system as failed attempts entirely after a reasonable amount of time.

I don't allow stinkhole emails on my system; maybe that is just throwing away money. But 99.9% of the time people just do the right thing and have an email that doesn't push conformity limits to the brink utilizing edge case compatibility scenarios. Be careful of regex denial of service (ReDoS); this is a place where you can get into trouble. And this is related to the third thing I do: I put a limit on how long I am willing to process any one email. If it needs to slow down my machine to get validated, it isn't getting past my incoming data API endpoint logic.

Edit: This answer kept on getting dinged for being "bad", and maybe it deserved it. Maybe it is still bad, maybe not.

BradChesney79
  • 650
  • 7
  • 16
  • 2
    I think this answer is downvoted because it is an opinion, and it does not actually answer the question. Besides, users who get their email address silently sanitized will never get emails from you. You'd better inform them that their email address is not accepted. – vcarel Jun 07 '18 at 16:18
  • 2
    I suspect the downvotes are because there are too many ideas here. The disallowed list, while these are useful unit tests, should be prefaced with what is allowed. The programming approach seems relatively fine, but, would probably fit better after you list the specs you're working with, etc.. Sections and mild copy-editing would help. Just my 2cents. – HoldOffHunger Sep 10 '18 at 20:13
  • @vcarel - Oh, absolutely. Front-end user side validation would inform them what rules (available from the tooltip) they were breaking. You are right-- it is an overall opinion. However, the question above is from someone that is asking X for a Y question for sure. This is guidance and it works... not only does it work, it works well. I don't let bullshit email addresses in my systems where I make the decisions. – BradChesney79 Sep 11 '18 at 13:25
  • @HoldOffHunger I can see that the overall idea is not as coherently expressed as it could be, I may revise on another day where I have more time to better express that. Thanks for the insight. – BradChesney79 Sep 11 '18 at 13:27
-2

I created this regex according to RFC guidelines:

^[\\w\\.\\!_\\%#\\$\\&\\'=\\?\\*\\+\\-\\/\\^\\`\\{\\|\\}\\~]+@(?:\\w+\\.(?:\\w+\\-?)*)+$
Mau
  • 17
  • 3
  • 1
    This version improves the regex by checking the length of domain/subdomains. Enjoy! ^[\\w\\.\\!_\\%#\\$\\&\\'=\\?\\*\\+\\-\\/\\^\\`\\{\\|\\}\\~]+@(?:[\\w](?:[\\w\\-]{0,61}[\\w])?(?:\\.[\\w](?:[\\w\\-]{0,61}[\\w])?)*)$ – Mau May 19 '17 at 20:31
-3

Gmail allows only the + sign as a special character, and in some cases the period (.); no other special characters are allowed at Gmail. The RFCs say that you can use special characters, but you should avoid sending mail to Gmail addresses with special characters.