
RFCs 5321, 5322 and 6531 have complex rules for validating email addresses. They:

  • allow comments inside an email address
  • impose complicated restrictions on the symbols "() ,:;<>@[\]
  • treat the postmaster local part as case-insensitive but all other local parts as case-sensitive
  • allow groups of email addresses

Because of these rules (nested comments in particular), testing whether a given string is a syntactically valid email address according to the RFCs can't be done with regular expressions alone.
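To illustrate why comments push the grammar beyond plain regular expressions: RFC 5322 comments are parenthesised and may nest, so recognising them needs a depth counter rather than a fixed pattern. Below is a minimal sketch in Python, not a full RFC parser. The function name `strip_comments` is made up for this example, and it only handles comments, quoted strings and backslash escapes.

```python
def strip_comments(address: str) -> str:
    """Remove parenthesised comments, which may nest (hypothetical helper)."""
    out = []
    depth = 0          # current comment nesting level
    in_quotes = False  # inside a "quoted string", where parentheses are literal
    i = 0
    while i < len(address):
        ch = address[i]
        # A backslash escapes the next character inside comments and quoted strings.
        if ch == "\\" and i + 1 < len(address) and (depth > 0 or in_quotes):
            if depth == 0:
                out.append(address[i:i + 2])  # keep escapes that are outside comments
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1                        # entering a (possibly nested) comment
        elif not in_quotes and ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(ch)                    # ordinary character outside any comment
        i += 1
    if depth != 0:
        raise ValueError("unclosed comment")
    if in_quotes:
        raise ValueError("unclosed quoted string")
    return "".join(out)


print(strip_comments("john(a (nested) comment)@example.com"))
```

The counter `depth` is the part a classical regular expression cannot express for arbitrary nesting. Some regex engines offer recursion extensions that can, but those go beyond regular expressions in the formal sense.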

Apparently, many of these rules aren't supported by major email providers.

Historically speaking, what were the motivations for creating such complex rules for email addresses? The Wikipedia article on the origins of email would seem to imply that the modern standard from the early 1980s intended to cover all legacy email-ish systems with their particular standards and syntaxes.

However, implementors of standards, email providers and email end-users all have a vested interest in a working system, which is easier to achieve when the rules are not too arcane and can readily be cast into software that passes a finite number of tests. So why do we today have a standard so complicated that nobody uses it to its full extent?

Again historically speaking, XML has largely been superseded by JSON, the success of which can partly be ascribed to the simplicity of its grammar.

Andrei Botalov
  • Because RFCs are supposed to cover all edge cases, not just "major providers"? – ceejayoz Mar 30 '12 at 21:20
  • i believe this is a sensible and valid question that should best be discussed with a historical hindsight. the RFCs named above are not the only culprits; the RFC governing the valid ways to transcribe IP addresses is likewise convoluted. interestingly, in the real world, only tiny portions of these RFCs are actively used; virtually all email addresses look like `name@example.com`, and all IP addresses like `123.234.231.132`. had this question not been prematurely closed, we could now enjoy a lively discussion on the merits and demerits and historical backgrounds of these highly complex RFCs. – flow Apr 04 '14 at 11:46
  • While this is an interesting and valid (in the wider software community sense) question, it is not right for StackOverflow. – JasonMArcher Apr 04 '14 at 16:54
  • i object: this is a site to discuss questions arising about programming. sometimes it's good to know some history: why the things are the way they are. it is also a very practical question, viz.: while email addresses have a *theoretically* incredibly convoluted syntax, in *practice* this is often shortcut to a much simpler version of the same. here is the place to discuss that. – flow Apr 05 '14 at 10:36
  • Stackoverflow is particularly bad at handling these questions. Perhaps ask on a relevant mailing list. – Marcin Sep 10 '14 at 21:28
  • It recently came out that the netmask npm package didn't handle octal IPv4 address notation properly for over 10 years, and this is a vulnerability. I had completely forgotten IPv4 addresses can be written in octal because everyone uses decimal. I wonder why someone thought that unnecessary complexity was a good idea either... – Andy Mar 30 '21 at 14:42

1 Answer


The only sure way to see if a supplied email address is genuine is to send an email to it and see if the user receives it. The one useful check that can be performed on an address is to check that the email address is syntactically valid. That is what this module does.

Systems that send mail must be capable of handling outgoing mail for all valid addresses. Contrary to the relevant standards, some defective systems treat certain legitimate addresses as invalid and fail to handle mail to these addresses. Hotmail, for example, refuses to send mail to any address containing any of the following standards-permissible characters: !#$%*/?^`{|}~
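As a rough illustration of that point, the characters listed above all belong to the `atext` set that RFC 5322 allows in an unquoted local part. The sketch below checks only the simple dot-atom form; the names `ATEXT` and `is_dot_atom_local_part` are made up for this example, and it deliberately ignores quoted strings, comments and the internationalized addresses of RFC 6531.

```python
import string

# RFC 5322 "atext": letters, digits and these printable symbols.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom_local_part(local: str) -> bool:
    """True if `local` is one or more non-empty atext runs separated by single dots."""
    if not local:
        return False
    return all(part and set(part) <= ATEXT for part in local.split("."))


# Every character from the list above is permitted by the standard.
print(all(c in ATEXT for c in "!#$%*/?^`{|}~"))
print(is_dot_atom_local_part("first.last+tag"))
print(is_dot_atom_local_part("a..b"))  # consecutive dots are not a valid dot-atom
```

A check like this accepts fewer addresses than the full grammar, but it does not reject characters the standard permits.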

In short, these are just different levels of standards; some are very strict and therefore complex.

Sully
  • while your answer is technically correct, it does not answer the question (which is about the syntax of email addresses). knowing that a street address is or is not syntactically correct is one thing, knowing whether there is a building on that spot or not is something else. – flow Apr 05 '14 at 11:27