RFCs 5321, 5322 and 6531 have complex rules for validating email addresses. They:
- allow creating comments inside an email address
- impose complicated restrictions on symbols such as `"() ,:;<>@[\]`
- treat the `postmaster` local part as case-insensitive but all others as case-sensitive
- allow groups of email addresses
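To make the case-sensitivity rule concrete, here is a minimal sketch of how a comparison might apply it, taking the rule exactly as stated in the list above. RFC 5321 also makes domain names case-insensitive and discourages relying on local-part case in practice. The helper name `addresses_equivalent` is hypothetical, and it assumes simple `local@domain` addresses with no quoting or comments.

```python
def addresses_equivalent(a: str, b: str) -> bool:
    """Compare two simple local@domain addresses (hypothetical helper).

    Assumes no quoted local parts, comments, or escaped '@' characters.
    """
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    # Domain names follow DNS rules, so they are compared case-insensitively.
    if domain_a.lower() != domain_b.lower():
        return False
    # The reserved "postmaster" local part is case-insensitive...
    if local_a.lower() == "postmaster" and local_b.lower() == "postmaster":
        return True
    # ...while any other local part must match exactly.
    return local_a == local_b
```

With this sketch, `PostMaster@Example.com` and `postmaster@example.com` compare equal, but `John@example.com` and `john@example.com` do not.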
Because of these rules, testing whether a given string is a syntactically valid email address according to the RFCs can't be done with regular expressions alone. For example, RFC 5322 comments may be nested to any depth, and balanced nesting is not a regular language.
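A minimal sketch of why nested comments defeat a classic regular expression: stripping them needs an unbounded nesting counter (`depth` below), which a finite automaton cannot keep. Regex engines with recursion extensions, such as PCRE, are a different matter. The function `strip_comments` is hypothetical and deliberately simplified. It handles parentheses and backslash-escaped characters inside comments, but ignores quoted strings and folding whitespace.

```python
def strip_comments(text: str) -> str:
    """Remove RFC 5322-style (possibly nested) comments from text.

    Simplified sketch: ignores quoted strings and folding whitespace;
    raises ValueError on unbalanced parentheses.
    """
    out = []
    depth = 0  # current comment nesting level; can grow without bound
    i = 0
    while i < len(text):
        ch = text[i]
        if depth > 0 and ch == "\\":
            i += 2  # quoted-pair inside a comment: skip "\" and the escaped char
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(ch)  # only text outside comments is kept
        i += 1
    if depth != 0:
        raise ValueError("unclosed comment")
    return "".join(out)
```

For instance, `strip_comments("john(a (nested) comment)@example.com")` yields `john@example.com`, while an input with an unclosed `(` raises `ValueError`.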
Apparently, many of these rules aren't supported by major email providers.
Historically speaking, what were the motivations for creating such complex rules for email addresses? The Wikipedia article on the origins of email seems to imply that the modern standard, dating from the early 1980s, was intended to cover all legacy email-like systems, each with its own standards and syntax.
However, implementors of standards, email providers and end users alike all have a vested interest in a working system. That is easier to achieve when the rules are not too arcane and can easily be turned into software that passes a finite number of tests. So why do we have a standard today that is so complicated that nobody uses it to its full extent?
As another historical parallel, XML has largely been superseded by JSON in many areas, a success that can partly be ascribed to the simplicity of JSON's grammar.