Intention
I'm trying to do some very minimal validation of e-mail addresses, despite seeing a lot of advice against doing that. The reason is that the spec I am implementing requires e-mail addresses to be in this format:
mailto:<uri-encoded local part>@<domain part>
I'd like to simply split on the leading mailto: and the final @, and assume the "local part" is whatever lies between them. I'll then verify that the "local part" is URI encoded.
I don't want to do much more than this. The spec lets me get away with "best effort" validation for most of the address, but it is very specific about the URI encoding and the mailto: prefix.
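For illustration, the approach described above could be sketched roughly as follows in Python. The function name split_mailto is hypothetical, and the sketch assumes that "URI encoded" means the local part contains only RFC 3986 unreserved characters or %HH percent-escapes, and that the mailto: prefix is matched case-sensitively. The spec in question may define either point differently.

```python
import re

# Assumption: "URI encoded" means only RFC 3986 unreserved characters
# (letters, digits, "-", ".", "_", "~") or %HH percent-escapes.
_URI_ENCODED = re.compile(r"(?:[A-Za-z0-9\-._~]|%[0-9A-Fa-f]{2})+")

# Assumption: the prefix is matched exactly, including case.
PREFIX = "mailto:"


def split_mailto(address):
    """Return (local_part, domain_part), or None if the basic shape is wrong."""
    if not address.startswith(PREFIX):
        return None
    rest = address[len(PREFIX):]
    # rpartition splits on the LAST "@", so any earlier "@" ends up
    # in the local part, where the encoding check will reject it.
    local, sep, domain = rest.rpartition("@")
    if not sep or not local or not domain:
        return None
    if not _URI_ENCODED.fullmatch(local):
        return None
    # The domain part is deliberately left unvalidated ("best effort").
    return local, domain
```

With this sketch, split_mailto("mailto:a%40b@example.com") yields the pair ("a%40b", "example.com"), while an address without the prefix, or with a raw @ in the local part, yields None. Whether the final @ is really the right place to split is exactly the open question below.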
Problem
From everything I've read, splitting on the @ seems risky to me.
I've seen a lot of conflicting advice on the web and in Stack Overflow answers. Most of it says "read the RFCs", and some of it says that the domain part can only contain certain characters, i.e. 0-9, a-z, A-Z, - and ., maybe a couple of other characters, but not much more than this. E.g.:
When I read the various RFCs on domain names, I see that "any CHAR" (dtext) or "any character between ASCII 33 and 90" (dtext) is allowed, which implies that @ symbols are allowed. This is further compounded because "comments" are allowed in parentheses, ( ), and can contain characters between ASCII 42 and 91, which include @.
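To make the concern concrete, here is a small sketch that checks whether a character falls inside the ranges mentioned above. The ranges are transcribed from the non-obsolete dtext and ctext rules in RFC 5322. The helper name in_ranges and the constant names are hypothetical.

```python
# Decimal ASCII ranges from RFC 5322 (non-obsolete forms):
#   dtext = %d33-90 / %d94-126
#   ctext = %d33-39 / %d42-91 / %d93-126
DTEXT_RANGES = [(33, 90), (94, 126)]
CTEXT_RANGES = [(33, 39), (42, 91), (93, 126)]


def in_ranges(ch, ranges):
    """Return True if the character's code point lies in any (lo, hi) range."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in ranges)


# "@" is ASCII 64, which sits inside both the dtext and the ctext ranges.
at_in_dtext = in_ranges("@", DTEXT_RANGES)
at_in_ctext = in_ranges("@", CTEXT_RANGES)
```

Both checks come out true for @, which is why the grammar, read literally, appears to permit it inside a domain literal or a comment.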
RFC 1035 seems to support the letters+digits+dashes+periods requirement, but the "domain literal" syntax in RFC 5322 seems to allow more characters.
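The two shapes of domain part can be told apart with a rough check like the one below. This is only a sketch: the function names are hypothetical, the label pattern assumes the commonly used relaxation that lets a label begin with a digit (RFC 1035 itself asks for a leading letter), and the bracket test only looks at the outer [ ] without validating what is inside.

```python
import re

# One label: letters, digits, hyphens; no leading or trailing hyphen;
# at most 63 characters (the RFC 1035 label length limit).
# Assumption: a leading digit is accepted.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_LDH_DOMAIN = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def looks_like_ldh_domain(domain):
    """True for dot-separated letters/digits/hyphens labels, e.g. example.com."""
    return _LDH_DOMAIN.fullmatch(domain) is not None


def looks_like_domain_literal(domain):
    """True if the domain part is wrapped in square brackets, e.g. [192.0.2.1]."""
    return len(domain) >= 2 and domain.startswith("[") and domain.endswith("]")
```

A domain part that passes looks_like_ldh_domain cannot contain an @ at all, so splitting on the final @ is only in doubt for inputs that fail it, such as bracketed domain literals or addresses carrying comments.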
Am I misunderstanding the RFCs, or is there something I'm missing that disallows an @ in the domain part of an e-mail address? Is "domain literal" syntax something I don't have to worry about?