TL;DR
Any custom regexp you'll find on the internet, including URI::MailTo::EMAIL_REGEXP, is wrong.
Here is what you should use:
# The closest thing to RFC 5322
RFC_5322 = /\A(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z/i
# Lighter, more practical version of RFC 5322 that will be more useful in real life
RFC_5322_light = /\A[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i
# Same as the light version but with length limits enforced
RFC_5322_with_length = /\A(?=[a-z0-9@.!#$%&'*+\/=?^_`{|}~-]{6,254}\z)(?=[a-z0-9.!#$%&'*+\/=?^_`{|}~-]{1,64}@)[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:(?=[a-z0-9-]{1,63}\.)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?=[a-z0-9-]{1,63}\z)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i
Details
The latest RFC defining the email address format is RFC 5322 - Internet Message Format.
Check section 3.4.1, Addr-Spec Specification. If we only look at the first part, the @ splits the address into the local part (on the left) and the domain (on the right).
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
For example, the local part can contain a dot-atom or a quoted-string (defined in sections 3.2.3 and 3.2.4 of the same RFC).
It's a bit complex, but your email address can contain many ASCII special characters that many regexps exclude (like #, $, &, etc.).
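To make the dot-atom rule concrete, here is a minimal sketch of it in Ruby. The constant names ATEXT and DOT_ATOM_TEXT are mine (they are not part of any library), and the sketch deliberately ignores quoted-string, obs-local-part and the optional comments/whitespace (CFWS) that the RFC allows around a dot-atom.

```ruby
# atext from RFC 5322 section 3.2.3: letters, digits and these printable specials.
# "#" is escaped so that "#$" can never be read as string interpolation.
ATEXT = /[a-z0-9!\#$%&'*+\/=?^_`{|}~-]/i

# dot-atom-text = 1*atext *("." 1*atext)
# i.e. runs of atext separated by single dots, with no leading or trailing dot.
DOT_ATOM_TEXT = /\A#{ATEXT}+(?:\.#{ATEXT}+)*\z/

DOT_ATOM_TEXT.match?('john.doe')     # => true
DOT_ATOM_TEXT.match?("o'brien+tag")  # => true
DOT_ATOM_TEXT.match?('#$&')          # => true  (only special characters, still a dot-atom)
DOT_ATOM_TEXT.match?('.john')        # => false (leading dot)
DOT_ATOM_TEXT.match?('john..doe')    # => false (consecutive dots)
DOT_ATOM_TEXT.match?('john.')        # => false (trailing dot)
```

This is the same shape as the local-part half of the light regexp in the TL;DR.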
On the other hand, URI::MailTo::EMAIL_REGEXP is defined in ruby/lib/uri/mailto.rb with the following regexp:
EMAIL_REGEXP = /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/
The comment above this regexp suggests that they followed the recommendations at https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address.
A valid email address is a string that matches the email production of the following ABNF, the character set for which is Unicode. This ABNF implements the extensions described in RFC 1123. [ABNF] [RFC5322] [RFC1034] [RFC1123]
But the WHATWG spec adds the following comment, which is very important:
This requirement is a willful violation of RFC 5322, which defines a syntax for email addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.
So WHATWG is telling us they didn't respect the RFC that standardizes the email address format. They say the domain part is too vague in RFC 5322, but RFC 5322 gives this note to tell us we have to check other RFCs for a more complete domain format spec:
Note: A liberal syntax for the domain portion of addr-spec is
given here. However, the domain portion contains addressing
information specified by and used in other protocols (e.g.,
[RFC1034], [RFC1035], [RFC1123], [RFC5321]). It is therefore
incumbent upon implementations to conform to the syntax of
addresses for the context in which they are used.
WHATWG also tells us that the local-part in RFC 5322 is too strict. But look at URI::MailTo::EMAIL_REGEXP, which follows the WHATWG spec instead:
URI::MailTo::EMAIL_REGEXP.match?('.@toto.fr') # => true
URI::MailTo::EMAIL_REGEXP.match?('-@z') # => true
URI::MailTo::EMAIL_REGEXP.match?('++++++++.........@z') # => true
On the contrary, the WHATWG spec (and so URI::MailTo::EMAIL_REGEXP) is way too lax.
So I looked for an alternative and found at https://emailregex.com/ a General Email Regex (RFC 5322 Official Standard) (see the TL;DR).
The explanation and alternatives can be found at https://www.regular-expressions.info/email.html.
# Blind RFC 5322
\A(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z
# RFC 5322, practical version (omits IP addresses, domain-specific addresses, and the syntax using double quotes and square brackets)
\A[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z
# RFC 5322, practical version (same as the previous one + length limits enforced)
\A(?=[a-z0-9@.!#$%&'*+/=?^_`{|}~-]{6,254}\z)(?=[a-z0-9.!#$%&'*+/=?^_`{|}~-]{1,64}@)[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:(?=[a-z0-9-]{1,63}\.)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?=[a-z0-9-]{1,63}\z)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z
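The length-limited version relies on lookaheads: one for the whole address (6 to 254 characters), one for the local part (at most 64 characters) and one per domain label (at most 63 characters). Here is a small sketch that probes each limit. The constant name RFC_5322_WITH_LENGTH and the sample addresses are mine, and I escaped the "#" in the literal as a precaution against interpolation; otherwise it is the regexp above.

```ruby
RFC_5322_WITH_LENGTH = /\A(?=[a-z0-9@.!\#$%&'*+\/=?^_`{|}~-]{6,254}\z)(?=[a-z0-9.!\#$%&'*+\/=?^_`{|}~-]{1,64}@)[a-z0-9!\#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!\#$%&'*+\/=?^_`{|}~-]+)*@(?:(?=[a-z0-9-]{1,63}\.)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?=[a-z0-9-]{1,63}\z)[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i

max_local = 'a' * 64 # longest local part the regexp accepts
max_label = 'b' * 63 # longest domain label the regexp accepts

# Local part limit (second lookahead)
RFC_5322_WITH_LENGTH.match?("#{max_local}@example.com")  # => true
RFC_5322_WITH_LENGTH.match?("a#{max_local}@example.com") # => false (65-character local part)

# Domain label limit (lookahead in front of each label)
RFC_5322_WITH_LENGTH.match?("user@#{max_label}.com")     # => true
RFC_5322_WITH_LENGTH.match?("user@b#{max_label}.com")    # => false (64-character label)

# Overall limit (first lookahead): 64 + 1 + 63 + 1 + 63 + 1 + 61 = 254 characters
longest = "#{max_local}@#{max_label}.#{'c' * 63}.#{'d' * 61}"
RFC_5322_WITH_LENGTH.match?(longest)                     # => true
RFC_5322_WITH_LENGTH.match?(longest + 'd')               # => false (255 characters overall)
RFC_5322_WITH_LENGTH.match?('a@b.c')                     # => false (below the 6-character minimum)
```

Note the last case: the light version accepts a@b.c, so the two practical versions can disagree on very short addresses.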
As you can see on the screenshot below, none of the addresses accepted by WHATWG / URI::MailTo::EMAIL_REGEXP is considered valid by these regexps.

Let's do the same thing locally:
RFC_5322 = /\A(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])\z/i
Now we can compare both (on Ruby 3.2.0):
# WHATWG
## Invalid cases
URI::MailTo::EMAIL_REGEXP.match?('.@toto.fr') # => true
URI::MailTo::EMAIL_REGEXP.match?('-@z') # => true
URI::MailTo::EMAIL_REGEXP.match?('++++++++.........@z') # => true
URI::MailTo::EMAIL_REGEXP.match?('invalíd@mail.com') # => false
URI::MailTo::EMAIL_REGEXP.match?('invalid%$£"@domain.com') # => false
URI::MailTo::EMAIL_REGEXP.match?('invalid£@domain.com') # => false
URI::MailTo::EMAIL_REGEXP.match?('invali"d@domain.com') # => false
URI::MailTo::EMAIL_REGEXP.match?('.dot..dot.@example.org') # => true
URI::MailTo::EMAIL_REGEXP.match?('!#$%’*+-/=?^_`{|}~@example.org') # => false
## Valid cases
URI::MailTo::EMAIL_REGEXP.match?('sometest@gmail.com') # => true
URI::MailTo::EMAIL_REGEXP.match?('some+test@gmail.com') # => true
URI::MailTo::EMAIL_REGEXP.match?('stuart.sillitoe@prodirectsport.net') # => true
URI::MailTo::EMAIL_REGEXP.match?('_valid@mail.com') # => true
URI::MailTo::EMAIL_REGEXP.match?('valid%$@domain.com') # => true
URI::MailTo::EMAIL_REGEXP.match?('"valid"@domain.com') # => false (the character class does not include ")
# RFC 5322
## Invalid cases
RFC_5322.match?('.@toto.fr') # => false
RFC_5322.match?('-@z') # => false
RFC_5322.match?('++++++++.........@z') # => false
RFC_5322.match?('invalíd@mail.com') # => false
RFC_5322.match?('invalid%$£"@domain.com') # => false
RFC_5322.match?('invalid£@domain.com') # => false
RFC_5322.match?('invali"d@domain.com') # => false
RFC_5322.match?('.dot..dot.@example.org') # => false
RFC_5322.match?('!#$%’*+-/=?^_`{|}~@example.org') # => false
## Valid cases
RFC_5322.match?('sometest@gmail.com') # => true
RFC_5322.match?('some+test@gmail.com') # => true
RFC_5322.match?('stuart.sillitoe@prodirectsport.net') # => true
RFC_5322.match?('_valid@mail.com') # => true
RFC_5322.match?('valid%$@domain.com') # => true
RFC_5322.match?('"valid"@domain.com') # => true
# RFC 5322 light (same results with RFC_5322_with_length)
## Invalid cases
RFC_5322_light.match?('.@toto.fr') # => false
RFC_5322_light.match?('-@z') # => false
RFC_5322_light.match?('++++++++.........@z') # => false
RFC_5322_light.match?('invalíd@mail.com') # => false
RFC_5322_light.match?('invalid%$£"@domain.com') # => false
RFC_5322_light.match?('invalid£@domain.com') # => false
RFC_5322_light.match?('invali"d@domain.com') # => false
RFC_5322_light.match?('.dot..dot.@example.org') # => false
RFC_5322_light.match?('!#$%’*+-/=?^_`{|}~@example.org') # => false
## Valid cases
RFC_5322_light.match?('sometest@gmail.com') # => true
RFC_5322_light.match?('some+test@gmail.com') # => true
RFC_5322_light.match?('stuart.sillitoe@prodirectsport.net') # => true
RFC_5322_light.match?('_valid@mail.com') # => true
RFC_5322_light.match?('valid%$@domain.com') # => true
RFC_5322_light.match?('"valid"@domain.com') # => false (difference with "pure" version)
Warning: this test is not complete and does not cover all cases.
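If you want to extend the comparison with your own cases, a table of expectations is easier to maintain than one line per call. Below is a minimal sketch of that idea. The names EXPECTATIONS, disagreements, RFC_5322_LIGHT and WHATWG_STYLE are mine. WHATWG_STYLE is a copy of the EMAIL_REGEXP source quoted earlier in this answer (so the result does not depend on the uri version you have installed), and the expected values are my reading of the light RFC 5322 rules.

```ruby
# Light RFC 5322 regexp from this answer ("#" escaped as a precaution against interpolation).
RFC_5322_LIGHT = /\A[a-z0-9!\#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!\#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\z/i

# Copy of the URI::MailTo::EMAIL_REGEXP source quoted above.
WHATWG_STYLE = /\A[a-zA-Z0-9.!\#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*\z/

# Address => should it be accepted?
EXPECTATIONS = {
  'sometest@gmail.com'     => true,
  'some+test@gmail.com'    => true,
  '_valid@mail.com'        => true,
  'valid%$@domain.com'     => true,
  '.@toto.fr'              => false,
  '-@z'                    => false,
  '.dot..dot.@example.org' => false,
  'invalid£@domain.com'    => false
}.freeze

# Returns the addresses for which the regexp disagrees with the expectation.
def disagreements(regexp, expectations)
  expectations.reject { |address, expected| regexp.match?(address) == expected }.keys
end

p disagreements(RFC_5322_LIGHT, EXPECTATIONS) # => []
p disagreements(WHATWG_STYLE, EXPECTATIONS)   # => [".@toto.fr", "-@z", ".dot..dot.@example.org"]
```

Adding a case is then a one-line change, and any regexp you want to evaluate can be passed to the same method.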