Not a duplicate: This answer collects known solutions for validating an email address. It also describes known limitations when checking international email addresses. At the end I provide a possible solution for handling international emails.
filter_var
The author of this post proposed the following function to validate an email:
function isValidEmail($email) {
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
If you require a TLD to be part of the address, the author also proposed:
function isValidEmail($email) {
    return filter_var($email, FILTER_VALIDATE_EMAIL)
        && preg_match('/@.+\./', $email);
}
Problem: No support for international email addresses
filter_var does not cover international email addresses, which contain non-ASCII (UTF-8 encoded) characters such as Greek or Cyrillic letters.
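To illustrate the limitation, here is a minimal sketch that repeats the first function from above and calls it with one ASCII address and two addresses containing Greek letters. The sample addresses are made up for the demonstration.

```php
<?php
// Same check as the first isValidEmail() above, repeated so the snippet runs on its own.
function isValidEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Plain ASCII address: accepted.
var_dump(isValidEmail('user@example.com'));        // bool(true)

// Non-ASCII local part: rejected by the default filter.
var_dump(isValidEmail('δοκιμή@example.com'));       // bool(false)

// Non-ASCII domain: rejected as well.
var_dump(isValidEmail('user@παράδειγμα.δοκιμή'));   // bool(false)
```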
preg_match
Use a custom regex to validate the structure. A good post with a detailed description is here.
The author proposed a regex from http://emailregex.com/, which checks an address against RFC 5322. The following code is the non-fixed version:
$regex = '/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD';
if (1 == \preg_match($regex, $email)) {
    // email OK
}
He also mentioned:
[...] RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use. [...]
Problem: No support for international email addresses
This solution does not cover international addresses either; they simply produce no match.
Optional: DNS lookup
A DNS lookup is not a validation by itself, but it can complement the check. It works with all UTF-8 characters, as long as they form a valid internationalized domain name (Reference: https://en.wikipedia.org/wiki/Internationalized_domain_name).
[...] is an Internet domain name that contains at least one label that is displayed in software applications, [...], in a language-specific script or alphabet, such as Arabic, Chinese, Cyrillic, Tamil, Hebrew or the Latin alphabet-based characters with diacritics or ligatures, such as French.
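As a rough sketch of what that means in practice: an internationalized domain name has an ASCII ("punycode") form, which PHP can produce with idn_to_ascii from the intl extension. The helper name toAsciiDomain is hypothetical, and the snippet assumes the intl extension is installed; without it the helper just returns null.

```php
<?php
// Hypothetical helper: converts a domain to its ASCII form,
// or returns null if intl is missing or the conversion fails.
function toAsciiDomain(string $domain): ?string
{
    if (!function_exists('idn_to_ascii')) {
        return null; // intl extension not loaded
    }
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $ascii === false ? null : $ascii;
}

// With intl available this prints string(17) "xn--mnchen-3ya.de"
var_dump(toAsciiDomain('münchen.de'));
```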
With checkdnsrr you can check whether a given domain has a DNS record of a given type.
// $domain was extracted from the given email before
// $domain must end with a . (see comment below)
if (checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A') || checkdnsrr($domain, 'AAAA')) {
    // domain is VALID
}
User Martin mentioned at php.net that the domain must end with a . (dot) to be treated as a fully qualified name. Without the trailing dot, you can get false positives.
Source: http://php.net/manual/en/function.checkdnsrr.php#119969
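The snippet above assumes $domain is already extracted and ends with a dot. One possible way to prepare it is sketched below; domainForLookup is a hypothetical helper name, and splitting on the last @ is my own assumption about how to extract the domain.

```php
<?php
// Hypothetical helper: returns the domain part of $email with exactly one
// trailing dot, or null if there is no usable domain part.
function domainForLookup(string $email): ?string
{
    $at = strrpos($email, '@');          // position of the last "@"
    if ($at === false) {
        return null;                     // no "@" at all
    }
    // Strip any trailing dots first so we never end up with "example.com.."
    $domain = rtrim((string) substr($email, $at + 1), '.');
    if ($domain === '') {
        return null;                     // nothing after the "@"
    }
    return $domain . '.';
}

var_dump(domainForLookup('user@example.com'));   // string(12) "example.com."
var_dump(domainForLookup('no-at-sign'));         // NULL
```

The result can then be passed to checkdnsrr as shown above.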
Handle international emails
Possible solution 1: structural check + DNS lookup
From what I have seen so far, you need a combination of a structural check and a DNS lookup to get the best coverage. The first part of the following code is based on the class EmailAddress from Genkgo Mail ( source ).
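function mail_is_valid(string $address): bool {
    // Structural check: exactly one @ with at least one character on each side
    if (\preg_match('/^([^@]+)@([^@]+)$/', $address, $matches) !== 1) {
        // email NOT valid
        return false;
    }
    $domain = $matches[2];

    // Prefer UTS #46 if the installed intl extension provides it
    $variant = \defined('INTL_IDNA_VARIANT_UTS46')
        ? INTL_IDNA_VARIANT_UTS46
        : INTL_IDNA_VARIANT_2003;

    // Convert an internationalized domain to its ASCII form
    $ascii = \idn_to_ascii($domain, IDNA_DEFAULT, $variant);
    if ($ascii === false) {
        // domain cannot be converted, so it cannot be looked up
        return false;
    }

    // The trailing dot makes the name fully qualified for the lookup
    $domain = \rtrim($ascii, '.') . '.';

    return \checkdnsrr($domain, 'MX')
        || \checkdnsrr($domain, 'A')
        || \checkdnsrr($domain, 'AAAA');
}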
I consider it the best solution at the moment, because the algorithm is mostly character agnostic, which allows UTF-8 characters in the email. The address is accepted as long as it has a user part + @ + domain part. The DNS lookup ensures the domain exists.
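To show what "character agnostic" means here, this small sketch applies only the structural pattern from the function above, without any DNS lookup. The name hasBasicStructure and the sample addresses are made up for the demonstration.

```php
<?php
// Same pattern as in mail_is_valid(): exactly one "@",
// with at least one character on each side.
function hasBasicStructure(string $address): bool
{
    return preg_match('/^([^@]+)@([^@]+)$/', $address) === 1;
}

var_dump(hasBasicStructure('δοκιμή@παράδειγμα.δοκιμή')); // bool(true)
var_dump(hasBasicStructure('a@b@c'));                     // bool(false), two "@"
var_dump(hasBasicStructure('@example.com'));              // bool(false), empty user part
```

Note that the pattern is deliberately permissive: it would also accept something like "not an email@x", which is why the DNS lookup is needed as the second step.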
It's not optimal. If you know a better way, please post it as a comment or as an answer.