
I have a large number of email addresses to validate. Initially I parse them with a regexp to throw out the completely crazy ones. I'm left with the ones that look sensible but still might contain errors.

I want to find which addresses have valid domains. So, given me@abcxyz.com, I want to know whether it's even possible to send email to abcxyz.com.

I want to test whether that domain has a valid A or MX record. Is there an easy way to do it using only the Python standard library? I'd rather not add another dependency to my project just to support this feature.

Salim Fadhley
  • strictly speaking, a domain can receive mail even without an MX record. rfc2811, section 5 describes a fallback to A records. i'd consider playing through parts of an smtp session (up until RCPT TO:) –  Feb 18 '09 at 02:11
  • @hop: i didn't know about the fallback, thanks for reference. I think you meant RFC 2821? – Van Gale Feb 18 '09 at 07:25
  • anyway, after thinking about it, i came to the conclusion that it would be better not to do the preemptive verification at all, and just send the newsletter (i assume that's the reason for having the list) and collect the bounces afterwards. –  Feb 18 '09 at 11:33
  • RFC 2821 is obsolete, see RFC 5321. The fallback is now not only to A records but to AAAA records as well. – bortzmeyer Feb 25 '09 at 08:37
  • Parsing email addresses with regexps is unrealistic and leads to many false positives. See http://stackoverflow.com/questions/201323/201378 – bortzmeyer Feb 25 '09 at 08:39
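The suggestion above of playing through an SMTP session up to RCPT TO can be sketched with `smtplib` alone. This is a minimal, hypothetical helper (the name `probe_rcpt` and its defaults are illustrative), assuming you already know the MX host and that outbound port 25 isn't blocked on your network. Many servers accept every recipient, greylist with a 4xx reply, or throttle repeated probes, so treat the reply code as a hint rather than proof that the mailbox exists.

```python
import smtplib
from typing import Optional


def probe_rcpt(mx_host: str, address: str, sender: str = "",
               port: int = 25, timeout: float = 10.0) -> Optional[int]:
    """Run an SMTP session up to RCPT TO and return the server's reply code.

    Returns None if the server can't be reached or the session fails.
    A 250/251 reply suggests the recipient is accepted, a 5xx reply that it
    is rejected; 4xx usually means "try again later" (e.g. greylisting).
    """
    try:
        with smtplib.SMTP(mx_host, port, timeout=timeout) as smtp:
            smtp.ehlo_or_helo_if_needed()
            smtp.mail(sender)  # an empty sender becomes MAIL FROM:<>
            code, _ = smtp.rcpt(address)
            return code  # DATA is never sent; leaving the with-block sends QUIT
    except OSError:  # smtplib.SMTPException is a subclass of OSError
        return None
```

Because no DATA command is ever issued, no message is delivered; the session stops after the server answers RCPT TO.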

3 Answers


There is no DNS interface for looking up MX records in the standard library, so you will either have to roll your own or use a third-party library.
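The standard library can still cover the fallback case mentioned in the comments on the question (RFC 5321 falls back to A and AAAA records when a domain has no MX). A minimal sketch, assuming the operating system's resolver is good enough for your purposes: `socket.getaddrinfo` only resolves address records, so it can tell you that a domain has an A/AAAA record but nothing about MX. The helper name `has_address_record` is hypothetical.

```python
import socket


def has_address_record(domain: str) -> bool:
    """Return True if `domain` resolves to at least one IPv4 or IPv6 address.

    Uses the OS resolver via socket.getaddrinfo, so only address (A/AAAA)
    lookups are possible here; MX records are invisible to it.
    """
    try:
        return len(socket.getaddrinfo(domain, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name doesn't resolve
        # UnicodeError: malformed name such as "a..b" (empty label)
        return False
```

Call it with the domain part of each address, e.g. `has_address_record("abcxyz.com")`. Keep in mind that the OS resolver may also consult local sources such as /etc/hosts, and a True result says nothing about whether the host actually accepts mail.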

This is not a fast-changing concept though, so the external libraries are stable and well tested.

The one I've used successfully for the same task is PyDNS.

A very rough sketch of my code is something like this:

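import smtplib
import DNS  # PyDNS, a third-party package

hostname = "abcxyz.com"  # the domain part of the address you're checking

DNS.DiscoverNameServers()
mx_hosts = DNS.mxlookup(hostname)  # list of (preference, host) tuples

# Just doing the mxlookup might be enough for you,
# but do something like this to test for an SMTP server
for preference, mx in mx_hosts:
    smtp = smtplib.SMTP(timeout=10)
    try:
        # if this doesn't raise an exception it is a reachable MX host
        smtp.connect(mx)
    except (smtplib.SMTPConnectError, OSError):  # bad greeting, refused, timeout
        continue  # try the next MX server in the list
    smtp.quit()
    break  # found a working MX host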

Another library that might be better or faster than PyDNS is dnsmodule, although it looks like it hasn't had any activity since 2002, compared to PyDNS's last update in August 2008.

Edit: I would also like to point out that email addresses can't be easily parsed with a regexp. You are better off using the parseaddr() function in the standard library's email.utils module (see my answer to this question for an example).
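A small sketch of that approach, using a hypothetical helper `extract_domain` to pull out the part you would pass to the DNS lookup. Note that `parseaddr` only splits a header-style address into (real name, address); it performs no DNS check and little syntax validation, so the test for an `@` is done by hand here.

```python
from email.utils import parseaddr
from typing import Optional


def extract_domain(raw: str) -> Optional[str]:
    """Return the lower-cased domain of an address, or None if there isn't one."""
    _, addr = parseaddr(raw)  # 'Me <me@abcxyz.com>' -> ('Me', 'me@abcxyz.com')
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()  # DNS names are case-insensitive


print(extract_domain("Me <me@AbcXyz.com>"))  # abcxyz.com
print(extract_domain("abcxyz.com"))          # None (no @ at all)
```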

Van Gale

If you don't mind going outside the standard library, the easy way to do this is the validate_email package:

from validate_email import validate_email
is_valid = validate_email('example@example.com', check_mx=True)

To process a large number of email addresses faster (e.g. a mailing list), you could remember the domains that have already been verified and only do a check_mx for domains you haven't seen yet. Something like:

from validate_email import validate_email

emails = ["email@example.com", "email@bad_domain", "email2@example.com"]  # ...
verified_domains = set()
for email in emails:
    domain = email.rsplit("@", 1)[-1].lower()
    domain_verified = domain in verified_domains
    # only do the slow MX check for domains that haven't been verified yet
    is_valid = validate_email(email, check_mx=not domain_verified)
    if is_valid:
        verified_domains.add(domain)
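A variation on the same idea is to group the addresses by domain first, so each domain is looked up exactly once whether it passes or fails (the loop above re-checks a bad domain for every address that uses it). This is a sketch with hypothetical names: `check_by_domain` and its `domain_ok` parameter are illustrative, and `domain_ok` stands for whatever per-domain check you use, for example an MX or A lookup with your DNS library of choice.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def check_by_domain(emails: Iterable[str],
                    domain_ok: Callable[[str], bool]) -> Dict[str, bool]:
    """Call `domain_ok` once per distinct domain and map every email to the result."""
    by_domain: Dict[str, List[str]] = defaultdict(list)
    for email in emails:
        _, sep, domain = email.rpartition("@")
        by_domain[domain.lower() if sep else ""].append(email)  # "" = no @ at all

    results: Dict[str, bool] = {}
    for domain, addresses in by_domain.items():
        # one (possibly slow) lookup per domain; addresses without a domain fail
        ok = bool(domain) and bool(domain_ok(domain))
        for email in addresses:
            results[email] = ok
    return results
```

Since every lookup result is reused for all addresses on that domain, the number of DNS queries is bounded by the number of distinct domains rather than the number of addresses.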
Mark Chackerian
  • This package works great if it is indeed an invalid email. If it's a real one, it takes forever to respond. – blissweb May 28 '20 at 12:16

An easy and effective way is to use a Python package named validate_email, which provides both checks (address syntax and a lookup of the domain's MX record). Check this article, which will help you check whether an email address actually exists.

AlixaProDev