We have a database of over 200,000 e-mail addresses and associated contacts. I had the idea that if I could find out which e-mail addresses no longer exist, I could deactivate those contacts and keep the database more up to date. My main goal is not to validate that an e-mail address exists. My main goal is to find as many non-existent e-mail addresses as possible.
I have based a lot of my research on these answers: How to check if an email address exists without sending an email?
I have tried the Python validate_email library, but it was very unreliable. It is also unsafe, because you could get banned whenever you try to validate multiple contacts at the same company. It returned False for my active company e-mail and None for my active Gmail address, so it is definitely unreliable.
I have tried both DNS with py3dns and MX records, and also the VRFY command. Unfortunately, none of these seemed reliable, since any e-mail server could send a fake response.
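For reference, here is a minimal sketch of the kind of SMTP check I mean, assuming only Python's standard smtplib. The names classify_reply and probe_rcpt are hypothetical helpers of my own, not part of any library, and the verdicts are only as trustworthy as the server's reply:

```python
import smtplib


def classify_reply(code):
    """Map an SMTP reply code to a coarse verdict (hypothetical helper)."""
    if code in (550, 551, 553):
        return "rejected"       # server claims the mailbox is not usable
    if 400 <= code < 500:
        return "retry_later"    # temporary failure, e.g. greylisting
    if code in (250, 251):
        return "inconclusive"   # accepted, but a catch-all server accepts anything
    return "unknown"            # includes 252 ("cannot verify")


def probe_rcpt(mx_host, sender, recipient, timeout=10):
    """Ask one mail server about one recipient without sending a message body.

    mx_host is assumed to be the result of an MX lookup done elsewhere.
    """
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as server:
        server.helo()
        server.mail(sender)
        code, _ = server.rcpt(recipient)
    return classify_reply(code)
```

Only the "rejected" verdict is useful for my purpose, and even that can be faked by the server.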
Greylisting is also a problem:
There is also an antispam technique called greylisting, which will cause the server to reject the address initially, expecting a real SMTP server would attempt a re-delivery some time later. This will mess up attempts to validate the address.
The idea also occurred to me that I could send a dummy e-mail, or two with a bit of delay between them because of greylisting. I am afraid that this could get me blacklisted after a while, especially if several of these contacts work at the same company. Another idea is to do this from randomly generated e-mail addresses and hosts, but that is probably not possible. Is there any way I could determine that an e-mail address does not exist, preferably in a way that keeps the chance of getting banned minimal?
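To illustrate the "same company" concern, here is a rough sketch of how I could at least spread the checks out so that addresses on the same domain are not probed back to back. interleave_by_domain is a hypothetical helper of my own; I am assuming that spacing requests per domain lowers the risk, which I have not verified:

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_by_domain(addresses):
    """Reorder addresses round-robin across domains (hypothetical helper)."""
    by_domain = defaultdict(list)
    for address in addresses:
        # Everything after the last "@" is treated as the domain.
        domain = address.rsplit("@", 1)[-1].lower()
        by_domain[domain].append(address)

    ordered = []
    # Take one address from each domain per round; shorter lists yield None.
    for batch in zip_longest(*by_domain.values()):
        ordered.extend(a for a in batch if a is not None)
    return ordered
```

A delay between rounds, and a second pass some time later for addresses that got a temporary failure, would still have to be added on top of this.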