1

Background:

I am writing a little script which requires, as one of its arguments, an email address list in a file. The script will then go on to use the email addresses over a telnet connection to an SMTP server, so they need to be syntactically valid. Consequently I have written a function to check email address validity (incidentally, this regex may not be perfect, but it is not the focus of the question, so please bear with me; it will probably be loosened up):

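import re

def checkmailsyntax(email):
    # Note: returns True when the address does NOT match (i.e. it is invalid),
    # and implicitly returns None when it does match.
    match = re.match(r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$', email)

    if match is None:
        return True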

The main() program goes on to read the input filename as an argument (via argparse) and load the file's lines into a (currently global) list:

with open(args.targetfile) as targets:
    target_email_list = targets.readlines()

I figured it would be great for the script to automatically delete an email address from the list if the checkmailsyntax function failed (rather than just telling you it was wrong, which is what it used to do). The cleaned list could then go on to submit only syntactically valid email addresses to the SMTP server:

for i in target_email_list:
    if checkmailsyntax(i):
        target_email_list.remove(i)

Error-checking code that I have put in both before and after the delete-element snippet to see if it's doing its job:

for i in target_email_list:
    print i

The issue: The output of the code is thus:

Before delete element snippet (and the entire contents of the file submitted):

me@example.com  
you@example.com  
them@example.com  
noemail.com  
incorrectemail.com  
new@example.com  
pretendemail.com  
wrongemail.com  
right@example.com  
badlywrong.com  
whollycorrect@example.com  

After delete element snippet:

me@example.com  
you@example.com  
them@example.com  
incorrectemail.com  
new@example.com  
wrongemail.com  
right@example.com  
whollycorrect@example.com  

So I'm pretty stumped as to why 'noemail.com', 'pretendemail.com' and 'badlywrong.com' were removed and yet 'incorrectemail.com' and 'wrongemail.com' were not. It seems to occur when there are two syntactically incorrect emails in the file sequentially.

Can anyone point me in the right direction?

AKS
FiddleDeDee

2 Answers

3

It is because you are removing elements from the list while iterating over it:

for i in target_email_list:
    if checkmailsyntax(i):
        target_email_list.remove(i) # here

Consider these two adjacent values:

pretendemail.com  
wrongemail.com

Once you remove pretendemail.com, wrongemail.com shifts into the index the iterator has just visited. The iterator then advances to the next index, which now holds right@example.com, so wrongemail.com is never checked for valid syntax. The same thing happens with noemail.com and incorrectemail.com. You can add print(i) before checking the syntax and see for yourself.
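The skipping can be reproduced in isolation. The following is only a minimal sketch: `is_invalid` is a hypothetical stand-in for checkmailsyntax that treats any string without an "@" as invalid, and the sample list is a subset of the file contents from the question.

```python
# Sample data: a subset of the addresses from the question.
emails = ["new@example.com", "pretendemail.com", "wrongemail.com", "right@example.com"]

def is_invalid(address):
    # Hypothetical, simplified stand-in for checkmailsyntax: True means "reject".
    return "@" not in address

visited = []
for address in emails:
    visited.append(address)  # record what the loop actually sees
    if is_invalid(address):
        emails.remove(address)  # every later element shifts one slot left

print(visited)  # wrongemail.com does not appear: the loop never saw it
print(emails)   # wrongemail.com is still in the list
```

The `visited` list makes the gap visible: the element that slid into the slot of the removed one is passed over.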

You can use a list comprehension for this purpose (this assumes checkmailsyntax returns a true value for valid addresses):

valid_emails = [email for email in target_email_list if checkmailsyntax(email)]
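As a self-contained sketch of this approach: the regex below is copied from the question, while `is_valid_email` and the sample list are hypothetical names and data introduced here for illustration. The helper returns True for a valid address, which is the opposite of what the question's checkmailsyntax does as posted. The `.strip()` calls assume the list came from readlines(), which keeps trailing newlines.

```python
import re

# Regex copied from the question; the asker notes it may be too strict.
EMAIL_RE = re.compile(
    r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$'
)

def is_valid_email(email):
    # Hypothetical helper: True when the address matches the pattern.
    return EMAIL_RE.match(email.strip()) is not None

# Sample data shaped like the output of readlines().
target_email_list = [
    "me@example.com\n",
    "noemail.com\n",
    "incorrectemail.com\n",
    "new@example.com\n",
]

# Build a new list; the original is never mutated while being iterated.
valid_emails = [email.strip() for email in target_email_list if is_valid_email(email)]
print(valid_emails)
```

Because a new list is built, adjacent invalid entries such as noemail.com and incorrectemail.com are both filtered out.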
AKS
    As simple as that. In the current code, valid_emails gives me all the invalid ones; I have sorted that by removing the '== None' from the checkmailsyntax function, so that it returns a true value if a match is found. Thank you! – FiddleDeDee May 24 '16 at 16:54
0

AKS's answer has you covered: don't remove from the list that you are iterating over! For a quick fix, you can remove from the actual list while iterating over a copy:

for i in target_email_list[:]:  # iterates over the slice
    if checkmailsyntax(i):
        target_email_list.remove(i)  # removes from actual list
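If the list has to be modified in place (for example because, as in the question, it is a global that other code refers to), slice assignment is another option that avoids repeated remove() calls. This is only a sketch: `is_invalid` is a hypothetical stand-in for the question's checkmailsyntax, and `alias` exists purely to show that the same list object is updated.

```python
def is_invalid(address):
    # Hypothetical, simplified stand-in for checkmailsyntax: True means "reject".
    return "@" not in address

target_email_list = ["me@example.com", "noemail.com", "incorrectemail.com", "new@example.com"]
alias = target_email_list  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list,
# so every reference to it sees the filtered result.
target_email_list[:] = [a for a in target_email_list if not is_invalid(a)]

print(target_email_list)
print(alias is target_email_list)
```

The comprehension on the right-hand side is fully evaluated before the assignment, so nothing is removed from the list during iteration.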
user2390182