I have a very long list of emails that I would like to process to:
- separate good emails from bad emails, and
- remove duplicates but keep all the non-duplicates in the same order.
This is what I have so far:
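from django.core.validators import email_re

email_list = ["joe@example.com", "invalid_email", ...]
email_set = set()    # every address seen so far, for O(1) duplicate checks
bad_emails = []
good_emails = []
dups = False         # becomes True if at least one duplicate was skipped
for email in email_list:
    if email in email_set:
        dups = True
        continue
    email_set.add(email)
    if email_re.match(email):
        good_emails.append(email)
    else:
        bad_emails.append(email)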
I would like this chunk of code to be as fast as possible and, less importantly, to use as little memory as possible. Is there a way to improve this in Python? Maybe using list comprehensions or iterators?
EDIT: Sorry! I forgot to mention that this is Python 2.5, since this is for Google App Engine (GAE).
email_re is from django.core.validators.
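In case it helps anyone reproduce this outside GAE, here is a self-contained sketch of the same loop wrapped in a function. Note that simple_email_re is only a hypothetical stand-in for Django's email_re (it is much looser than the real pattern), and split_emails is a name I made up for illustration. It uses only syntax that should work on Python 2.5.

```python
import re

# ASSUMPTION: a deliberately loose stand-in for django.core.validators.email_re,
# used only so the example runs without Django installed.
simple_email_re = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def split_emails(email_list, pattern=simple_email_re):
    """Return (good_emails, bad_emails, dups), keeping first-seen order."""
    seen = set()          # addresses already processed
    good_emails = []
    bad_emails = []
    dups = False          # True if at least one duplicate was skipped
    for email in email_list:
        if email in seen:
            dups = True
            continue
        seen.add(email)
        if pattern.match(email):
            good_emails.append(email)
        else:
            bad_emails.append(email)
    return good_emails, bad_emails, dups


sample = ["joe@example.com", "invalid_email", "joe@example.com", "ann@example.org"]
good, bad, had_dups = split_emails(sample)
```

With the real validator, the call would presumably be split_emails(email_list, email_re).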