Right now I'm "removing" emails from a list by building a new list that excludes the ones I don't want. It looks like this:

    pattern = re.compile('b\.com')

    emails = ['user@a.com', 'user@b.com', 'user@c.com', 'user@d.com']
    emails = [e for e in emails if pattern.search(e) == None]
    # resulting list:  ['user@a.com', 'user@c.com', 'user@d.com']

However, now I need to filter out multiple domains, so I have a list of domains that need to be filtered out.

    pattern_list = ['b.com', 'c.com']

Is there a way to still do this as a list comprehension, or am I going to have to go back to nested for loops?

Note: splitting the string at the @ and doing word[1] in pattern_list won't work because c.com needs to catch sub.c.com as well.

diplosaurus
  • I don't think a list comprehension is the best way to address this. You might be able to do it, but it would be a lot more cumbersome. Look at this solution: http://stackoverflow.com/questions/19150208/python-search-regex-from-variable-inside-a-list – karthikr Sep 23 '14 at 18:27
  • Note that your existing example will also exclude, for instance `user@crumb.com` and `bob.com@bob.com`. Is that what you want? – BrenBarn Sep 23 '14 at 18:28
  • When you're making list comprehensions of list comprehensions, it's often better to use generators (change the square brackets to parens), which are more memory efficient and chain together nicely. – Seth Sep 23 '14 at 18:30
  • Also, in your regex `.` is a special character, so it will also exclude `bob@bocom`, since `b.com` matches `bocom`. Is that also what you want? – BrenBarn Sep 23 '14 at 18:31
  • @BrenBarn That's a good point, although in my actual code the domains are rather long and unique and I do want to remove anything containing them. – diplosaurus Sep 23 '14 at 18:32
  • Using `is None` reads better than `== None`. Also it's a bit more efficient since `is` cannot be overloaded and the interpreter can just do a pointer comparison. – Bakuriu Sep 23 '14 at 18:48

2 Answers

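    import re

    # Escape the dots so '.' only matches a literal dot;
    # '$' anchors each domain at the end of the address.
    pattern = re.compile(r'b\.com$|c\.com$')

    emails = ['user@a.com', 'user@b.com', 'user@c.com', 'user@d.com']

    emails = [e for e in emails if pattern.search(e) is None]

    print(emails)  # ['user@a.com', 'user@d.com']

What about this?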

abhishekgarg

There are a few ways to do this, even without using a regex. One is:

    [e for e in emails if not any(pat in e for pat in pattern_list)]

This will also exclude emails like user@crumb.com and bob.com@bob.com, but so does your original solution. It does not, however, exclude cases like user@bocom, which your existing solution does. Again, it's not clear if your existing solution actually does what you think it does.
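To make those edge cases concrete, here is a minimal sketch of the substring approach. The sample addresses beyond the ones in the question are made up for illustration, and `kept` is just a name chosen here.

```python
pattern_list = ['b.com', 'c.com']

# Sample data: the last four addresses are illustrative edge cases.
emails = ['user@a.com', 'user@b.com', 'user@sub.c.com',
          'user@crumb.com', 'bob.com@d.com', 'user@bocom']

# Keep an address only if none of the listed domains occurs anywhere in it.
kept = [e for e in emails if not any(pat in e for pat in pattern_list)]
print(kept)  # ['user@a.com', 'user@bocom']
```

Plain substring matching treats `.` literally, so `user@bocom` survives, while `user@crumb.com` and `bob.com@d.com` are dropped because `b.com` appears somewhere in them.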

Another possibility is to combine your patterns into one with `rx = '|'.join(pattern_list)` and then match on that regex. Again, though, you'll need to use a more complex regex if you want to only match `b.com` as a full domain (not as just part of the domain or as part of the username).
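One way the stricter combined regex could look, as a rough sketch rather than a definitive answer: it assumes "full domain" means the part after the `@` is either exactly a listed domain or a subdomain of it. The name `domain_rx` and the extra sample addresses are made up for illustration.

```python
import re

pattern_list = ['b.com', 'c.com']

# re.escape makes each '.' literal. The leading (?:@|\.) and the trailing $
# require a listed domain to be the whole domain or a dot-separated suffix of it.
alternatives = '|'.join(re.escape(p) for p in pattern_list)
domain_rx = re.compile(r'(?:@|\.)(?:' + alternatives + r')$')

emails = ['user@a.com', 'user@b.com', 'user@sub.c.com',
          'user@crumb.com', 'bob.com@d.com', 'user@bocom']

kept = [e for e in emails if domain_rx.search(e) is None]
print(kept)  # ['user@a.com', 'user@crumb.com', 'bob.com@d.com', 'user@bocom']
```

This still catches `sub.c.com`, as the question requires, but no longer drops `user@crumb.com` or an address whose username merely contains `b.com`. If you really do want to remove anything containing the domain, the simpler substring test is enough.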

BrenBarn