0

I have a string, let's say an email From field:

str1 = "Name <emailaddress@example.com>"

(or perhaps in another format; the point is that the string contains an email address somewhere...)

And I have a list of addresses:

lst = ["email1@example.com", "email2@yahoo.com", "email3@mail.com", "emailaddress@example.com"]

What is the most Pythonic way to check whether the email address contained in the string is one of the members of lst?

In the example, the email address in str1 is in lst, but for:

str2 = "Another email emailexample@domain.com"

it is not...

Also,

str3 = "Example email1@example.com"

would match because email1@example.com is in the list, even though there is no '<' '>' surrounding the email address...

Javier Novoa C.

3 Answers

2

from http://love-python.blogspot.com/2008/04/python-code-to-scrape-email-address.html

>>> import re
>>> email_pattern = re.compile(r"[-a-zA-Z0-9._]+@[-a-zA-Z0-9_]+\.[a-zA-Z0-9_.]+")
>>> str1 = "Name <emailaddress@example.com>"
>>> str2 = "Another email emailexample@domain.com"
>>> lst = ["email1@example.com", "email2@yahoo.com", "email3@mail.com", "emailaddress@example.com"]
>>> set(re.findall(email_pattern, str1)).intersection(lst)
set(['emailaddress@example.com'])
>>> set(re.findall(email_pattern, str2)).intersection(lst)
set([])
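
Extracting whole addresses before comparing matters because a plain substring test can give false positives: "verylongemail2@yahoo.com" contains "email2@yahoo.com", yet they are different addresses. The sketch below illustrates the difference. The helper names `naive_match` and `token_match` are hypothetical, and the pattern is a simplified one assumed for illustration, not a full RFC 822 parser:

```python
import re

# Simplified pattern (an assumption for illustration); real addresses can be far more complex.
EMAIL_RE = re.compile(r"[-a-zA-Z0-9._]+@[-a-zA-Z0-9_]+\.[a-zA-Z0-9_.]+")

lst = ["email1@example.com", "email2@yahoo.com", "email3@mail.com", "emailaddress@example.com"]


def naive_match(text, addresses):
    """Substring test: true if any known address appears anywhere in text."""
    return any(addr in text for addr in addresses)


def token_match(text, addresses):
    """Extract whole addresses first, then compare them for equality."""
    return bool(set(EMAIL_RE.findall(text)) & set(addresses))


text = "Name <verylongemail2@yahoo.com>"
print(naive_match(text, lst))  # substring hit on "email2@yahoo.com"
print(token_match(text, lst))  # the whole extracted address is not in lst
```

The set intersection is the same idea as the session above, just wrapped in a function so the two strategies can be compared side by side.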
Marco Mariani
  • well, before your edit, I thought that was what I needed... I mean, without the '<%s>' % l, but just l ... As you noticed, "Name emailaddress@example.com" won't match, which I need. But why is this a problem with "verylongemail2@yahoo.com"? – Javier Novoa C. Mar 14 '12 at 23:46
  • 2
    "verylongemail2@yahoo.com" contains "email2@yahoo.com", but they are distinct addresses and should not match. – Marco Mariani Mar 14 '12 at 23:49
  • wait, I edited my post; it seems I didn't explain it well. The '<>' surrounding the email address shouldn't be necessary in my case... – Javier Novoa C. Mar 14 '12 at 23:52
  • 1
    how about this version? keep in mind that perfectly matching email addresses with regexp can get hideously complex. – Marco Mariani Mar 14 '12 at 23:56
  • 1
    "hideously complex" indeed. See http://ex-parrot.com/~pdw/Mail-RFC822-Address.html for example, and note that the spec allows things that have to be preprocessed and otherwise can't be parsed by a proper regex at all. – Karl Knechtel Mar 15 '12 at 00:30
2

Regular expressions are usually not considered Pythonic, but this seems like a task made exactly for them.

So I would use them: extract the email address and check whether it's in the list:

>>> import re
>>> re.search(r'<(.*)>', "Name <emailaddress@example.com>").group(1) in lst
True

"Pythonic" isn't a magic word that solves every problem; one should consider all the available options and choose the best one.

Edit: If the format of your field isn't standard, no problem: you just need a better regex that matches the email address. (I'm sure there are plenty of examples out there; I'm not going to google it for you.)

But that doesn't mean that you shouldn't use regex for this kind of task.
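
As a rough sketch of the "better regex" idea, the pattern below looks for anything address-shaped, with or without angle brackets. The function name `sender_in_list` and the deliberately loose pattern are assumptions made for illustration, not a validated email grammar. It uses `findall` rather than `re.search(...).group(1)` because `re.search` returns `None` when nothing matches, which would raise `AttributeError`:

```python
import re

# Loose, illustrative pattern: runs of non-space, non-bracket characters
# around a single "@", with at least one dot in the domain part.
ADDRESS_RE = re.compile(r"[^\s<>@]+@[^\s<>@]+\.[^\s<>@]+")


def sender_in_list(field, addresses):
    """Return True if any address-like token in field equals a known address."""
    known = set(addresses)
    return any(candidate in known for candidate in ADDRESS_RE.findall(field))


lst = ["email1@example.com", "email2@yahoo.com", "email3@mail.com", "emailaddress@example.com"]

print(sender_in_list("Name <emailaddress@example.com>", lst))
print(sender_in_list("Another email emailexample@domain.com", lst))
print(sender_in_list("Example email1@example.com", lst))
```

Because whole tokens are compared for equality, a longer address that merely contains a listed one does not count as a match.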

Rik Poggi
  • @JavierNovoaC.: It doesn't matter if the email address is not surrounded by `<>`; I just showed you a simple example with a basic regex. You can use a different regex to extract the email address. I really don't see the problem. – Rik Poggi Mar 15 '12 at 00:01
  • thanks, I'll search for it. I was looking for a Pythonic solution, thinking of the context of my problem, but what you mention is good advice... – Javier Novoa C. Mar 15 '12 at 00:02
1

I don't know if this is pythonic:

def in_list(str1, lst):
    return str1.split('<')[1].split('>')[0] in lst
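
One caveat with the split approach: it assumes the angle brackets are present, so a string such as str3 from the question raises `IndexError`. A minimal sketch of a more forgiving variant follows. The function name `extract_address` is hypothetical, and the fallback (treating the last whitespace-separated word as the address) is an assumption about the input format:

```python
def extract_address(field):
    """Return the text between '<' and '>', or the last word if there are no brackets."""
    _, bracket, rest = field.partition("<")
    if bracket:
        # Brackets present: keep everything up to the closing '>'.
        return rest.partition(">")[0].strip()
    # No brackets: assume the address is the last whitespace-separated word.
    words = field.split()
    return words[-1] if words else ""


lst = ["email1@example.com", "email2@yahoo.com", "email3@mail.com", "emailaddress@example.com"]

print(extract_address("Name <emailaddress@example.com>") in lst)
print(extract_address("Example email1@example.com") in lst)
print(extract_address("Another email emailexample@domain.com") in lst)
```

`str.partition` never raises when the separator is missing, which is why it is used here instead of indexing into the result of `split`.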
lllluuukke