python3/email: parsing a list of email addresses with embedded commas?

Question

I know how to use email.utils.parseaddr() to parse an email address. However, I want to parse a list of multiple email addresses, such as the address portion of this header:

Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>

In general, I know I can split on a regex like \s*,\s* to get the individual addresses, but in my example, the name portion of one of the addresses contains a comma, and this regex therefore will split the header incorrectly.
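To make the problem concrete, here is a minimal sketch of the naive split applied to the address portion of that header. The variable names `header_value` and `pieces` are only for illustration.

```python
import re

# The address portion of the Cc header from the question.
header_value = '"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

# Naive split on commas surrounded by optional whitespace.
pieces = re.split(r'\s*,\s*', header_value)

print(pieces)
# ['"abc" <foo@bar.com>', '"www', 'xxyyzz" <something@else.com>']
```

The second address is cut in two at the comma inside the quoted name, so the split yields three pieces instead of two.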

I know how to manually write state-machine-based code to properly split that address into pieces, and I also know how to code a complicated regex that would match each email address. I'm not asking for help in writing such code. Rather, I'm wondering if there are any existing python modules which I can use to properly split this email address list, so I don't have to "re-invent the wheel".

Thank you in advance.

How does [email.utils.parseaddr](https://docs.python.org/3.6/library/email.util.html#email.utils.parseaddr) not accomplish that? Seems to work with the example you have: https://repl.it/Mi08 - Returns tuples of (name, email_addr), without splitting on the comma in the second name — chickity china chinese chicken, Oct 16 '17 at 17:44
In my version of python (version 3.6.0), `email.utils.parseaddr('"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>')` only returns the first tuple, i.e., `('abc', 'foo@bar.com')` ... and the same is true when I tried this under python 2.7.9 — HippoMan, Oct 16 '17 at 17:48
Ah ... I see. In your example, you already split the header manually into a 2-element list before calling `email.utils.parseaddr` on each element. — HippoMan, Oct 16 '17 at 17:50
Oh I see, you're right, my bad, I didn't copy your example correctly, good catch ;) — chickity china chinese chicken, Oct 16 '17 at 17:52
Yes. The header value is a single string, as follows: `'"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'`. I obtain it by invoking `msg.get('Cc')` on an email that was parsed via `email.parser.Parser().parsestr()`. — HippoMan, Oct 16 '17 at 18:12
Ok I think I found another answer that we can adopt to make work on your example — chickity china chinese chicken, Oct 16 '17 at 18:30

chickity china chinese chicken · Accepted Answer · 2017-10-16T18:40:01.860

score 11

Borrowing the answer from this question: "How do you extract multiple email addresses from an RFC 2822 mail header in python?"

import email.utils

# getaddresses() takes a list of header strings and returns (name, address) tuples.
msg = 'Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

print(email.utils.getaddresses([msg]))

produces:

[('abc', 'foo@bar.com'), ('www, xxyyzz', 'something@else.com')]
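For the workflow described in the question's comments (a message parsed with `email.parser.Parser().parsestr()`), a minimal sketch might look like the following. The message text in `raw` is made up for illustration, and `msg.get_all('Cc', [])` is used here instead of `msg.get('Cc')` so that repeated Cc headers would also be picked up; that choice is an assumption, not part of the original answer.

```python
import email.parser
import email.utils

# Hypothetical raw message, used only to have something to parse.
raw = (
    'From: sender@example.com\n'
    'Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>\n'
    'Subject: test\n'
    '\n'
    'body\n'
)

msg = email.parser.Parser().parsestr(raw)

# get_all() returns a list of header values (one per Cc header line),
# which is the input shape getaddresses() expects.
cc_values = msg.get_all('Cc', [])
pairs = email.utils.getaddresses(cc_values)

print(pairs)
# [('abc', 'foo@bar.com'), ('www, xxyyzz', 'something@else.com')]
```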

edited Oct 16 '17 at 18:40

answered Oct 16 '17 at 18:32

chickity china chinese chicken


Just `email.utils.getaddresses([msg])` will suffice here – Jon Clements Oct 16 '17 at 18:36
Yes, this is the answer I'm looking for. Many thanks! – HippoMan Oct 16 '17 at 18:41
I am struggling with addresses that I copied from Outlook, which don't have double quotes around the name. Any suggestions? – AndyC Jun 07 '22 at 02:47
If it doesn't have double quotes around the name, what does it have? Do you have a sample use case for reference? – chickity china chinese chicken Jun 09 '22 at 20:46

score 0 · Answer 2 · answered Oct 16 '17 at 17:37

This is not elegant in the least and I'm sure someone will come along and improve upon this. However, this works for me and hopefully gives you an idea of how this can be done.

I believe the split method is what you're looking for here. In the simplest terms, you take your string and choose a separator to split on. This breaks the string into a list of pieces that you can iterate over. If the separator isn't found, you get a one-element list containing the original string.

emails = 'Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

# Split on spaces, then keep only the tokens that contain an '@'.
new_emails = []
for e in emails.split(' '):
    if '@' in e:
        # Strip the angle brackets and any trailing comma.
        new_email = e.replace('<', '').replace('>', '').replace(',', '')
        new_emails.append(new_email)
print(new_emails)
# Prints: ['foo@bar.com', 'something@else.com']

If you want to use regex to do this, someone smarter than I will have to help.

Yes, thank you. I know how to do this via `split`, but I'm just hoping to find an existing package which will perform this in the general case. Also, I want to get the name portions of the addresses, as well, so that the result is this: `['"abc" <foo@bar.com>', '"www, xxyyzz" <something@else.com>']`. I know how to code this up, but again, I'm hoping for an existing package I can use to do this, if such a thing already exists. — HippoMan, Oct 16 '17 at 17:40
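If the goal is one string per address with the name portion kept, one possible sketch is to parse with `email.utils.getaddresses()` and rebuild each entry with `email.utils.formataddr()`. The variable names are illustrative only. Note that `formataddr()` adds double quotes only when the name contains special characters, so the rebuilt strings may not match the original header character for character.

```python
import email.utils

header_value = '"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

# Parse into (name, address) tuples, then rebuild one string per address.
pairs = email.utils.getaddresses([header_value])
full_addresses = [email.utils.formataddr(pair) for pair in pairs]

print(full_addresses)
# ['abc <foo@bar.com>', '"www, xxyyzz" <something@else.com>']
```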

score 0 · Answer 3 · answered Oct 16 '17 at 18:37

I know I can do something like the following, but again, I'm hoping that there is already an existing package which could do this for me ...

#!/usr/bin/python3

import email.utils

def getaddrs(text):
    def _yieldaddrs(text):
        # Split on commas, except those inside double quotes.
        # Note: backslash-escaped quotes inside a quoted name are not handled.
        inquote = False
        curaddr = ''
        for x in text:
            if x == '"':
                inquote = not inquote
                curaddr += x
            elif x == ',' and not inquote:
                yield curaddr
                curaddr = ''
            else:
                curaddr += x
        if curaddr:
            yield curaddr
    # parseaddr() turns each piece into a (name, address) tuple.
    return [email.utils.parseaddr(x) for x in _yieldaddrs(text)]

addrstring = '"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'
print(getaddrs(addrstring))
# Prints this ...
#   [('abc', 'foo@bar.com'), ('www, xxyyzz', 'something@else.com')]
