
I know how to use email.utils.parseaddr() to parse an email address. However, I want to parse a list of multiple email addresses, such as the address portion of this header:

Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>

In general, I know I can split on a regex like \s*,\s* to get the individual addresses, but in my example, the name portion of one of the addresses contains a comma, and this regex therefore will split the header incorrectly.
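To make the failure concrete, here is a minimal sketch of the naive regex split applied to the address portion of the example header. The variable names are just for illustration.

```python
import re

# Address portion of the example Cc header from the question.
header_value = '"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

# Splitting on every comma also cuts through the quoted display name.
pieces = re.split(r'\s*,\s*', header_value)
for piece in pieces:
    print(repr(piece))
```

The split yields three pieces instead of two, because the comma inside "www, xxyyzz" is treated as a separator.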

I know how to hand-write state-machine-based code that properly splits that address list into pieces, and I also know how to write a complicated regex that would match each email address. I'm not asking for help in writing such code. Rather, I'm wondering whether there are any existing Python modules I can use to split this email address list properly, so I don't have to "re-invent the wheel".

Thank you in advance.

HippoMan

3 Answers

11

Borrowing the answer from the question "How do you extract multiple email addresses from an RFC 2822 mail header in python?":

msg = 'Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

import email.utils

print(email.utils.getaddresses([msg]))

produces:

[('abc', 'foo@bar.com'), ('www, xxyyzz', 'something@else.com')]
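If the addresses come from a whole message rather than a bare string, a sketch along the following lines may be closer to real use. The `raw` message text is made up for illustration; `get_all` returns the header values without the `Cc:` prefix, and `getaddresses` accepts that list directly.

```python
import email
from email.utils import getaddresses

# Hypothetical raw message, used only to have something to parse.
raw = (
    'To: "someone" <first@example.com>\n'
    'Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>\n'
    '\n'
    'body text\n'
)

msg = email.message_from_string(raw)

# get_all returns a list of header values (or the default if the header is absent).
tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])

# Each result is a (display name, address) pair.
recipients = getaddresses(tos + ccs)
print(recipients)
```

Passing several headers at once is the reason `getaddresses` takes a list rather than a single string.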
0

This is not elegant in the least and I'm sure someone will come along and improve upon this. However, this works for me and hopefully gives you an idea of how this can be done.

I believe the split method is what you're looking for here. In the simplest terms, you take your string and choose a separator to split on. This breaks the string into a list that you can iterate over. If the separator is not found, the result is a one-element list containing the whole string.

emails = 'Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

new_emails = []
for e in emails.split(' '):
    # Only the tokens containing '@' are addresses.
    if '@' in e:
        # Strip the angle brackets and any trailing comma.
        new_email = e.replace('<', '').replace('>', '').replace(',', '')
        new_emails.append(new_email)
print(new_emails)
# Prints this ...
#   ['foo@bar.com', 'something@else.com']

If you want to use regex to do this, someone smarter than I will have to help.

mnickey
  • 1
    Yes, thank you. I know how to do this via `split`, but I'm hoping to find an existing package that handles the general case. I also want the name portions of the addresses, so that the result is this: `['"abc" <foo@bar.com>', '"www, xxyyzz" <something@else.com>']`. I know how to code this up, but again, I'm hoping for an existing package I can use, if such a thing already exists. – HippoMan Oct 16 '17 at 17:40
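If complete address strings are wanted rather than (name, address) pairs, one possible sketch is to parse with `email.utils.getaddresses` and rebuild each entry with `email.utils.formataddr`. Both functions are in the standard library; the variable names are illustrative.

```python
from email.utils import formataddr, getaddresses

header_value = '"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'

# Parse into (display name, address) pairs, then format each pair back to a string.
pairs = getaddresses([header_value])
formatted = [formataddr(pair) for pair in pairs]
print(formatted)
```

Note that `formataddr` only quotes a display name when it contains special characters, so the quotes around `abc` are dropped while those around `www, xxyyzz` are kept.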
0

I know I can do something like the following, but again, I'm hoping that there is already an existing package which could do this for me ...

#!/usr/bin/python3         

import email.utils

def getaddrs(text):
    """Split a comma-separated address list, ignoring commas inside double quotes."""
    def _yieldaddrs(text):
        inquote = False
        curaddr = ''
        for x in text:
            if x == '"':
                # Toggle the quoted state; backslash-escaped quotes are not handled.
                inquote = not inquote
                curaddr += x
            elif x == ',' and not inquote:
                # An unquoted comma ends the current address.
                yield curaddr
                curaddr = ''
            else:
                curaddr += x
        if curaddr:
            yield curaddr
    return [email.utils.parseaddr(x) for x in _yieldaddrs(text)]

addrstring = '"abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>'
print(getaddrs(addrstring))
# Prints this ...
#   [('abc', 'foo@bar.com'), ('www, xxyyzz', 'something@else.com')]
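For comparison with the hand-written splitter, the standard library's newer header API may already cover this case. The sketch below assumes Python 3.6+ and a made-up `raw` message; with `policy.default`, an address header exposes an `addresses` attribute whose items have `display_name` and `addr_spec`.

```python
import email
from email import policy

# Hypothetical raw message, used only to have something to parse.
raw = (
    'Cc: "abc" <foo@bar.com>, "www, xxyyzz" <something@else.com>\n'
    '\n'
    'body text\n'
)

# policy.default makes header lookups return structured header objects.
msg = email.message_from_string(raw, policy=policy.default)

# Collect (display name, address) pairs from the parsed Cc header.
cc_pairs = [(addr.display_name, addr.addr_spec) for addr in msg['cc'].addresses]
print(cc_pairs)
```

This leaves the quoting rules to the library's own parser instead of a custom state machine.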
HippoMan