I am trying to parse an RFC 5322 compliant "From: " field of an e-mail message into two parts, the display-name and the e-mail address, in Python 2.7 (the display-name may be empty). The familiar example is something like
John Smith <jsmith@example.org>
In the above, John Smith is the display-name and jsmith@example.org is the e-mail address. But the following is also a valid "From: " field:
"unusual" <"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com>
In this example, the expected display-name is
"unusual"
and
"very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com
is the email address.
You can use grammars to parse this in Perl (as explained in these questions: Using a regular expression to validate an email address and The recognizing power of “modern” regexes), but I'd like to do this in Python 2.7. I have tried the email.parser module, but it only seems able to separate header fields from one another, that is, the parts delimited by a colon; it does not split the value of a single field. So, if you do something like
from email.parser import Parser
headers = Parser().parsestr('From: "John Smith" <jsmith@example.org>')
print headers['from']
it will print
"John Smith" <jsmith@example.org>
while if you replace the last line in the above code with
print headers['display-name']
it will print
None
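As far as I can tell, this is because the parsed message behaves like a mapping keyed by header field names, so 'from' is found (the lookup is case-insensitive) while 'display-name' is simply not a header field. A minimal sketch of that check follows; the variable names are only for illustration, and print(...) is called with a single argument so that it should run on both Python 2.7 and Python 3:

```python
from email.parser import Parser

# Parse a single header line into a Message object.
headers = Parser().parsestr('From: "John Smith" <jsmith@example.org>')

# The Message only knows about whole header fields.
field_names = headers.keys()        # names of the header fields present
raw_from = headers['from']          # whole field value, not split any further
missing = headers['display-name']   # no header field with this name

print(field_names)
print(raw_from)
print(missing)
```

So the whole value of the "From: " field comes back as one string, and nothing in this module appears to split it into display-name and address.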
I would very much appreciate any suggestions or comments.