
The raw email usually looks something like this:

From root@a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld
Cc: 
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--

So if I wanted to write a Python script to extract these fields:

From
To
Subject
Body

is this code the right thing to build on, or is there a better method?

import re

a = '<title>aaa</title><title>aaa2</title><title>aaa3</title>'
a1 = re.findall(r'<(title)>(.*?)<(/title)>', a)
Ro Yo Mi
  • Ever heard of PLY or, more particularly, PyParsing? If you'll be processing lots of emails that might contain characters that would break a handmade parser, these two are great Python packages designed for parsing. You might want to try PyParsing first; it's the easier of the two. – kirbyfan64sos Jul 26 '13 at 03:33

5 Answers


I don't really understand what your final code snippet has to do with anything - you haven't mentioned anything about HTML until that point, so I don't know why you would suddenly be giving an example of parsing HTML (which you should never do with a regex anyway).

In any case, to answer your original question about getting the headers from an email message, Python includes code to do that in the standard library:

import email

# email_string holds the raw message text shown in the question
msg = email.message_from_string(email_string)
msg['from']  # 'root@a1.local.tld'
msg['to']    # 'ooo@a1.local.tld'
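A self-contained version of the snippet above that runs on Python 3. An abbreviated copy of the question's message stands in for email_string; the name raw_email and the shortened sample are illustrative assumptions, not part of the original answer.

```python
import email

# Abbreviated copy of the question's message: the Received: headers and the
# MIME body are left out because only top-level headers are read here.
raw_email = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld
MIME-Version: 1.0
Content-Type: text/plain

ooooooooooooooooooooooooooooooo
"""

msg = email.message_from_string(raw_email)

# Header lookup is case-insensitive; a missing header gives None, not KeyError.
sender = msg['from']
recipient = msg['To']
subject = msg['SUBJECT']
cc = msg['Cc']  # absent from this abbreviated sample, so None

print(sender, recipient, subject, cc)
```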
Daniel Roseman

Fortunately Python makes this simpler: http://docs.python.org/2.7/library/email.parser.html#email.parser.Parser

from email.parser import Parser
parser = Parser()

emailText = """PUT THE RAW TEXT OF YOUR EMAIL HERE"""
email = parser.parsestr(emailText)

print(email.get('From'))
print(email.get('To'))
print(email.get('Subject'))

The body is trickier. Call email.is_multipart(). If that's false, you can get your body by calling email.get_payload(). However, if it's true, email.get_payload() will return a list of messages, so you'll have to call get_payload() on each of those.

if email.is_multipart():
    # each item in the list is itself a Message object
    for part in email.get_payload():
        print(part.get_payload())
else:
    print(email.get_payload())
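The loop above looks only one level deep: a part returned by get_payload() can itself be multipart, in which case its own get_payload() is again a list. A minimal sketch that handles any nesting depth with Message.walk() instead, assuming Python 3.5+ (for get_content_disposition()) and assuming "body" means the inline text/plain parts. The helper name plain_text_body and the abbreviated sample are hypothetical.

```python
import email

# Abbreviated copy of the question's multipart message (Received: omitted).
raw_email = """From: root@a1.local.tld
To: ooo@a1.local.tld
Subject: ooooooooooooooooooooooo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--
"""


def plain_text_body(msg):
    """Hypothetical helper: join the decoded text of inline text/plain parts."""
    chunks = []
    # walk() yields the message itself and then every sub-part, depth first,
    # so multiparts nested inside other multiparts are covered too.
    for part in msg.walk():
        if part.get_content_type() != 'text/plain':
            continue
        # Skip text files that were sent as attachments.
        if part.get_content_disposition() == 'attachment':
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or 'us-ascii'
        chunks.append(payload.decode(charset, errors='replace'))
    return ''.join(chunks)


msg = email.message_from_string(raw_email)
body = plain_text_body(msg)
print(body)
```

The multipart/mixed container is skipped because its content type is not text/plain; only its leaf parts contribute text.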
Kiwi

There is no "Body" header in your sample email; the body is the part of the message that follows the headers.

You can use the email module:

import email
msg = email.message_from_string(email_message_as_text)

Then use:

print(msg['To'])
print(msg['From'])

and so on for the other headers.
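On Python 3.6 and later, a minimal sketch using the newer policy-based API: passing policy=email.policy.default makes the parser return an EmailMessage, whose get_body() and get_content() methods extract the plain-text body. The abbreviated sample and variable names are illustrative assumptions. If every lookup comes back as None, one thing worth checking is that the string handed to the parser begins with the first header line, because a leading blank line is read as the end of an empty header block.

```python
import email
from email import policy

# Abbreviated copy of the question's message (Received: headers omitted).
# The string starts directly with the first line: a leading blank line would
# end the header block immediately, and every header lookup would be None.
raw_email = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--
"""

# policy.default returns an EmailMessage instead of the legacy Message class.
msg = email.message_from_string(raw_email, policy=policy.default)

print(msg['From'], msg['To'], msg['Subject'])

# get_body() searches the MIME tree for the best match in preferencelist
# and returns None when no suitable part exists.
body_part = msg.get_body(preferencelist=('plain',))
body = body_part.get_content() if body_part is not None else ''
print(body)
```

When the message is stored in a file, email.message_from_binary_file() accepts the same policy argument and avoids decoding the raw bytes yourself.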

P0W
  • I've been trying to build something similar, but am running into a lot of issues in Python 3. What would be the current way to do this? I'm getting None back with this solution. – Zach Oakes Feb 04 '21 at 18:25

You should probably use email.parser:

s = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld
Cc: 
X-Originating-IP: 192.168.15.127
X-Mailer: Webmin 1.420
Message-Id: <1374805739.3861@a1>
Date: Thu, 25 Jul 2013 19:28:59 -0700 (PDT)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="bound1374805739"

This is a multi-part message in MIME format.

--bound1374805739
Content-Type: text/plain
Content-Transfer-Encoding: 7bit

ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo
ooooooooooooooooooooooooooooooo

--bound1374805739--
"""

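import email.parser

msg = email.parser.Parser().parsestr(s)
print(msg['From'], msg['To'], msg['Subject'])
help(msg)  # lists the Message methods, e.g. get, get_payload, walk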
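To see what the parser actually produced for the question's message, a short sketch (Python 3; the abbreviated sample is illustrative) that prints the mbox envelope line, each header name in order, and the number of Received: headers. It shows two details that help(msg) does not make obvious: the leading "From root@..." line is kept by get_unixfrom() rather than as a header, and a header that occurs more than once needs get_all().

```python
import email.parser

# Abbreviated copy of the question's message: envelope line, both folded
# Received: headers, and the three headers the question asks about.
raw_email = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
Received: from a1.local.tld (localhost [127.0.0.1])
    by a1.local.tld (8.14.4/8.14.4) with ESMTP id r6Q2SxeQ003866
    for <ooo@a1.local.tld>; Thu, 25 Jul 2013 19:28:59 -0700
Received: (from root@localhost)
    by a1.local.tld (8.14.4/8.14.4/Submit) id r6Q2Sxbh003865;
    Thu, 25 Jul 2013 19:28:59 -0700
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld

body
"""

msg = email.parser.Parser().parsestr(raw_email)

# The mbox "From " envelope line is not a header; it is stored separately.
print(msg.get_unixfrom())

# items() returns (name, value) pairs in their original order; folded
# values keep their continuation lines, so print only the first line.
for name, value in msg.items():
    print(name, '=>', value.split('\n')[0])

# msg['Received'] would return only one of the two; get_all() returns both.
print(len(msg.get_all('Received')))
```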
Mark Roberts

You could write that raw content to a file, then read the file like this:

with open('in.txt', 'r') as f:
    # splitlines() drops the trailing '\n' that readlines() would keep
    raw = f.read().splitlines()

get_list = ['From:', 'To:', 'Subject:']
info_list = []

for line in raw:
    for word in get_list:
        if line.startswith(word):
            info_list.append(line)

Now info_list will be:

['From: root@a1.local.tld', 'Subject: ooooooooooooooooooooooo', 'To: ooo@a1.local.tld']

I don't see a Body: header in your raw content.
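For comparison, a sketch that reads the same kind of saved file with email.message_from_file() instead of matching line prefixes (Python 3). Because the parser stops at the end of the header block, a body line that happens to start with "To:" is not picked up, and folded multi-line headers come back as a single value. The temporary file stands in for in.txt, and read_headers is a hypothetical helper name.

```python
import email
import os
import tempfile

# Abbreviated sample; the last line is body text that merely looks like a header.
SAMPLE = """From root@a1.local.tld Thu Jul 25 19:28:59 2013
From: root@a1.local.tld
Subject: ooooooooooooooooooooooo
To: ooo@a1.local.tld
Content-Type: text/plain

To: this line is body text, not a header
"""


def read_headers(path, names=('From', 'To', 'Subject')):
    """Hypothetical helper: parse a saved message, return {name: value}."""
    with open(path, encoding='utf-8') as f:
        msg = email.message_from_file(f)
    return {name: msg[name] for name in names}


# Write the sample to a temporary file standing in for in.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write(SAMPLE)
try:
    headers = read_headers(path)
finally:
    os.remove(path)

print(headers)
```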

Serial