
I have a simple piece of code which looks at an email header and pulls out the date, from, to and subject fields. To do this, I must put the email header into a .txt document so that the code can read the header.

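```python
from email.parser import BytesHeaderParser
from glob import glob
import csv

fields = ['Date', 'From', 'To', 'Subject']

parser = BytesHeaderParser()

# "with" makes sure output.csv is flushed and closed; newline='' avoids blank rows on Windows
with open('output.csv', 'w', newline='', encoding='utf-8') as outfile:
    out = csv.writer(outfile)
    out.writerow(["File name"] + fields)

    # Every .msg file in the current working directory
    for name in glob('*.msg'):
        with open(name, 'rb') as fd:  # binary mode, as BytesHeaderParser expects
            msg = parser.parse(fd)
        out.writerow([name] + [msg[f] for f in fields])
```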

I want to be able to do this in bulk, so that when dealing with large numbers of emails from the same 'phishing campaign' I can put all the .msg files into one folder and run the script to extract the data I need.
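A minimal sketch of the bulk version, assuming Python 3 and that each `.msg` file is a plain-text RFC 822 message (headers, a blank line, then the body), like an exported `.eml`. Outlook's own `.msg` format is a binary container and would need a different library. The function `extract_headers` and its parameters are hypothetical names. It takes the folder explicitly, so the script does not have to be run from inside it, and returns how many files it processed.

```python
import csv
from email import policy
from email.parser import BytesHeaderParser
from pathlib import Path

FIELDS = ['Date', 'From', 'To', 'Subject']


def extract_headers(folder, out_path='output.csv', pattern='*.msg'):
    """Write one CSV row per file in `folder` matching `pattern`.

    Hypothetical helper. Assumes every matching file is a plain-text
    RFC 822 message (not Outlook's binary .msg format).
    Returns the number of files processed.
    """
    # policy.default decodes RFC 2047 encoded words and unfolds long headers
    parser = BytesHeaderParser(policy=policy.default)
    count = 0
    # newline='' lets the csv module control line endings (no blank rows on Windows);
    # errors='replace' keeps going if a header contains undecodable bytes
    with open(out_path, 'w', newline='', encoding='utf-8', errors='replace') as out_file:
        out = csv.writer(out_file)
        out.writerow(['File name'] + FIELDS)
        for path in sorted(Path(folder).glob(pattern)):
            # Binary mode: the parser deals with header encodings itself
            with open(path, 'rb') as fd:
                msg = parser.parse(fd)
            # A missing header is None; write an empty cell instead
            out.writerow([path.name] + [str(msg[f] or '') for f in FIELDS])
            count += 1
    return count
```

Usage, with a hypothetical folder name: `print(extract_headers('campaign_emails'), 'files processed')`. A result of 0 means the pattern matched nothing in that folder, which is one way to end up with an empty CSV.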

Is this possible? I'm also willing to write the code in PowerShell.

Thanks.

Will
  • Possible duplicate of [Using Python to execute a command on every file in a folder](https://stackoverflow.com/questions/1120707/using-python-to-execute-a-command-on-every-file-in-a-folder) – M-- Dec 03 '18 at 15:32

1 Answer


I'd strongly suggest using one of the MIME parsers built into Python for handling emails. It's a relatively complicated format, and doing naive things like you do above will tend to give you the wrong result. For example, a header can span multiple lines, and your code would only get part of it.
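To illustrate the multi-line header point, here is a small sketch with a made-up header. A line-by-line scan for `Subject:` sees only the first physical line, while `HeaderParser` with `policy=policy.default` (Python 3) joins the continuation line back on. With the parser's default `compat32` policy, the newline would stay embedded in the returned value instead.

```python
from email import policy
from email.parser import HeaderParser

# Made-up header "folded" over two lines: the second line starts with whitespace
raw = (
    "Subject: Urgent: verify your account\n"
    " before it is suspended\n"
    "From: support@example.com\n"
    "\n"
)

# Naive line-by-line scan, shown for comparison only
naive = [line for line in raw.splitlines() if line.startswith("Subject:")]
print(naive[0])  # Subject: Urgent: verify your account

# The email parser joins the continuation line back into one value
msg = HeaderParser(policy=policy.default).parsestr(raw)
print(msg["Subject"])  # Urgent: verify your account before it is suspended
```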

It should be a simple matter of doing:

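```python
from email.parser import HeaderParser
from glob import glob
import csv

fields = ['Date', 'From', 'To', 'Subject']

parser = HeaderParser()

with open('output.csv', 'w', newline='') as outfile:
    out = csv.writer(outfile)
    out.writerow(["File name"] + fields)

    for name in glob('*.msg'):
        # Text mode decodes with the platform's default encoding,
        # which may not match the file
        with open(name) as fd:
            msg = parser.parse(fd)
        # Missing headers come back as None, which csv writes as an empty cell
        out.writerow([name] + [msg[f] for f in fields])
```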
Sam Mason
  • Gives me this: `Traceback (most recent call last): File "new.py", line 12, in <module> [msg[f] for f in fields]) TypeError: writelines() argument must be a sequence of strings` – Will Dec 02 '18 at 20:32
  • Presumably some of those header fields don't exist. Maybe use a file format that makes that easier to handle? CSV would be ideal for this! – Sam Mason Dec 02 '18 at 20:36
  • Sorry, I'm really new to this. What do you mean? – Will Dec 02 '18 at 20:41
  • So you're saying to write to a CSV file? – Will Dec 02 '18 at 20:43
  • @Will updated to write to a CSV file… I'd suggest reading some Python/programming tutorials to get started – Sam Mason Dec 02 '18 at 20:54
  • So I put the script and three .msg files in a folder and ran the script from that folder. I get an error. I have edited the script to only look for the "From:" field. `return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 2129: character maps to <undefined>` – Will Dec 02 '18 at 21:05
  • You might want to look at opening the file in "binary" mode and using a `BytesHeaderParser`. Character encoding can be difficult to get right and depends on a few system-specific details, so it might be worth reading some tutorials about that. Also, Python changed its behavior a lot between versions 2 and 3, so you'll need to make sure you're reading the right one. Python 3 is much nicer if you have the choice! – Sam Mason Dec 02 '18 at 21:11
  • I've added an encoding, since I read online that it can solve the issue: `with open(name, encoding='utf8') as fd:`, but when I trace it back, the error seems to be at `msg = parser.parse(fd)` – Will Dec 02 '18 at 21:34
  • Depending on where they come from, it can be better to just treat the email as bytes (i.e. `open(name, 'rb')`) and then use an email parser that does the right thing. For example, `…` gets encoded as `=?UTF-8?B?4oCm?=` in a subject line, which is separate from the underlying file's encoding. Which you want depends on what you do with it later – Sam Mason Dec 03 '18 at 09:28
  • OK, so I updated the code (see update). I don't get any errors now, but nothing goes to the file. Is there any way to see if the files are being read? – Will Dec 03 '18 at 19:12
  • Try googling for "Python debugging". The easiest way is to add some `print` statements; otherwise look for an "interactive debugger". Stack Overflow really isn't the best place for these sorts of extended discussions; it's really about well-defined questions and answers… – Sam Mason Dec 03 '18 at 23:03
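A short sketch of the "bytes plus a proper parser" approach from the comments, using a made-up header. With its default `compat32` policy, `BytesHeaderParser` hands back the RFC 2047 encoded word untouched; `email.header.decode_header` with `make_header` decodes it explicitly, or `policy.default` decodes it while parsing. It also shows that an absent header comes back as `None`, which Sam Mason's comment suggests was behind the `writelines()` error.

```python
from email import policy
from email.header import decode_header, make_header
from email.parser import BytesHeaderParser

# Made-up raw header bytes, as they would arrive from open(name, 'rb')
raw = b"From: alice@example.com\r\nSubject: =?UTF-8?B?4oCm?=\r\n\r\n"

# Default (compat32) policy: the encoded word is returned as-is
old = BytesHeaderParser().parsebytes(raw)
print(old["Subject"])  # =?UTF-8?B?4oCm?=

# Option 1: decode the encoded word explicitly
print(str(make_header(decode_header(old["Subject"]))))  # …

# Option 2: let policy.default decode headers during parsing
new = BytesHeaderParser(policy=policy.default).parsebytes(raw)
print(new["Subject"])  # …

# A header that is absent comes back as None
print(new["To"])  # None
```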