54

I looked around and couldn't find a satisfactory answer. Does anyone know how to parse .msg files from Outlook with Python?

I've tried using mimetools and email.parser with no luck. Help would be greatly appreciated!

Ryabchenko Alexander
Michael

7 Answers

59

This works for me:

import win32com.client

# Requires Windows with Outlook installed (pywin32 provides win32com)
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# OpenSharedItem expects an absolute path
msg = outlook.OpenSharedItem(r"C:\test_msg.msg")

print(msg.SenderName)
print(msg.SenderEmailAddress)
print(msg.SentOn)
print(msg.To)
print(msg.CC)
print(msg.BCC)
print(msg.Subject)
print(msg.Body)

# Outlook collections are 1-indexed
count_attachments = msg.Attachments.Count
for index in range(1, count_attachments + 1):
    print(msg.Attachments.Item(index).FileName)

del outlook, msg

The To, CC and BCC properties return display names (e.g. "John Doe") rather than email addresses. Please refer to the related post on methods for accessing the addresses themselves.
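As a rough sketch of one way to get addresses instead of display names: the Outlook object model exposes a Recipients collection on the message, and for Exchange recipients the SMTP address usually has to be resolved through AddressEntry.GetExchangeUser(). The helper names smtp_address and recipient_addresses are made up for illustration, and how this behaves on a particular Outlook profile is an assumption rather than something verified here.

```python
# Sketch only: these helpers are hypothetical and not part of pywin32.
# Recipient.Type values in the Outlook object model: 1 = To, 2 = CC, 3 = BCC.
RECIPIENT_KINDS = {1: "To", 2: "CC", 3: "BCC"}


def smtp_address(recipient):
    """Return an SMTP address for one Outlook Recipient-like object."""
    entry = recipient.AddressEntry
    if entry is not None and entry.Type == "EX":
        # Exchange entries keep an X.500-style path in Address,
        # so ask Outlook for the Exchange user and its SMTP address.
        user = entry.GetExchangeUser()
        if user is not None:
            return user.PrimarySmtpAddress
    return recipient.Address


def recipient_addresses(msg):
    """Group (display name, address) pairs by To, CC and BCC."""
    grouped = {"To": [], "CC": [], "BCC": []}
    for recipient in msg.Recipients:
        kind = RECIPIENT_KINDS.get(recipient.Type)
        if kind is not None:
            grouped[kind].append((recipient.Name, smtp_address(recipient)))
    return grouped
```

With the msg object from the snippet above, recipient_addresses(msg)["To"] would then hold (name, address) pairs instead of the semicolon-separated names that msg.To returns.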

Brent Edwards
  • 714
  • 9
  • 9
  • 15
    It's important to note that the OpenSharedItem method expects an absolute path; otherwise you get an error. – smartexpert Jun 17 '16 at 08:58
  • 2
    I seem to have problems with the encoding. How can you solve that? – firko Mar 01 '17 at 20:39
  • msg.SenderEmailAddress, msg.To, msg.CC and msg.BCC only give names, not email addresses. – Amey P Naik May 02 '19 at 12:59
  • 1
    Please note that this solution requires an active Outlook account, which might not be available on every system you deploy to. – Cribber Nov 15 '19 at 08:26
  • This is Python2 code but works in Python3 if you add brackets to the print command – 576i Jun 10 '20 at 10:02
  • What if msg files were dotted across different subfolders in specified directory? How would one resolve this issue? Side note: As @smartexpert has stated, OpenSharedItem expects an absolute path and the issue with looping through paths seems to be that the program does not read it as a raw string Error: (-2147352567, 'Exception occurred.', (4096, 'Microsoft Outlook', 'Cannot find X:\\00000000\\000032C900000000.msg. It may have been moved or deleted. Cannot find this file. Verify the path and file name are correct.', None, 0, -2147024894), None) – Nabih Apr 06 '21 at 11:48
  • How do you list all the supported properties? `dir(msg)` doesn't work (perhaps as expected since this is a COM object not a Python object) – Jason S Aug 19 '21 at 16:48
  • I guess it's https://learn.microsoft.com/en-us/office/vba/api/outlook.mailitem – Jason S Aug 19 '21 at 16:50
  • Works, but the issue is it destroys the structure of the email, like tables are gone. Any way to read emails in HTML perhaps? – Rafs Aug 31 '23 at 16:31
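For the question about .msg files scattered across subfolders, one possible approach is to collect absolute paths first and only then hand them to OpenSharedItem. The sketch below assumes that the standard library's pathlib is enough for the directory walk; find_msg_files and print_subjects are hypothetical helper names, and the Outlook part is untested here.

```python
from pathlib import Path


def find_msg_files(root):
    """Return absolute paths of all .msg files under root, including subfolders."""
    root_path = Path(root).resolve()
    return sorted(
        str(path)
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() == ".msg"
    )


def print_subjects(root):
    """Open every .msg file found under root with Outlook and print its subject."""
    import win32com.client  # needs pywin32 and an installed Outlook

    outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    for path in find_msg_files(root):
        msg = outlook.OpenSharedItem(path)
        print(path, "|", msg.Subject)
```

Because the paths come from pathlib as ordinary strings, no raw-string prefix is involved; the r"..." prefix only matters for literals typed into source code.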
45

I succeeded in extracting the relevant fields from MS Outlook files (.msg) using the msg-extractor utility by Matt Walker.

Prerequisites

pip install extract-msg

Note that it may require additional modules; in my case, I also had to install imapclient:

pip install imapclient

Usage

import extract_msg

f = r'MS_Outlook_file.msg'  # Replace with yours
msg = extract_msg.Message(f)
msg_sender = msg.sender
msg_date = msg.date
msg_subj = msg.subject
msg_message = msg.body
msg.close()

print('Sender: {}'.format(msg_sender))
print('Sent On: {}'.format(msg_date))
print('Subject: {}'.format(msg_subj))
print('Body: {}'.format(msg_message))

There are many other goodies in the MsgExtractor utility to explore, but this is a good start.

Note

I had to comment out lines 3 to 8 within the file C:\Anaconda3\Scripts\ExtractMsg.py:

#"""
#ExtractMsg:
#    Extracts emails and attachments saved in Microsoft Outlook's .msg files
#
#https://github.com/mattgwwalker/msg-extractor
#"""

Error message was:

line 3
    ExtractMsg:
              ^
SyntaxError: invalid syntax

After commenting out those lines, the error message disappeared and the code worked just fine.

Rafs
Vladimir Lukin
  • 591
  • 4
  • 3
  • 3
    Of the various libraries I tried, this is the only one that worked on Linux-based machines. – Vishnu Y S May 11 '18 at 05:47
  • 3
    the fact that it also works without a running outlook client is gold. – Cribber Jan 02 '20 at 07:08
  • @Vladimir Lukin, but how can we extract the sender and subject from forwarded emails? Is there any way to get that? – Pravin Jan 27 '23 at 11:59
  • Although the software is still not production-ready, it runs without an outlook client as said. However, the provided code destroys the format of the email. I tried `html_body = msg.htmlBody` and `html_prepared = msg.htmlBodyPrepared` but they are a bit messy. – Rafs Aug 31 '23 at 16:48
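On the formatting point raised in the comments, one option is to prefer the HTML body when the message has one and fall back to the plain-text body otherwise. This is only a sketch: best_body and read_msg_body are hypothetical helpers, htmlBody is taken from the comment above, and the assumption that it may arrive as bytes in an unknown charset (decoded here as UTF-8 with replacement) may not hold for every message or library version.

```python
def best_body(msg, encoding="utf-8"):
    """Return ("html", text) if an HTML body exists, else ("text", plain body)."""
    html = getattr(msg, "htmlBody", None)
    if html:
        if isinstance(html, bytes):
            # Assumption: the real charset is unknown, so bad bytes are replaced.
            html = html.decode(encoding, errors="replace")
        return "html", html
    return "text", msg.body or ""


def read_msg_body(path):
    """Open a .msg file with extract_msg and return its preferred body."""
    import extract_msg  # pip install extract-msg

    msg = extract_msg.Message(path)
    try:
        return best_body(msg)
    finally:
        msg.close()
```

Keeping best_body separate from the file handling makes it easy to try on any object that has body and htmlBody attributes.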
6

Even though this is an old thread, I hope this information helps someone looking for a solution to exactly what the thread title asks. I strongly recommend mattgwwalker's solution on GitHub, which requires the OleFileIO_PL module to be installed separately.

fatih_dur
  • First repo is not yet production-ready (judging from the semantic versioning) and the second one is deleted. – Rafs Aug 31 '23 at 16:16
2

The extract-msg Python module (pip install extract-msg) is also extremely useful because it allows quick access to the full headers from the message, something that Outlook makes much harder than necessary to get hold of.

My modification of Vladimir's code that shows full headers is:

#!/usr/bin/env python3

import extract_msg
import sys

msg = extract_msg.Message(sys.argv[1])
msg_sender = msg.sender
msg_date = msg.date
msg_subj = msg.subject

print('Sender: {}'.format(msg_sender))
print('Sent On: {}'.format(msg_date))
print('Subject: {}'.format(msg_subj))

print("=== Details ===")

for k, v in msg.header.items():
    print("{}: {}".format(k, v))

print(msg.body)
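Once the full headers are available, a common next step is to trace the delivery path. The sketch below assumes that msg.header behaves like a standard email.message.Message (it supports items() above, and get_all() is assumed here); received_chain is a hypothetical helper name.

```python
def received_chain(header):
    """Return the Received headers as single-line strings, oldest hop first."""
    hops = header.get_all("Received") or []
    # Each relay prepends its own Received line, so the last one is the oldest.
    # Folded headers contain line breaks; collapse all whitespace to one space.
    return [" ".join(hop.split()) for hop in reversed(hops)]
```

It could be called as received_chain(msg.header) after opening the message as in the script above.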
1

I was able to parse it in a similar way to what Vladimir described above. However, I needed a small change: a for loop, because glob.glob(r'c:\test_email\*.msg') returns a list, whereas Message(f) expects a single file or str.

import glob

import extract_msg  # older releases exposed this module as ExtractMsg

f = glob.glob(r'c:\test_email\*.msg')

for filename in f:
    msg = extract_msg.Message(filename)
    msg_sender = msg.sender
    msg_date = msg.date
    msg_subj = msg.subject
    msg_message = msg.body
    msg.close()
Sazzad
1

I found a module online called MSG PY, a Microsoft Outlook .msg file module for Python. It lets you easily create, read, parse and convert Outlook .msg files, and it does not require Microsoft Outlook or any other third-party application or library to be installed on the machine. For example:

from independentsoft.msg import Message

appointment = Message("e:\\appointment.msg")

print("subject: " + str(appointment.subject))
print("start_time: " + str(appointment.appointment_start_time))
print("end_time: " + str(appointment.appointment_end_time))
print("location: " + str(appointment.location))
print("is_reminder_set: " + str(appointment.is_reminder_set))
print("sender_name: " + str(appointment.sender_name))
print("sender_email_address: " + str(appointment.sender_email_address))
print("display_to: " + str(appointment.display_to))
print("display_cc: " + str(appointment.display_cc))
print("body: " + str(appointment.body))
Uros
0

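I've tried the Python email module, and sometimes it fails to parse the .msg file.

So, in this case, if you are only after text or html, the following code worked for me.

start_text = "<html>"
end_text = "</html>"

def parse_msg(msg_file, start_text, end_text):
    # latin-1 maps every byte to a character, so reading never fails to decode
    with open(msg_file, encoding="latin-1") as f:
        b = f.read()
    start = b.find(start_text)
    end = b.find(end_text)
    if start == -1 or end == -1:
        return ""  # markers not found in the file
    return b[start:end + len(end_text)]

# path_to_msg_file is the path to your .msg file
print(parse_msg(path_to_msg_file, start_text, end_text))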
paolov
  • That's just an encoding/decoding issue. Verify the charset of your email body and handle it appropriately. – paolov Mar 20 '19 at 22:58
  • I used `open(msg_file, encoding='latin-1')` and resolved a decoding issue, but the function returns `''`, so nothing is read. I suspect the email doesn't have html tags to find. How can I read only the text? – Rafs Aug 31 '23 at 16:22
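When an HTML fragment has been recovered and only the text is wanted, the tags can be stripped with the standard library. This is a minimal sketch under the assumption that the input is a reasonably well-formed HTML string; html_to_text and _TextExtractor are hypothetical names, and it does not help when the message contains no HTML at all.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    SKIP = {"script", "style"}                    # content to drop entirely
    BREAKS = {"p", "br", "div", "tr", "li"}       # tags that start a new line
    CELLS = {"td", "th"}                          # table cells, separated by a space

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BREAKS:
            self.parts.append("\n")
        elif tag in self.CELLS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one non-empty line per block."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Table rows end up as one line each with cells separated by spaces, which keeps at least some of the structure that plain body text loses.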