4

I'm trying to parse a .eml file. The .eml has an excel attachment that's currently base 64 encoded. I'm trying to figure out how to decode it into XML so that I can later turn it into a CSV I can do stuff with.

This is my code right now:

import email

data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)

for part in msg.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content Disposition')


    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        excelContents = part.get_payload(decode = True)

        print excelContents

The problem is

When I try to decode it, it spits back something looking like this.

enter image description here

I've used this post to help me write the code above.

How can I get an email message's text content using Python?

Update:

This is exactly following the post's solution with my file, but part.get_payload() returns everything still encoded. I haven't figured out how to access the decoded content this way.

import email


data = file('Openworkorders.eml').read()
msg = email.message_from_string(data)
for part in msg.walk():
    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        name = part.get_param('name') or 'MyDoc.doc'
        f = open(name, 'wb')
        f.write(part.get_payload(None, True)) 
        f.close()

        print part.get("content-transfer-encoding")
Melody Anoni
  • 223
  • 1
  • 3
  • 16
  • Hi @stovfl, I've tried that, but part.get_payload still returns the encoded attachment. I've included an updated snippet above in case I'm doing something horribly wrong – Melody Anoni Jan 07 '19 at 17:38
  • Looks good apart `MyDoc.doc`, should something like `Openworkorders.ods` or `Openworkorders.xlsx`. Try to open the saved file using Excel or OpenOffice. – stovfl Jan 07 '19 at 17:46
  • Yes, in place of `MyDoc.doc` I have my `Openworkorders.xlsx` file. Sorry I didn't add that. Thanks for your suggestions. Once I have it open in Excel, what would I do with it? I can see the content just fine, I just need to parse it into a CSV. – Melody Anoni Jan 07 '19 at 18:12
  • 1
    *"I can see the content just fine"*. You have successfully extracted the `.xlsx` from `.eml`. *"parse it into a CSV"*: The simples solution is to save it from Excel to `CSV`. If you want to do it using Python, use on Package shown at [Working with Excel Files in Python](http://www.python-excel.org/) – stovfl Jan 07 '19 at 18:18
  • 1
    Relevant [xls-to-csv-converter](https://stackoverflow.com/questions/9884353/xls-to-csv-converter) – stovfl Jan 07 '19 at 18:27
  • Sorry, maybe that wasn't clear of me. I'm asking how opening in Excel will help me solve the issue? I can do that, I can just view what the cells consist of, but the problem I'm trying to solve is to take the attachment of this excel file from a .eml and decode it. With my solution up top, I received this broken response with a bunch of � �� . When I take the SO solution, I receive the same, encoded attachment, it does nothing, essentially. I need to decode the .xlsx successfully before I can think about turning it into a CSV. – Melody Anoni Jan 07 '19 at 18:40
  • 1
    *"how opening in Excel will help me"*: `eml` => `f.write(part.get_payload(` => `*.xlsx` => open with Excel == **verified**. **Does this work?** – stovfl Jan 07 '19 at 19:42
  • Looks like I had a misunderstanding about what open() was actually doing. I finally get what you mean now, thanks for your help, haha I'll be posting my solution later. – Melody Anoni Jan 11 '19 at 14:29

2 Answers2

2

As is clear from this table (and as you have already concluded), this file is an .xlsx. You can't just decode it with unicode or base64: you need a special package. Excel files specifically are a bit tricker (for e.g. this one does PowerPoint and Word, but not Excel). There are a few online, see here - xlrd might be the best.

Josh Friedlander
  • 10,870
  • 5
  • 35
  • 75
  • Should I then install FullText just as the README directs? Also, what am I specifically supposed to do with that extension? `application/vnd.ms-excel` – Melody Anoni Jan 07 '19 at 17:39
  • 1
    Thanks for the response, you were definitely right, but because your answer a bit general, I will post my solution later. – Melody Anoni Jan 10 '19 at 15:43
  • Great! Yeah, I wasn't sure about all the details, you should definitely go ahead and post your solution – Josh Friedlander Jan 10 '19 at 16:04
0

Here is my solution:

I found 2 things out:

1.) I thought .open() was going inside the .eml and changing the selected decoded elements. I thought I needed to see decoded data before moving forward. What's really happening with .open() is it's creating a new file in the same directory of that .xlsx file. You must open the attachment before you will be able to deal with the data. 2.) You must open an xlrd workbook with the file path.

import email
import xlrd 

data = file('EmailFileName.eml').read()
    msg = email.message_from_string(data)  # entire message

    if msg.is_multipart():
        for payload in msg.get_payload():
            bdy = payload.get_payload()
    else:
        bdy = msg.get_payload()

    attachment = msg.get_payload()[1]


    # open and save excel file to disk
    f = open('excelFile.xlsx', 'wb')
    f.write(attachment.get_payload(decode=True))
    f.close()

    xls = xlrd.open_workbook(excelFilePath) # so something in quotes like '/Users/mymac/thisProjectsFolder/excelFileName.xlsx'

    # Here's a bonus for how to start accessing excel cells and rows
    for sheets in xls.sheets():
        list = []
        for rows in range(sheets.nrows):
            for col in range(sheets.ncols):
                list.append(str(sheets.cell(rows, col).value))
Melody Anoni
  • 223
  • 1
  • 3
  • 16