
I was wondering how to get MIME message graph structure in Python (for example, as an adjacency matrix).

According to the official Python 3 documentation, a message object (called email here) has a walk() method that iterates over all parts of the message:

for part in email.walk():
   print(part.get_content_type())

However, the output does not show the hierarchical structure of the message. For example, consider the following output:

multipart/report
text/plain
message/delivery-status
text/plain
text/plain
message/rfc822
text/plain

It could represent either of these two tree structures:

multipart/report
    text/plain
    message/rfc822
        text/plain
        text/plain
    message/rfc822
        text/plain

or

multipart/report
    text/plain
    message/rfc822
        text/plain
        text/plain
        message/rfc822
            text/plain

Is there any method in Python that could help determine the exact hierarchical (graph) structure of a MIME message?

Konov Mike
  • Please provide enough code so others can better understand the problem. What have you tried, or what are you considering trying? – dsillman2000 Oct 27 '21 at 13:49

2 Answers

0

Let's say you have read the email into a variable named email.

Then print(email.get_content_type()) should show something like multipart/report (to take the example you provided).
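For completeness, here is a minimal sketch of getting a message object in the first place, using the standard library's email.message_from_string. The raw text and the boundary name are invented for illustration, and the variable is called msg rather than email only to avoid confusion with the email package; it plays the role of the email variable used in this answer.

```python
from email import message_from_string

# Hypothetical raw message, made up for this example only.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUND"\n'
    "\n"
    "--BOUND\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--BOUND\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--BOUND--\n"
)

# Parse the raw text into a Message object.
msg = message_from_string(RAW)

# Content type of the top-level (root) part.
print(msg.get_content_type())  # multipart/mixed
```

For a message stored in a file, email.message_from_file or email.message_from_binary_file should work the same way.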

Then you can try

if email.is_multipart():
    # For a multipart message, get_payload() returns the list of direct subparts.
    for subpart in email.get_payload():
        print(subpart.get_content_type())

This will print

text/plain
message/rfc822

if you take the second tree structure from your example.

You can do the same for any part of the email: if a part is multipart, get_payload() returns the list of its direct subparts.

You can use this to write a recursive function that prints one tab per nesting level before the content type of each part.
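Since the question mentions an adjacency matrix, the same recursion can also collect parent/child edges instead of printing. The following is only a sketch: build_adjacency and the sample message are hypothetical names and data made up for illustration, not part of the email package. Parts are numbered in the order they are visited, which should match the order in which walk() yields them.

```python
from email import message_from_string

def build_adjacency(message):
    """Return (labels, matrix); matrix[i][j] == 1 means part j is a direct child of part i."""
    labels = []   # content type of each part, in visiting order
    edges = []    # (parent_index, child_index) pairs

    def visit(part, parent):
        index = len(labels)
        labels.append(part.get_content_type())
        if parent is not None:
            edges.append((parent, index))
        # is_multipart() is also true for message/rfc822 parts,
        # whose payload is a one-element list holding the nested message.
        if part.is_multipart():
            for child in part.get_payload():
                visit(child, index)

    visit(message, None)
    size = len(labels)
    matrix = [[0] * size for _ in range(size)]
    for parent, child in edges:
        matrix[parent][child] = 1
    return labels, matrix

# Hypothetical sample message, invented for this example.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="OUTER"\n'
    "\n"
    "--OUTER\n"
    "Content-Type: text/plain\n"
    "\n"
    "Body\n"
    "--OUTER\n"
    "Content-Type: message/rfc822\n"
    "\n"
    "Subject: inner\n"
    "Content-Type: text/plain\n"
    "\n"
    "Inner body\n"
    "--OUTER--\n"
)

labels, matrix = build_adjacency(message_from_string(SAMPLE))
for label, row in zip(labels, matrix):
    print(row, label)
```

An adjacency list (a dict mapping each parent index to its child indices) would carry the same information and is usually more compact for a tree.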

It has been a while since I last worked with email, but this should do the trick.

0

Here's a function that recursively builds up the message structure as suggested in the existing answer. It also shows the section index that you'd use to request body parts in, e.g., an IMAP FETCH command:

def print_structure(message, level=0, section='TEXT'):
    print('%s[%s]' % ('\t' * level, section), message.get_content_type())
    if message.is_multipart():
        for i, subpart in enumerate(message.get_payload()):
            subsection = f"{'' if section == 'TEXT' else section + '.'}{i + 1}"
            print_structure(subpart, level=level + 1, section=subsection)

As an example, if you load this sample email into the variable email, calling print_structure(email) produces the following:

[TEXT] multipart/mixed
    [1] multipart/related
        [1.1] multipart/alternative
            [1.1.1] text/plain
            [1.1.2] text/html
        [1.2] image/gif
        [1.3] image/gif
        [1.4] image/gif
    [2] image/gif
    [3] image/gif
    [4] image/gif
Simon