I am trying to read through an Outlook folder and get the ReceivedTime, CC, Subject, and HTMLBody of each email, with the table in the HTMLBody extracted into columns. I can 1) pull ReceivedTime, CC, Subject, and HTMLBody into a dataframe, and I can 2) extract the HTMLBody tables into a dataframe, but I am getting stuck on doing both 1) and 2) together.
Current code:
import win32com.client
import pandas as pd
from bs4 import BeautifulSoup
outlook = win32com.client.Dispatch("Outlook.Application")
mapi = outlook.GetNamespace("MAPI")
inbox = mapi.Folders['User@email.com'].Folders['Inbox'].Folders['Subfolder Name']
Mail_Messages = inbox.Items
for mail in Mail_Messages:
    receivedtime = mail.ReceivedTime.strftime('%Y-%m-%d %H:%M:%S')
    cc = mail.CC
    body = mail.HTMLBody
    html_body = BeautifulSoup(body,"lxml")
    html_tables = html_body.find_all('table')[0]
    df = pd.read_html(str(html_tables),header=None)[0]
    display(df)
The current dataframe is displayed below, but I also want the related ReceivedTime, CC, and Subject.
| | 0 | 1 |
|---|---|---|
| 0 | Report Name | Report.pdf |
| 1 | Team Name | Team A |
| 2 | Project Name | Project A |
| 3 | Unique ID Number | 123456789 |
| 4 | Due Date | 1/1/2021 |
I would also like the values in column 0 to be the column headers instead, so that each email that is read produces one row of a dataframe that looks like this, covering all the emails in the inbox subfolder:
| 0 | Report Name | Team Name | Project Name | Unique ID Number | Due Date | ReceivedTime | CC | Subject |
|---|---|---|---|---|---|---|---|---|
| 1 | Report.pdf | Team A | Project A | 123456789 | 1/5/2021 | 1/1/2021 4:38:44 AM | User1@email.com, User2@email.com | Action Required:Report A Coming due |
| 2 | | | | | | | | |
| 3 | | | | | | | | |
| 4 | | | | | | | | |
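To make the reshaping I am after concrete, here is a rough sketch of the idea using only pandas. It assumes each email's table always comes back as a two-column label/value frame like the one shown earlier. The helper name `table_to_row` and the hard-coded sample frame are hypothetical stand-ins for illustration; the sample values are copied from the example tables in this post, not read from Outlook.

```python
import pandas as pd


def table_to_row(kv_table, received_time, cc, subject):
    """Hypothetical helper: turn a two-column label/value table into one row."""
    # Column 0 holds the labels and column 1 the values, so use column 0
    # as the index and transpose to get the labels as column headers.
    row = kv_table.set_index(0)[[1]].T.reset_index(drop=True)
    row.columns.name = None
    # Attach the per-email fields as extra columns on the same row.
    row["ReceivedTime"] = received_time
    row["CC"] = cc
    row["Subject"] = subject
    return row


# Stand-in for the frame returned by pd.read_html(...)[0] for one email.
sample = pd.DataFrame([
    ["Report Name", "Report.pdf"],
    ["Team Name", "Team A"],
    ["Project Name", "Project A"],
    ["Unique ID Number", "123456789"],
    ["Due Date", "1/1/2021"],
])

# Collect one row per email, then combine them once at the end.
rows = [
    table_to_row(
        sample,
        "2021-01-01 04:38:44",
        "User1@email.com, User2@email.com",
        "Action Required:Report A Coming due",
    )
]
combined = pd.concat(rows, ignore_index=True)
print(combined)
```

In the Outlook loop, I assume the equivalent would be appending `table_to_row(df, receivedtime, cc, mail.Subject)` to `rows` for each `mail` and calling `pd.concat` once after the loop, but I have not confirmed this handles emails with no table or with a differently shaped table.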
I am getting stuck here. I am still a beginner with Python, and the other posts I've seen don't quite get me to what I'm trying to do. I appreciate any and all help with this.