I am trying to create a .xlsx log of an entire Outlook inbox. To do this, I have win32com getting the inbox object and iterating through each item.

My issue is how slow the process is, as it needs to handle 10,000 to 100,000 emails. So far I have tried it with 15,000 emails and it takes over an hour.

I believe the solution is multiprocessing, but I am unable to pass the win32com object to a worker function because it cannot be pickled.

import win32com.client
from multiprocessing import Pool, cpu_count


def create_email(data_input):
    this_email = [data_input.Subject, data_input.Body]
    return this_email


def init_pool():
    outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    account = outlook.Folders.Item("EMAIL HERE")
    inbox = account.Folders.Item("Inbox").Items
    p = Pool(cpu_count())
    inbox_list = p.map(create_email, inbox)


if __name__ == '__main__':
    init_pool()

Replace EMAIL HERE with the email account being used.

Aden
  • Right, you can't share COM objects with a different process. You'd have to call `win32com.client.Dispatch` again. It's not clear to me that this will help you very much. If you are talking with an Exchange server, the bottleneck is going to be the path to Exchange, not the path to Outlook. – Tim Roberts Feb 14 '22 at 01:58
  • Outlook also stores your emails as an .ost file. You might make a copy of that and process it instead. See [How do I parse an .ost file into separate emails](https://stackoverflow.com/questions/54199944/how-do-i-parse-an-ost-file-into-separate-emails-preferably-with-python). – Ouroborus Feb 14 '22 at 02:06
  • @TimRoberts I was wondering if there was a way to serialise it to make it iterable with a function, but I'm beginning to think this is impossible without just iterating through it first. – Aden Feb 15 '22 at 03:05
  • @Ouroborus Unfortunately this isn't possible since the .ost file is 20GB+. I've since tried multithreading the loop, and I can process about 10,000 emails an hour, as opposed to 8,000 emails an hour normally. I don't think this is the solution I'm after but it's the best I can come up with right now. – Aden Feb 15 '22 at 03:08
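
As the first comment points out, a COM object cannot be sent to another process, but each worker can call `win32com.client.Dispatch` itself. A minimal sketch of that idea, assuming only picklable values (the account name and a range of 1-based item indices) are passed to the workers. The helper names `index_chunks`, `read_chunk` and `read_inbox` are hypothetical, not part of win32com. This is untested against a real mailbox, and as noted in the comments it may not be faster if the bottleneck is Exchange or Outlook itself rather than Python.

```python
from multiprocessing import Pool, cpu_count


def index_chunks(total, n_chunks):
    """Split the 1-based range 1..total into contiguous (start, stop) pairs, inclusive."""
    if total <= 0:
        return []
    n_chunks = max(1, min(n_chunks, total))
    size, extra = divmod(total, n_chunks)
    chunks, start = [], 1
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        stop = start + size + (1 if i < extra else 0) - 1
        chunks.append((start, stop))
        start = stop + 1
    return chunks


def read_chunk(task):
    """Runs in a worker process: opens its own COM connection and reads one index range."""
    account_name, start, stop = task
    # Imported here so every worker creates its own COM objects
    # instead of receiving (unpicklable) ones from the parent.
    import win32com.client

    outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
    items = outlook.Folders.Item(account_name).Folders.Item("Inbox").Items
    rows = []
    for i in range(start, stop + 1):
        item = items.Item(i)  # Outlook collections are 1-based
        rows.append([item.Subject, item.Body])
    return rows


def read_inbox(account_name, total, workers=None):
    """Fan the index ranges out to a pool and flatten the results into one list of rows."""
    workers = workers or cpu_count()
    tasks = [(account_name, a, b) for a, b in index_chunks(total, workers)]
    if not tasks:
        return []
    with Pool(len(tasks)) as pool:
        parts = pool.map(read_chunk, tasks)
    return [row for part in parts for row in part]
```

`read_inbox` would be called from under the `if __name__ == '__main__':` guard, with `total` taken from the `Count` property of the inbox `Items` collection in the parent process. Only plain strings and integers cross the process boundary, so nothing needs to be pickled beyond the returned subject and body text.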
