I am writing a script that finds all unseen emails in a given Gmail account and groups them by sender, with the sender's address as the key and a list of UIDs as the value.

But grouping these emails takes a long time. Is there any way to do it faster? Does the CPU play an important role here? I ask because my machine has only 4 GB of memory.

import imapclient, pyzmail,imaplib,pprint,time
imaplib._MAXLINE = 10000000
imapobj = imapclient.IMAPClient('imap.gmail.com',ssl = True)

imapobj.login('MAIL ID','PASSWORD')
print('Log in Successful...')
imapobj.select_folder('INBOX',readonly=False)
print('INBOX checking...')
uids = imapobj.search(['UNSEEN'])
len_uids = len(uids)
from_dict = {}
print('Grouping....')
time1 = time.time()
iteration = 0
for UID in uids:
    iteration += 1
    if iteration == len_uids // 2:  # compare the loop counter, not the UID value
        print('Half done..')
        time2 = time.time()
        print(round(time2 - time1,1))

    rawmessages = imapobj.fetch([UID],['BODY[]','FLAGS'])
    message = pyzmail.PyzMessage.factory(rawmessages[UID][b'BODY[]'])
    sender = message.get_addresses('from')[0][1]
    # setdefault creates the list the first time a sender is seen
    from_dict.setdefault(sender, []).append(UID)
    print(str(UID)+' is grouped.')
    print('********Total mails to group is:'+str(len_uids - iteration)+'********')

pprint.pprint(from_dict)

imapobj.logout()
  • The bottleneck looks to be the network. You could parallelize that. Gmail may complain at a certain rate, however. – Him Dec 27 '18 at 14:29
  • Replace `print`s with [logging](https://docs.python.org/3/library/logging.html). Remove `iteration` and read about [enumerate](https://stackoverflow.com/questions/22171558/what-does-enumerate-mean/22171593). This can make this code faster. – Max Dec 27 '18 at 14:29
  • @JonhyBeebop: I suspect that both your suggestions are pessimizations, if they have any impact at all compared to network delay. `logging` adds several layers of abstraction for each call compared to `print`, which may pay for it only if it ends up not printing/not flushing; `enumerate` replaces a local variable increment (cheap) with an iterator/generator ping-pong at each iteration. Again, they are likely irrelevant compared to the slowness of the network, but thinking they can give better performance is misguided. – Matteo Italia Dec 27 '18 at 14:40
  • Isn't there a way to get the sender address without loading the body? – Mikhail Berlinkov Dec 27 '18 at 18:40
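
A minimal sketch of the idea raised in the comments above: request only the From header, and request it for all UIDs in a single fetch instead of one round trip per message. It assumes that the fetch call returns a dict keyed by UID whose items include a bytes key starting with `BODY[` holding the raw header, which is how I understand imapclient to behave. `FROM_ONLY` and `group_by_sender` are hypothetical names introduced here, and the sample data stands in for a real server response; none of this has been run against Gmail.

```python
from collections import defaultdict
from email import message_from_bytes
from email.utils import parseaddr

# Assumed fetch item: only the From header; PEEK should leave messages unseen.
FROM_ONLY = 'BODY.PEEK[HEADER.FIELDS (FROM)]'


def group_by_sender(fetch_result):
    """Map sender address -> list of UIDs from a fetch()-style dict."""
    groups = defaultdict(list)
    for uid, data in fetch_result.items():
        # Pick the item that carries the header bytes (key starts with b'BODY[').
        raw = next((v for k, v in data.items() if k.startswith(b'BODY[')), b'')
        header_value = str(message_from_bytes(raw).get('From', ''))
        sender = parseaddr(header_value)[1]  # (display name, address) -> address
        groups[sender].append(uid)
    return dict(groups)


# Hypothetical usage with the question's imapobj (one request for all UIDs):
# from_dict = group_by_sender(imapobj.fetch(uids, [FROM_ONLY]))

# Stand-in for a server response, so the grouping can be tried offline.
sample = {
    101: {b'BODY[HEADER.FIELDS (FROM)]': b'From: Alice <alice@example.com>\r\n\r\n', b'SEQ': 1},
    102: {b'BODY[HEADER.FIELDS (FROM)]': b'From: bob@example.com\r\n\r\n', b'SEQ': 2},
    103: {b'BODY[HEADER.FIELDS (FROM)]': b'From: "A." <alice@example.com>\r\n\r\n', b'SEQ': 3},
}
from_dict = group_by_sender(sample)
```

If the mailbox holds many thousands of unseen messages, it may be worth splitting `uids` into chunks and calling the function once per chunk, then merging the results.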
