I'm trying to scrape data from a specific folder in a Gmail account I have access to.
I recently tried running this code using Python 2.7 on Windows 7 while logged into the Gmail account of interest. For some reason though it seems to run for a long time (I left it for as long as 40 minutes) without completing or providing an error.
As it stands right now the folder I'm targeting in the Gmail account only has about 50 simple text emails with no attachments, pictures, or anything that might suggest the process should take as long as it does. Has anyone come across an issue like this before doing something similar with IMAP?
Code for completeness:
#!/usr/bin/env python
#
# Very simple Python script to dump all emails in an IMAP folder to files.
# This code is released into the public domain.
#
# RKI Nov 2013
#
import sys
import imaplib
import getpass
IMAP_SERVER = 'imap.gmail.com'
EMAIL_ACCOUNT = "notatallawhistleblowerIswear@gmail.com"
EMAIL_FOLDER = "Top Secret/PRISM Documents"
OUTPUT_DIRECTORY = 'C:/src/tmp'
PASSWORD = getpass.getpass()
def process_mailbox(M):
"""
Dump all emails in the folder to files in output directory.
"""
rv, data = M.search(None, "ALL")
if rv != 'OK':
print "No messages found!"
return
for num in data[0].split():
rv, data = M.fetch(num, '(RFC822)')
if rv != 'OK':
print "ERROR getting message", num
return
print "Writing message ", num
f = open('%s/%s.eml' %(OUTPUT_DIRECTORY, num), 'wb')
f.write(data[0][1])
f.close()
def main():
M = imaplib.IMAP4_SSL(IMAP_SERVER)
M.login(EMAIL_ACCOUNT, PASSWORD)
rv, data = M.select(EMAIL_FOLDER)
if rv == 'OK':
print "Processing mailbox: ", EMAIL_FOLDER
process_mailbox(M)
M.close()
else:
print "ERROR: Unable to open mailbox ", rv
M.logout()
if __name__ == "__main__":
main()