I'm trying to scrape data from a specific folder in a Gmail account I have access to.

I recently ran the code below with Python 2.7 on Windows 7 while logged into the Gmail account of interest. It runs for a long time (I left it for as long as 40 minutes) without completing or raising an error.

As it stands right now the folder I'm targeting in the Gmail account only has about 50 simple text emails with no attachments, pictures, or anything that might suggest the process should take as long as it does. Has anyone come across an issue like this before doing something similar with IMAP?

Code for completeness:

#!/usr/bin/env python
#
# Very simple Python script to dump all emails in an IMAP folder to files.  
# This code is released into the public domain.
#
# RKI Nov 2013
#
import sys
import imaplib
import getpass

IMAP_SERVER = 'imap.gmail.com'
EMAIL_ACCOUNT = "notatallawhistleblowerIswear@gmail.com"
EMAIL_FOLDER = "Top Secret/PRISM Documents"
OUTPUT_DIRECTORY = 'C:/src/tmp'

PASSWORD = getpass.getpass()


def process_mailbox(M):
    """
    Dump all emails in the folder to files in output directory.
    """

    rv, data = M.search(None, "ALL")
    if rv != 'OK':
        print "No messages found!"
        return

    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv != 'OK':
            print "ERROR getting message", num
            return
        print "Writing message ", num
        f = open('%s/%s.eml' %(OUTPUT_DIRECTORY, num), 'wb')
        f.write(data[0][1])
        f.close()

def main():
    M = imaplib.IMAP4_SSL(IMAP_SERVER)
    M.login(EMAIL_ACCOUNT, PASSWORD)
    rv, data = M.select(EMAIL_FOLDER)
    if rv == 'OK':
        print "Processing mailbox: ", EMAIL_FOLDER
        process_mailbox(M)
        M.close()
    else:
        print "ERROR: Unable to open mailbox ", rv
    M.logout()

if __name__ == "__main__":
    main()
114
  • What does it print out? What do you see in data? Does it print out Writing message at all? Seems like some 'print' debugging would get you a long way here. – Max Jan 21 '16 at 19:50
  • @Max It doesn't seem to print out anything unfortunately. What do you think the best way to get started with print debugging would be? Any particular areas? – 114 Jan 22 '16 at 15:20
  • [This page](https://pymotw.com/2/imaplib/) seems to suggest adding the line `imaplib.Debug = 4` in order to get more information about what's going on. – legoscia Jan 22 '16 at 16:38
  • Personally, I could run it without any trouble. I am on Linux, with Python 2.7.6. I just had to allow less secure app connections in Gmail and it worked. It processed 20 mails (the whole inbox) in less than 1 minute. Do you have more info? – Derlin Jan 31 '16 at 17:57
  • Albeit not very professional, I put `print('checkpoint')` in my code to see where exactly the program hangs up. It helps when you don't know where the issue is exactly because no errors are thrown. – ATLUS Feb 01 '16 at 08:40
  • @ATLUS Thanks, that's a good trick. As I suspected it's getting stuck on getpass.getpass(). Is there likely an easy way around this? Can I simply replace this with the actual password? – 114 Feb 01 '16 at 18:24
  • @ATLUS If I attempt to use the actual password to a test email account I get the error 'imaplib.error: [ALERT] Please log in via your web browser: https://support.google.com/mail/accounts/answer/78754 (Failure)' [Google attempts to block me from accessing the account]. – 114 Feb 01 '16 at 18:36
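
The two debugging suggestions in the comments above (checkpoint prints and `imaplib.Debug = 4`) can be combined roughly as follows. This is a minimal sketch, not part of the original script: `checkpoint` is a hypothetical helper name, and writing to stderr with an explicit flush is an assumption meant to keep a marker from sitting in an output buffer when the next call hangs.

```python
import imaplib
import sys
import time

# Module-level imaplib setting; values greater than three trace each command.
imaplib.Debug = 4


def checkpoint(label, stream=None):
    """Write a timestamped marker and flush it immediately.

    `checkpoint` is a hypothetical helper, not part of imaplib.
    """
    if stream is None:
        stream = sys.stderr
    line = "[checkpoint %s] %s" % (time.strftime("%H:%M:%S"), label)
    stream.write(line + "\n")
    stream.flush()  # make sure the marker is visible before any hang
    return line


checkpoint("before getpass")
# PASSWORD = getpass.getpass()   # the call under suspicion in this thread
checkpoint("after getpass")
```

If the first marker appears and the second never does, the hang is in the call between them rather than in the IMAP traffic.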

1 Answer

The code works fine for me. Below, I have added some debug prints to your code (using pprint) to view the attributes of the IMAP4_SSL object M. My Gmail account uses two-factor authentication, so I needed to set up a Gmail app password.

from pprint import pprint 

# ....

M = imaplib.IMAP4_SSL(IMAP_SERVER)
print('---- Attributes of the IMAP4_SSL connection before login ----')
pprint(vars(M))

M.login(EMAIL_ACCOUNT, PASSWORD)
print('\n \n')
print('---- Attributes of the IMAP4_SSL connection after login ----')
pprint(vars(M))

# open specific folder
rv, data = M.select(EMAIL_FOLDER)
print('\n \n')
print('---- Data returned from select of folder = {}'.format(data))
  • Check the first pprint(vars(M)) for:
    1. 'welcome': '\* OK Gimap ready for requests from ...
    2. 'port': 993,
  • Check the second pprint(vars(M)) for:
    1. _cmd_log for a successful login: 6: ('< PJIL1 OK **@gmail.com authenticated (Success)
  • data returned from M.select(EMAIL_FOLDER) should be the number of emails available to download.
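
The manual checks listed above could also be scripted. The sketch below is only an illustration under stated assumptions: `check_connection_state` and `message_count` are hypothetical helper names, and instead of parsing the private `_cmd_log` it looks at the `state` attribute that imaplib sets on the connection object (`'AUTH'` after a successful login).

```python
def check_connection_state(attrs, logged_in=False):
    """Return a list of problems found in vars(M); an empty list looks healthy.

    Hypothetical helper: `attrs` is expected to be the dict from vars(M).
    """
    problems = []

    # The greeting may be bytes (Python 3) or str (Python 2).
    welcome = attrs.get("welcome") or b""
    if isinstance(welcome, bytes):
        welcome = welcome.decode("ascii", "replace")
    if "OK" not in welcome:
        problems.append("no OK greeting from the server")

    # IMAP over SSL normally uses port 993.
    if attrs.get("port") != 993:
        problems.append("unexpected port: %r" % (attrs.get("port"),))

    # Assumption: checking `state` stands in for reading _cmd_log.
    if logged_in and attrs.get("state") != "AUTH":
        problems.append("not authenticated, state is %r" % (attrs.get("state"),))

    return problems


def message_count(select_data):
    """Convert the data returned by M.select(folder) into an int."""
    return int(select_data[0])
```

With a live connection this would be called as `check_connection_state(vars(M), logged_in=True)` right after `M.login(...)`, and `message_count(data)` on the result of `M.select(EMAIL_FOLDER)`.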
ljk07
  • I tried putting in those debug prints but it looks like the code never even reaches that point since no messages came through. I tried this with a couple of different email addresses. Checked both accounts to confirm an app password was not needed, got message "setting...is not available for your account". – 114 Feb 01 '16 at 18:02
  • Please see the option in gmail to enable 'less secure apps': https://support.google.com/accounts/answer/6010255?hl=en – ljk07 Feb 02 '16 at 03:13
  • Your question may be a duplicate of this question: http://stackoverflow.com/questions/25413301/gmail-login-failure-using-python-and-imaplib – ljk07 Feb 02 '16 at 03:19
  • Thanks, I'll look into both of those. I'm going to award you the bounty since otherwise it will go unused, but I may still need some help on this. – 114 Feb 03 '16 at 20:30
  • Changing the security setting doesn't seem to have had an effect on the code hanging unfortunately. – 114 Feb 03 '16 at 20:38
  • Based on [this question](http://stackoverflow.com/questions/24544353/python-getpass-getpass-function-call-hangs) it looks like the issue may, after all this, come down to the fact that I'm using the Anaconda distribution. Will try with a different distribution. – 114 Feb 03 '16 at 21:27
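
Since the hang was traced to `getpass.getpass()`, one possible workaround is to read the password from an environment variable and only prompt when a real terminal is attached. This is a hedged sketch, not a confirmed fix for the Anaconda case: `get_password`, the `IMAP_PASSWORD` variable name, and the `interactive` parameter are all assumptions introduced for illustration. It also does nothing about Gmail rejecting the login itself, as seen in the comments on the question.

```python
import getpass
import os
import sys


def get_password(env_var="IMAP_PASSWORD", environ=None,
                 prompt=getpass.getpass, interactive=None):
    """Return the password from the environment, else prompt on a terminal.

    Hypothetical helper; `IMAP_PASSWORD` is an assumed variable name.
    """
    if environ is None:
        environ = os.environ
    password = environ.get(env_var)
    if password:
        return password

    # Only prompt when stdin looks like a real terminal, so the script
    # fails fast instead of waiting silently in a console that cannot
    # answer the prompt.
    if interactive is None:
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if not interactive:
        raise RuntimeError("set %s or run from a real terminal" % env_var)
    return prompt("IMAP password: ")
```

In the original script, `PASSWORD = getpass.getpass()` would become `PASSWORD = get_password()`, with the variable set in the shell before running.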