I'm trying to analyse my 25k+ emails, similar to the post here: http://beneathdata.com/how-to/email-behavior-analysis/

While the mentioned script used IMAP, I'm trying to implement this using the Gmail API for improved security. I'm using Python (and Pandas for data analysis) but the question applies more generally to use of the Gmail API.

From the docs, I'm able to read in emails using:

msgs = service.users().messages().list(userId='me', maxResults=500).execute()

and then access the data using a loop:

for msg in msgs['messages']:
    m_id = msg['id']  # id of the individual message
    message = service.users().messages().get(userId='me', id=m_id).execute()
    payload = message['payload']
    headers = payload['headers']

    for item in headers:
        if item['name'] == 'Date':
            date = item['value']
            # ** DATA STORAGE FUNCTIONS ETC **

but this is clearly very slow. In addition to looping over every message, I have to call list() many times to page through all the emails.
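
For reference, the paging over list() described above can be sketched roughly as follows. This assumes `service` is an authorised Gmail API service object built with the Google API Python client; `iter_message_ids` is a hypothetical helper name, not something from the docs.

```python
def iter_message_ids(service, user_id="me", page_size=500):
    """Yield every message id, fetching one page of list() results at a time."""
    messages = service.users().messages()
    request = messages.list(userId=user_id, maxResults=page_size)
    while request is not None:
        response = request.execute()
        # A page may have no 'messages' key if the mailbox is empty
        for msg in response.get("messages", []):
            yield msg["id"]
        # list_next() returns None once the response has no next page token
        request = messages.list_next(previous_request=request,
                                     previous_response=response)
```

Note that list() only returns message ids (and thread ids), so each message still needs its own get() call afterwards.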

Is there a higher-performance way to do this? For example, can I ask the API to return only the data I need rather than all of the unwanted message information?

Thanks.

Reference: https://developers.google.com/resources/api-libraries/documentation/gmail/v1/python/latest/gmail_v1.users.messages.html

SLater01

1 Answer

You can combine your messages.get() calls into a single batch request; see: https://developers.google.com/gmail/api/guides/batch

You can put up to 100 requests into a batch.

Note that "a set of n requests batched together counts toward your usage limit as n requests, not as one request." So you may need to do some pacing to stay below request rate limits.
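
One way to handle the 100-request cap and the pacing is to split the ids into chunks and pause between batches. This is only a sketch: `chunked` and `run_paced` are hypothetical helper names, `run_chunk` stands for whatever function builds and executes one batch, and the one-second pause is an arbitrary assumption rather than a documented figure.

```python
import time

def chunked(ids, size=100):
    """Split a list of ids into consecutive chunks of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def run_paced(chunks, run_chunk, pause_seconds=1.0, sleep=time.sleep):
    """Call run_chunk on each chunk, sleeping between (not after) chunks."""
    for index, chunk in enumerate(chunks):
        if index > 0:
            sleep(pause_seconds)  # assumed pause; tune to your quota
        run_chunk(chunk)
```

For example, `run_paced(chunked(id_list), fetch_batch)` would call a (hypothetical) `fetch_batch` once per group of up to 100 ids.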

Here's a rough Python example that fetches the messages whose ids are in the list id_list:

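msgs = []

# Assumes `gmail` is an authorised Gmail API service object, `http` its
# authorised HTTP object, `id_list` the message ids and `fmt` the desired
# format string.
def fetch(rid, response, exception):
    # Batch callback: called once per request in the batch
    if exception is not None:
        print(exception)
    else:
        msgs.append(response)

# Make a batch request
batch = gmail.new_batch_http_request()
for message_id in id_list:
    t = gmail.users().messages().get(userId='me', id=message_id, format=fmt)
    batch.add(t, callback=fetch)

batch.execute(http=http)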
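
If only a header such as Date is needed, each get() can be narrowed so it does not return the body. As far as I understand the API reference, format='metadata' combined with metadataHeaders limits the response to the named headers. The sketch below assumes `service` is an authorised Gmail API service object; `date_request` and `header_value` are hypothetical helper names.

```python
def date_request(service, message_id, user_id="me"):
    """Build (but do not execute) a get() request limited to the Date header."""
    return service.users().messages().get(
        userId=user_id,
        id=message_id,
        format="metadata",          # headers only, no body
        metadataHeaders=["Date"],   # restrict to the headers we need
    )

def header_value(message, name):
    """Return the first header called `name` (case-insensitive), or None."""
    for header in message.get("payload", {}).get("headers", []):
        if header["name"].lower() == name.lower():
            return header["value"]
    return None
```

The request returned by `date_request` can be passed to batch.add() in place of the plain get() request, and `header_value(response, 'Date')` can then be used inside the callback.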
payne
  • Many thanks for your help! Batch sounds like what I'm looking for. But this still gets the entire message and then I loop through all messages to extract the data. Do you know of a way to only return certain data / do this more efficiently? Also, I guess I still need to use list / list_next before the batch call in order to get message ids? Cheers – SLater01 Oct 07 '17 at 01:42
  • What "certain data" do you want? (And yes, you use list() to get a list of message ids, then get() to fetch the details.) – payne Oct 07 '17 at 12:49
  • I've got it working, thanks, by setting format='minimal'. This then ignores the message body etc. and avoids wasteful data transfer. – SLater01 Oct 07 '17 at 12:57
  • And, I'm sure you have by now discovered that Date means a free form hot text mess that will require all of Jarvis's computing power to parse..... – boatcoder Sep 01 '21 at 01:01
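
On parsing the Date header: Python's standard library can handle the common RFC 2822 style values. The sketch below normalises to UTC and returns None for anything it cannot parse; `parse_date_header` is a hypothetical helper name, and treating a missing offset as UTC is an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

def parse_date_header(value):
    """Parse a Date header into a UTC datetime, or return None if unparseable."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # The exception type for bad input differs between Python versions
        return None
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; assume UTC here
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

The resulting datetimes can be fed into Pandas after dropping the None values.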