Batch Request for get in gmail API

Question

I have a list of around 2,500 mail IDs and I'm restricted to using only the requests library. So far, this is how I get the mail headers:

mail_ids = ['']
for mail_id in mail_ids:
    res = requests.get(
        'https://www.googleapis.com/gmail/v1/users/me/messages/{}?format=metadata'.format(mail_id),
        headers=headers).json()
    mail_headers = res['payload']['headers']
    ...

But it's very inefficient, and I would rather POST a list of IDs instead. However, in the documentation (https://developers.google.com/gmail/api/v1/reference/users/messages/get) I don't see a BatchGet. Is there any workaround? I'm using the Flask framework. Thanks a lot.

Have you seen [this answer](https://stackoverflow.com/questions/24562981/bulk-fetching-emails-in-the-new-gmail-api#answer-24586740)? — Tholle, Jun 23 '18 at 13:43
@Tholle Yes, but that answer uses the Google API client, which I can't use; I need to stick to the requests library. — jthemovie, Jun 23 '18 at 16:22
Alright. [This answer](https://stackoverflow.com/questions/35343365/gmail-rest-api-batch-support-for-getting-messages/35344321#35344321) and [this tiny JavaScript helper I wrote](https://github.com/EmilTholin/google-api-batch-utils/blob/master/lib/index.js) might give some inspiration on how to create a batch request manually by yourself, but I don't have a Python example, sadly. — Tholle, Jun 23 '18 at 23:25

Utkarsh Dalal · Answer 1 · 2020-08-08T15:57:44.263

This is a bit late, but in case it helps anyone, here's the code I used to do a batch get of emails:

First I get a list of relevant emails. Change the request according to your needs; I'm getting only sent emails for a certain time period:

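import json
import requests

# `header` must hold the Authorization header, e.g.
# header = {'Authorization': 'Bearer ' + access_token} (set up in the next snippet)
query = "https://www.googleapis.com/gmail/v1/users/me/messages?labelIds=SENT&q=after:2020-07-25 before:2020-07-31"
response = requests.get(query, headers=header)
events = json.loads(response.content)
# 'messages' can be missing when a page has no results
email_tokens = events.get('messages', [])
# Follow nextPageToken until the last page is reached
while 'nextPageToken' in events:
    response = requests.get(query + f"&pageToken={events['nextPageToken']}",
                            headers=header)
    events = json.loads(response.content)
    email_tokens += events.get('messages', [])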
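
The pagination loop above can also be wrapped in a small generator. This is only a sketch and not part of the original answer: iter_message_ids and the injected get_page callable are hypothetical names, chosen so the loop can be exercised without network access. In real use get_page would wrap requests.get(...).json(), as shown in the trailing comment.

```python
def iter_message_ids(get_page, params):
    """Yield message ids from every page of a message-list result.

    get_page(params) is a caller-supplied (hypothetical) callable that performs
    the HTTP GET and returns the decoded JSON dict for one page.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    while True:
        page = get_page(params)
        # 'messages' may be absent when a page has no results
        for message in page.get('messages', []):
            yield message['id']
        token = page.get('nextPageToken')
        if not token:
            break
        params['pageToken'] = token

# Possible usage with requests (assumes `header` holds the Authorization header):
# import requests
# url = 'https://www.googleapis.com/gmail/v1/users/me/messages'
# get_page = lambda p: requests.get(url, headers=header, params=p).json()
# ids = list(iter_message_ids(get_page, {'labelIds': 'SENT',
#                                        'q': 'after:2020-07-25 before:2020-07-31'}))
```

Passing the query through params lets requests handle the URL encoding of the space in the q value.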

Then I batch the GET requests, 100 emails at a time, keep only the JSON part of each response and put it into a list called emails. Building and parsing a batch lives in one helper so the code isn't repeated. Note that error responses are JSON too, so they end up in the list as well. You'll have to set your access token here:

import json
import requests

emails = []
access_token = '1234'  # replace with a valid OAuth 2.0 access token
header = {'Authorization': 'Bearer ' + access_token}
batch_header = header.copy()
batch_header['Content-Type'] = 'multipart/mixed; boundary="email_id"'

def fetch_batch(token_dicts):
    """POST one batch of GET sub-requests and return the parsed JSON bodies."""
    data = ''
    for token_dict in token_dicts:
        # One multipart part per message, each wrapping a plain GET request
        data += f'--email_id\nContent-Type: application/http\n\nGET /gmail/v1/users/me/messages/{token_dict["id"]}?format=full\n\n'
    data += '--email_id--'
    r = requests.post("https://www.googleapis.com/batch/gmail/v1",
                      headers=batch_header, data=data)
    # The response sections are separated by CRLF; keep only the JSON ones
    bodies = r.content.decode().split('\r\n')
    return [json.loads(body) for body in bodies if body.startswith('{')]

# Send the ids in chunks of 100; nothing is posted when the list is empty
for start in range(0, len(email_tokens), 100):
    emails += fetch_batch(email_tokens[start:start + 100])
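
Splitting the response on '\r\n' and keeping the chunks that start with '{' works, but it throws away the status code of each sub-response. As a hedged alternative, the multipart response can be parsed with the standard library's email module. This is only a sketch: parse_batch_response is a hypothetical helper name, and it assumes each part wraps a complete inner HTTP response (status line, headers, blank line, JSON body).

```python
import email
import json
import re

def parse_batch_response(content_type, body):
    """Return a list of (status_code, parsed_json_or_None), one per batch part.

    content_type: the response's Content-Type header value (carries the boundary).
    body: the raw response bytes.
    """
    # Prepend the header so the email parser can locate the multipart boundary
    message = email.message_from_bytes(
        b'Content-Type: ' + content_type.encode('ascii') + b'\r\n\r\n' + body)
    if not message.is_multipart():
        return []
    results = []
    for part in message.get_payload():
        # The part's payload is the inner HTTP response as bytes
        raw = part.get_payload(decode=True)
        pieces = re.split(br'\r?\n\r?\n', raw, maxsplit=1)
        head = pieces[0]
        payload = pieces[1] if len(pieces) > 1 else b''
        # The status line looks like "HTTP/1.1 200 OK"
        status = int(head.split(None, 2)[1])
        parsed = json.loads(payload) if payload.strip() else None
        results.append((status, parsed))
    return results

# Possible usage with the batch POST above:
# results = parse_batch_response(r.headers['Content-Type'], r.content)
# emails = [parsed for status, parsed in results if status == 200]
```

Keeping the status code makes it possible to drop or retry failed sub-requests instead of appending their error bodies to emails.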

[Optional] Finally, I'm decoding the text in the email and storing only the last sent email instead of the whole thread. The regex used here splits on strings that I found usually appear at the end of an email, where the quoted earlier messages begin. For instance, "On Tue, Jun 23, 2020, x@gmail.com said...":

import re
import base64

# Matches the start of a quoted-reply line such as "On Tue, Jun 23, 2020"
gmail_split_regex = r'On [a-zA-Z]{3}, ([a-zA-Z]{3}|\d{1,2}) ([a-zA-Z]{3}|\d{1,2}),? \d{4}'

def unique_body(part):
    # Body data is URL-safe base64; decode the bytes to text instead of taking str() of bytes
    plain_text = base64.urlsafe_b64decode(part['body']['data']).decode('utf-8', errors='replace')
    # Keep only the text before the first quoted-reply marker
    return {'content': re.split(gmail_split_regex, plain_text)[0]}

for email in emails:
    if 'parts' not in email['payload']:
        continue
    for part in email['payload']['parts']:
        if part['mimeType'] == 'text/plain':
            if 'uniqueBody' not in email:
                email['uniqueBody'] = unique_body(part)
        elif 'parts' in part:
            # text/plain is often nested one level down, e.g. inside multipart/alternative
            for sub_part in part['parts']:
                if sub_part['mimeType'] == 'text/plain':
                    if 'uniqueBody' not in email:
                        email['uniqueBody'] = unique_body(sub_part)
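
The loop above looks at most two levels deep and skips messages whose payload has no parts. A recursive walk is one way to cover any nesting depth, including single-part text/plain messages. This is a sketch rather than the answer's method: find_plain_text is a hypothetical helper name, and re-adding base64 padding is a defensive assumption in case the data arrives without it.

```python
import base64

def find_plain_text(payload):
    """Return the decoded text of the first text/plain part, or None.

    Walks payload['parts'] recursively, so any nesting depth is covered.
    """
    if payload.get('mimeType') == 'text/plain':
        data = payload.get('body', {}).get('data')
        if data:
            # Body data is URL-safe base64; restore padding if it was stripped
            padded = data + '=' * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode('utf-8', errors='replace')
    for part in payload.get('parts', []):
        text = find_plain_text(part)
        if text is not None:
            return text
    return None

# Possible usage with the list built earlier:
# for email in emails:
#     text = find_plain_text(email['payload'])
```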
