2

I'm trying to download a Google doc to PDF or Sheet to XLS given an ID programmatically from the CLI.

Steps I've tried so far:

  1. Contact support, but can't see a (?) help icon
  2. Google for 10 minutes... I think Google Drive API does this (not sure)
  3. Enable the Google Drive API
  4. Signed up for a GCP project
  5. Navigated thought the UI to enable the API
  6. Trying the GET API results in 400 Invalid field selection using the fields for the ID of the document

I'm a bit stuck now and I am not sure how to proceed. Any suggestions?

hendry
  • 9,725
  • 18
  • 81
  • 139
  • When I saw your demonstration video, I think that you are using "About: get" in Drive API v3 and the file ID is put to `fields`. In this case, such error occurs. I think that the reason of your issue is due to this. When you want to see the file metadata using "Try this API", please use "Files: get" by putting the file ID to `fileId`. [Ref](https://developers.google.com/drive/api/v3/reference/files/get) By the way, can you provide the detail information about `CLI` of `programmatically from the CLI`? – Tanaike Jan 21 '21 at 06:10
  • What language are you using? Can you provide the code you are working on, specially the request you are making? – Iamblichus Jan 21 '21 at 08:49
  • @Tanaike https://developers.google.com/drive/api/v3/reference/files/get WORKS.. so how does one download the file? – hendry Jan 21 '21 at 09:03
  • Thank you for replying. I have to apologize for my poor English skill. In order to download the file from Google Drive, there are 2 directions. 1. When the file is yours and the file is not publicly shared, you can download it using the access token. [Ref](https://stackoverflow.com/q/60608901) 2. When the file is publicly shared, you can download it without using the access token. [Ref](https://stackoverflow.com/q/48133080) By the way, can you provide the detail information about `CLI` of `programmatically from the CLI`? – Tanaike Jan 21 '21 at 09:08

1 Answers1

1

Warning: hopefully informative wall of text ahead! I've also uploaded the full Jupyter Notebook for you to clone and run here since, as you've realized, putting this sort of stuff together can be challenging.


Since we're going to be exporting files via the google drive API, we need credentials for that scope as detailed in https://developers.google.com/drive/api/v3/reference/files/export#auth.

However, first we need to choose an authentication method as detailed in https://developers.google.com/identity/protocols/oauth2#scenarios.

Since you mentioned creating a GCP project, I assume you're interested in using a GCP service account as detailed in https://developers.google.com/identity/protocols/oauth2#serviceaccount

You can create a service account at https://console.developers.google.com/apis/credentials or as explained in https://developers.google.com/identity/protocols/oauth2/service-account#creatinganaccount

Make sure to enable domain-wide-delegation for that service account while creating it and grant it https://www.googleapis.com/auth/drive scope under https://admin.google.com/ac/owl/domainwidedelegation since you otherwise won't be able to impersonate other users, including yourself, and download their files.

We then use the SERVICE_ACCOUNT_FILE we just downloaded and the SCOPES we defined to create a Credentials object.

However, you'll need to first install the Python bindings for the Google API as per https://developers.google.com/drive/api/v3/quickstart/python (pip3 install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib)

With that, the following should be enough to authenticate to the API:

from googleapiclient.discovery import build
from google.oauth2 import service_account

SCOPES = ['https://www.googleapis.com/auth/drive']
SERVICE_ACCOUNT_FILE = 'credentials.json'

credentials = service_account.Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE,
                                                      scopes=SCOPES)

# Remember, you must have created credentials.json with domain-wide delegation!
credentials = credentials.with_subject('user@example.com')

# We then build a drive_v3 service using the credentials we just created

service = build('drive', 'v3', credentials=credentials)

We can access the files resource as shown in https://developers.google.com/drive/api/v3/reference/files/get and request the metadata of a file to which user@example.com has access https://docs.google.com/document/d/fileId/edit. In your case fileId=141g8UkQfdMQSTfIn475gHj1ezZVV16f5ONDxpWrrvts.

files = service.files()
print(service.files().get(fileId='1U3eMevKxTwDxzvOBUsqa36zvwBzKPVYOFgy3k_9vxb8').execute())

{'kind': 'drive#file', 'id': '1U3eMevKxTwDxzvOBUsqa36zvwBzKPVYOFgy3k_9vxb8', 'name': 'empty', 'mimeType': 'application/vnd.google-apps.document'}

We access the files resource again but this time to export the file as detailed in https://developers.google.com/resources/api-libraries/documentation/drive/v3/python/latest/drive_v3.files.html#export This could also be achieved using https://developers.google.com/drive/api/v3/manage-downloads.

Valid MIME types are listed in https://developers.google.com/drive/api/v3/ref-export-formats.

fconr = files.export(fileId='1U3eMevKxTwDxzvOBUsqa36zvwBzKPVYOFgy3k_9vxb8',
mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document')

fcont = fconr.execute()

print('{}...'.format(fcont[:10]))

file = open("/tmp/sample.doc", "wb")
file.write(fcont)
file.close()

b'MN\xc30\x10\x85O\xc0\x1d"'...

As you can see, fcont contains a binary blob that corresponds to the document and of which I'm showing the first 10 bytes. Finally, the blob is saved to sample.doc.

ls -alh1 /tmp/sample.doc

-rw-rw-r-- 1 jdsalaro jdsalaro 6,0K Jan 20 23:38 /tmp/sample.doc

As mentioned above, I encourage you to experiment with the Jupyter notebook once you've created the service account with domain-wide delegation, have saved it to credentials.json and have granted it the https://www.googleapis.com/auth/drive scope.

jdsalaro
  • 378
  • 2
  • 10
  • [v3 export API works](https://s.natalian.org/2021-01-22/1611279722_1920x1080.png)! I guess what I am confused about is authentication. What is the _easiest_ on the brain? I don't need a service account ideally, I just want to auth as me. As you can see from the [screen shot](https://s.natalian.org/2021-01-22/two.png) why there are two elements, API key *and* Oauth. I need both? – hendry Jan 22 '21 at 01:43
  • @hendry Service accounts are indeed the easiest way to interact with the Google API. [Anything else](https://developers.google.com/identity/protocols/oauth2#scenarios) will require you to understand how to use Oauth2 properly; and that will always be more involved than using service accounts for authentication. For exporting files you only need OAuth, you can try it yourself by removing the `API Key` check-mark. – jdsalaro Jan 22 '21 at 04:44
  • However, I strongly suggest you clone [the notebook I provided](https://github.com/jdsalaro/snippets/blob/main/google_drive_api/stackoverflow-65820541-how-to-download-g-suite-docs-sheets-to-pdf-xls-programatically.ipynb) and get it to work. You should understand that example as to not make costly mistakes from a security point of view or otherwise. In case you do want to use OAuth instead of Service Accounts, [the Google OAuth Playground](https://developers.google.com/oauthplayground/) is a good start. – jdsalaro Jan 22 '21 at 04:50
  • @jdalaro I emailed you a screen cast why domain-wide-delegation / owl doesn't appear to work. I guess I'm only left with Oauth? – hendry Jan 22 '21 at 09:09
  • @hendry Service accounts and domain-wide delegation are OAuth too. The difference is there is no user interaction. If you edit your question to provide more information about your situation, what you tried, what code you're using, etc. it's more likely that you'll get help. – Iamblichus Jan 25 '21 at 10:19