My end goal is to automatically download, with Python (using gdown, for instance), all files in a folder of a public Google Drive (each file is large, around 3 GB). After a lot of trial and error, I found a way to extract all the links from the folder using Google Apps Script in Google Sheets, so I now have a link for every file I need to download, in this format:

https://drive.google.com/file/d/IDA/view?usp=drivesdk&resourcekey=otherIDA
https://drive.google.com/file/d/IDB/view?usp=drivesdk&resourcekey=otherIDB
https://drive.google.com/file/d/IDC/view?usp=drivesdk&resourcekey=otherIDC
...
https://drive.google.com/file/d/IDZ/view?usp=drivesdk&resourcekey=otherIDZ
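
Each link of this shape carries two pieces of information: the file ID in the path and the resource key in the query string. A minimal sketch of pulling both out with the standard library follows; `parse_drive_link` is a hypothetical helper name, and it assumes every link follows the `/file/d/<id>/view` layout shown above.

```python
from urllib.parse import urlparse, parse_qs


def parse_drive_link(link):
    """Return (file_id, resource_key) for a .../file/d/<id>/view?... link.

    Hypothetical helper: assumes the path contains a "d" segment
    immediately followed by the file ID.
    """
    parsed = urlparse(link)
    parts = parsed.path.split("/")  # ['', 'file', 'd', '<id>', 'view']
    file_id = parts[parts.index("d") + 1]
    # resourcekey is optional; None when the link does not carry one
    resource_key = parse_qs(parsed.query).get("resourcekey", [None])[0]
    return file_id, resource_key


link = "https://drive.google.com/file/d/IDA/view?usp=drivesdk&resourcekey=otherIDA"
print(parse_drive_link(link))  # ('IDA', 'otherIDA')
```

Splitting on the URL structure rather than on an assumed ID length avoids depending on how long a given ID happens to be.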

Then I want to iterate over the links with a for loop to download all the files:

import gdown
import re
# Matches a run of 33 or of 19 ID characters (letters, digits, "_" or "-")
regex = r"([\w-]){33}|([\w-]){19}"
download_url_basename = "https://drive.google.com/uc?export=download&id="
for i, url in enumerate(links_to_download):
    file_id = re.search(regex, url)[0]
    gdown.download(download_url_basename + file_id, f"file_{i}")

However, I get:

Permission denied: https://drive.google.com/uc?id=ID
Maybe you need to change permission over 'Anyone with the link'?

This is a public repository, so although I have access to it and sufficient rights to download each file manually, I only get the shareable links in view mode.

Is there a way to convert these links into something that can be downloaded automatically? Is this blocked on purpose? Is there any way to avoid manually downloading 400 files?

EDIT: The linked question is slightly related, but my issue doesn't stem from the same problem, and that question doesn't offer an automatic way to handle anything.

EDIT 2: I used the Google Drive API Python SDK: I generated a service account in the Google console, activated OAuth2, and generated OAuth2 JSON credentials to build the drive_service object:

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
import io
import re
SCOPES = ['https://www.googleapis.com/auth/drive']
CLIENT_SECRET_FILE = "myjson.json"
authorized_port = 6006 # authorize URI redirect on the console
flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
cred = flow.run_local_server(port=authorized_port)
drive_service = build("drive", "v3", credentials=cred)
# Matches a run of 33 or of 19 ID characters (letters, digits, "_" or "-")
regex = r"([\w-]){33}|([\w-]){19}"
for i, url in enumerate(links_to_download):
    file_id = re.search(regex, url)[0]
    request = drive_service.files().get_media(fileId=file_id)
    fh = io.BytesIO()  # in-memory buffer
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while done is False:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))

However, I now get:

googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v3/files/fileId?alt=media returned "File not found: fileID.". Details: "[{'domain': 'global', 'reason': 'notFound', 'message': 'File not found: fileId.', 'locationType': 'parameter', 'location': 'fileId'}]">
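
One plausible explanation for this 404 (an assumption, not something confirmed here) is that the regex returns a truncated ID. If a file ID is shorter than 33 characters, the `{33}` alternative cannot match and the `{19}` alternative returns only its first 19 characters. The sketch below reproduces this with a made-up 28-character ID; `fake_id` and the resource key value are purely illustrative.

```python
import re

regex = r"([\w-]){33}|([\w-]){19}"

# Made-up 28-character ID, for illustration only
fake_id = "0B" + "x" * 26
url = f"https://drive.google.com/file/d/{fake_id}/view?usp=drivesdk&resourcekey=0-abc"

matched = re.search(regex, url)[0]

print(len(fake_id), len(matched))  # 28 19
print(matched == fake_id)          # False: the ID was cut short
```

A truncated ID does not correspond to any file, which would be consistent with a "File not found" response.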

I found a related question. Any idea?

jeandut
  • Does this answer your question? [Gdown is giving Permission error for particular file,although it is opening up fine manually](https://stackoverflow.com/questions/60739653/gdown-is-giving-permission-error-for-particular-file-although-it-is-opening-up-f) – Aurum Jul 06 '21 at 12:15
  • No it doesn't unfortunately – jeandut Jul 06 '21 at 12:18
  • The download link only works if you are authorized when you use it. It doesn't matter if it's public or not. Try downloading through the API rather than using the download link – Linda Lawton - DaImTo Jul 06 '21 at 12:42
  • Can you expand, @DaImTo? The file is set to be downloadable by any viewer: "Viewers can download" is visible. – jeandut Jul 06 '21 at 12:44
  • @DaImTo you mean I need to be logged in to a Google account? – jeandut Jul 06 '21 at 12:46
  • The Google Drive download link will give you a link to download the file, but that does not mean you do not need to have access to download the file. This is not a shareable link; it is a direct download link. If you don't have permissions on the file, then you're going to get a permission denied error when you try to use it. – Linda Lawton - DaImTo Jul 06 '21 at 12:49
  • I think your best bet is using the Drive API for this; you can see the python quickstart [here](https://developers.google.com/drive/api/v3/quickstart/python) – Rafa Guillermo Jul 06 '21 at 13:50

1 Answer

OK, thanks to the Google Drive API, I was finally able to make it work!

The whole thing, from getting the list of links inside the folder to downloading the files, was such a hassle that I might write a blog post about it some day:

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
import io
import re
SCOPES = ['https://www.googleapis.com/auth/drive']
CLIENT_SECRET_FILE = "myjson.json"
authorized_port = 6006 # authorize URI redirect on the console
flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
cred = flow.run_local_server(port=authorized_port)
drive_service = build("drive", "v3", credentials=cred)
# Capture the whole ID that follows /file/d/ (IDs may contain "-" and "_")
regex = r"(?<=https://drive\.google\.com/file/d/)[\w-]+"
for i, url in enumerate(links_to_download):
    file_id = re.search(regex, url)[0]
    request = drive_service.files().get_media(fileId=file_id)
    # Write straight to disk rather than buffering each file in memory
    with io.FileIO(f"file_{i}", mode='wb') as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print("Download %d%%." % int(status.progress() * 100))
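
With around 400 files of roughly 3 GB each, a run may well be interrupted partway through. As a rough sketch of one way to resume, the helper below lists only the links whose output file does not exist yet. `pending_downloads` and its `out_dir` parameter are hypothetical names, not part of the code above, and the sketch assumes the same `file_{i}` naming scheme.

```python
from pathlib import Path


def pending_downloads(links, out_dir="."):
    """Return (index, link, target_path) for links not yet downloaded.

    Hypothetical helper: a link counts as downloaded when a file named
    file_<index> already exists in out_dir.
    """
    out_dir = Path(out_dir)
    pending = []
    for i, link in enumerate(links):
        target = out_dir / f"file_{i}"
        if not target.exists():
            pending.append((i, link, target))
    return pending


# Usage: iterate over the remaining work instead of the full list
# for i, url, target in pending_downloads(links_to_download):
#     ...download url to target...
```

Note that a file left behind by an interrupted download also exists on disk, so it would be skipped; deleting the last partial file before rerunning (or downloading under a temporary name and renaming on completion) avoids that.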
jeandut