107

I am trying to download files from google drive and all I have is the drive's URL.

I have read about the Google API, which involves a drive_service object and MediaIO and also requires credentials (mainly a JSON file/OAuth), but I cannot work out how it is supposed to work.

I also tried urllib.urlretrieve, but it does not fetch the actual file from Drive. I tried wget too, without success.

I tried the PyDrive library. It has good functions for uploading to Drive, but I found no download options.

Any help will be appreciated. Thanks.

Ahwar
rkatkam

15 Answers

127

If by "drive's url" you mean the shareable link of a file on Google Drive, then the following might help:

import sys
import requests


def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download&confirm=1"

    session = requests.Session()

    response = session.get(URL, params={"id": id}, stream=True)
    token = get_confirm_token(response)

    if token:
        params = {"id": id, "confirm": token}
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)


def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith("download_warning"):
            return value

    return None


def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)


def main():
    if len(sys.argv) >= 3:
        file_id = sys.argv[1]
        destination = sys.argv[2]
    else:
        file_id = "TAKE_ID_FROM_SHAREABLE_LINK"
        destination = "DESTINATION_FILE_ON_YOUR_DISK"
    print(f"download {file_id} to {destination}")
    download_file_from_google_drive(file_id, destination)


if __name__ == "__main__":
    main()

The snippet uses neither PyDrive nor the Google Drive SDK. It uses the requests module (which is, in a way, an alternative to urllib2).

When downloading large files from Google Drive, a single GET request is not sufficient. A second one is needed - see wget/curl large file from google drive.
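
Several comments below report that the script can silently save a small HTML page instead of the file. A minimal sketch of a guard against that, assuming the unwanted page is served with a text/html content type or begins with an HTML tag. The helper name looks_like_html_page is hypothetical; it is not part of requests or of the code above.

```python
def looks_like_html_page(content_type, first_bytes):
    """Return True if a response looks like an HTML page rather than a file.

    content_type: value of the Content-Type header (may be None).
    first_bytes:  the first chunk of the response body, as bytes.
    """
    if content_type and content_type.lower().startswith("text/html"):
        return True
    head = first_bytes.lstrip()[:15].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


# Example with made-up values (no network access needed):
print(looks_like_html_page("text/html; charset=utf-8", b"<!DOCTYPE html><html>"))
print(looks_like_html_page("application/zip", b"PK\x03\x04"))
```

With requests, the two arguments could come from response.headers.get("Content-Type") and the first chunk yielded by response.iter_content(). Note that this check would also flag a file that really is an HTML document.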

scls
turdus-merula
  • 3
    This gives me a 404-not found, using the ID of a public shared file. Any suggestions what could be wrong? – user3722096 Mar 29 '18 at 17:40
  • How can I download files from Google Drive given links, for example https://drive.google.com/file/d/1I05c4-d9OsNwGZnLx85fR8dnX-yVoTWe/view – GoingMyWay Jun 25 '19 at 04:00
  • @turdus-merula Anyway to get the downloading file name as it is stored in drive? – yashas123 Aug 25 '19 at 13:16
  • NVM, I got it by doing this: re.search(r'filename\=\"(.*)\"', response.headers['Content-Disposition']).group(1) – yashas123 Aug 25 '19 at 13:25
  • 7
    Don't work and just silently downloads 4,0K file without warning or error Example link: https://drive.google.com/open?id=0B4qLcYyJmiz0TXdaTExNcW03ejA – mrgloom Feb 21 '20 at 16:47
  • How can we get the name of the document from the link – Om Rastogi Feb 18 '22 at 09:05
  • 5
    seems like something has changed behind the scenes and the token stuff does not quite work anymore. However, simply always including `confirm=1` as parameter seems to be a workaround. – Mr Tsjolder from codidact Apr 08 '22 at 14:11
  • 1
    This isn't working for me either. I can't see where to include "confirm=1". There doesn't seem to be an argument named 'confirm' to 'download_file_from_google_drive'. No matter what I try, I always get a big bunch of HTML rather than the small file I'm hoping to download. – CryptoFool Jul 15 '22 at 20:32
  • 1
    This works for me by including "confirm=1". I'm not sure why it doesn't work for other people. Note that I'm only downloading .csv files. – aysljc Oct 09 '22 at 13:50
  • Hi @CryptoFool, I had your same question and figured it out. Remove the code getting token and replace the variable `token` with `1` in this line: `params = { 'id' : id, 'confirm' : 1 }` – butterflyeffect Feb 17 '23 at 18:06
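
The Content-Disposition approach mentioned in the comments above can be sketched without a hand-written regular expression, using the standard library's header parsing. This assumes the server sends a standard Content-Disposition header. The name filename_from_content_disposition is hypothetical, and the header value in the example is made up.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition header value.

    Returns None when the header is missing or has no filename parameter.
    """
    if not header_value:
        return None
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


# Made-up header value for illustration:
print(filename_from_content_disposition('attachment; filename="report.csv"'))
```

With requests, the header value could be read with response.headers.get("Content-Disposition").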
75

I recommend gdown package.

pip install gdown

Take your share link

https://drive.google.com/file/d/0B9P1L--7Wd2vNm9zMTJWOGxobkU/view?usp=sharing

and grab the file ID from it (here 0B9P1L--7Wd2vNm9zMTJWOGxobkU; pressing the download button and looking at the resulting link also shows it), then substitute it after id= in the URL below.

import gdown

url = 'https://drive.google.com/uc?id=0B9P1L--7Wd2vNm9zMTJWOGxobkU'
output = '20150428_collected_images.tgz'
gdown.download(url, output, quiet=False)
Vadim
  • 9
    Importantly, if you create the link by "Share" or "Get shareable link", the URL doesn't work - you must replace in the URL "open" to "uc". In other words, `drive.google.com/open?id= ...` to `drive.google.com/uc?id= ...` – Agile Bean May 16 '20 at 13:43
  • 6
    Bro, I don't have enough words to thank you. – Mudit Bhatia Apr 30 '21 at 15:34
  • 4
    The best and simplest answer. Thanks! – Aref May 01 '21 at 13:40
  • 2
    I tried to do as @AgileBean stated, but my link looks like this ```https://drive.google.com/file/d/3Xxk5lJSr...UV5eX9M/view?usp=sharing``` so it did not work. So instead, I used the ID parameter ```gdown --id 3Xxk5lJSr...UV5eX9M``` where ```3Xxk5lJSr...UV5eX9M``` is the file id that you can easily extract from the file's link. – A.Sherif Jun 08 '21 at 18:09
  • 1
    The best one. Thanks a lot!! – Subangkar KrS Aug 05 '21 at 14:26
  • 2
    it doesn't work.... even for public files. I find it ridiculous that the output from this, running on python is "you may be able to use the browser". Now I only need to download the library that converts Python to a human who knows how to operate a browser and has hands for keyboard and mouse.... – JasonGenX Mar 03 '22 at 22:23
  • 1
    Worked for me using when pasted the link that appears after pressing the "Download" button on google drive web page – zetyquickly Jun 02 '22 at 21:34
59

Having had similar needs many times, I wrote a very simple class, GoogleDriveDownloader, based on the snippet from @user115202 above. You can find the source code here.

You can also install it through pip:

pip install googledrivedownloader

Then usage is as simple as:

from google_drive_downloader import GoogleDriveDownloader as gdd

gdd.download_file_from_google_drive(file_id='1iytA1n2z4go3uVCwE__vIKouTKyIDjEq',
                                    dest_path='./data/mnist.zip',
                                    unzip=True)

This snippet downloads an archive shared on Google Drive. Here, 1iytA1n2z4go3uVCwE__vIKouTKyIDjEq is the ID taken from the shareable link generated by Google Drive.

ndrplz
  • 3
    can't retrieve file ... `open('/content/data.json').read()` returns an HTML page containing "Not Found" and "Error 404" – Raksha Apr 19 '19 at 17:25
  • @Raksha It's difficult to understand the issue from your comment. If you still encounter this problem, please open a proper issue on [GitHub](https://github.com/ndrplz/google-drive-downloader/issues) – ndrplz Apr 23 '19 at 08:34
  • 1
    What modification should be done to download this zip file: https://drive.google.com/open?id=0B4qLcYyJmiz0TXdaTExNcW03ejA Just using 0B4qLcYyJmiz0TXdaTExNcW03ejA not work. – mrgloom Feb 21 '20 at 16:53
  • You need to add `requests` to the requirements. – Wok Dec 15 '20 at 08:45
  • 3
    What if I want to access a restricted file using a Gmail id and password? – Dhiraj Gandhi Jan 11 '21 at 20:08
  • My files are in a folder and the shared link of the folder is `https://drive.google.com/drive/folders/14gKg6QW3TnwnaHoYTxxTr6NzgQWqJufa?usp=sharing`, but I can not download this folder using this method. – Jingnan Jia Jul 26 '21 at 21:31
  • I had `GoogleDriveDownloader` working just fine for me, but now it appears to have stopped. I'm not sure if something changed on Google's end: the response contains 200 OK but does not contain any cookies, so the attempt to extract the token breaks and it can't proceed past that. – Sebastian Apr 04 '23 at 18:35
  • I removed the first call in the download method about retrieving the token, and now everything seems to be working fine for me. – Sebastian Apr 04 '23 at 18:58
11

Here's an easy way to do it with a service account, using only Google's official client libraries.

pip install google-api-core google-api-python-client

from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io

credz = {}  # put the JSON credentials here, from a service account or the like
# More info: https://cloud.google.com/docs/authentication

credentials = service_account.Credentials.from_service_account_info(credz)
drive_service = build('drive', 'v3', credentials=credentials)

file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'
request = drive_service.files().get_media(fileId=file_id)
#fh = io.BytesIO() # this can be used to keep in memory
fh = io.FileIO('file.tar.gz', 'wb') # this can be used to write to disk
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
    status, done = downloader.next_chunk()
    print("Download %d%%." % int(status.progress() * 100))
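
The empty credz dictionary above has to be filled from a service-account key. A minimal sketch of loading it from a JSON key file with the standard library, assuming the key file carries client_email, token_uri and private_key fields. The function name and the file path are hypothetical.

```python
import json

# Fields this sketch assumes a service-account key file contains.
REQUIRED_KEYS = ("client_email", "token_uri", "private_key")


def load_service_account_info(path):
    """Read a service-account JSON key file and return it as a dict.

    Raises ValueError if one of the assumed required fields is missing.
    """
    with open(path, "r", encoding="utf-8") as f:
        info = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError("key file is missing fields: " + ", ".join(missing))
    return info


# Hypothetical usage: credz = load_service_account_info("service-account.json")
```

Failing early on a malformed key file gives a clearer error than one raised deep inside the authentication call.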


RayB
10

PyDrive allows you to download a file with the function GetContentFile(). You can find the function's documentation here.

See example below:

# Initialize GoogleDriveFile instance with file id.
file_obj = drive.CreateFile({'id': '<your file ID here>'})
file_obj.GetContentFile('cats.png') # Download file as 'cats.png'.

This code assumes that you have an authenticated drive object, the docs on this can be found here and here.

In the general case this is done like so:

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
# Create local webserver which automatically handles authentication.
gauth.LocalWebserverAuth()

# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)

Info on silent authentication on a server can be found here and involves writing a settings.yaml (example: here) in which you save the authentication details.

Random Nerd
Robin Nabel
6

The docs include a function that downloads a file given the ID of the file to download:

from __future__ import print_function

import io

import google.auth
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseDownload


def download_file(real_file_id):
    """Downloads a file
    Args:
        real_file_id: ID of the file to download
    Returns: the downloaded file content as bytes.

    Load pre-authorized user credentials from the environment.
    TODO(developer) - See https://developers.google.com/identity
    for guides on implementing OAuth2 for the application.
    """
    creds, _ = google.auth.default()

    try:
        # create drive api client
        service = build('drive', 'v3', credentials=creds)

        file_id = real_file_id

        # pylint: disable=maybe-no-member
        request = service.files().get_media(fileId=file_id)
        file = io.BytesIO()
        downloader = MediaIoBaseDownload(file, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print(F'Download {int(status.progress() * 100)}%.')

    except HttpError as error:
        print(F'An error occurred: {error}')
        return None

    return file.getvalue()


if __name__ == '__main__':
    download_file(real_file_id='1KuPmvGq8yoYgbfW74OENMCB5H0n_2Jm9')

This raises the question:

How do we get the file ID to download the file?

Generally speaking, a URL from a shared file from Google Drive looks like this

https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing

where 1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh corresponds to fileID.

You can simply copy it from the URL or, if you prefer, it's also possible to create a function to get the fileID from the URL.

For instance, given the following url = https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing,

def url_to_id(url):
    x = url.split("/")
    return x[5]

Printing x will give

['https:', '', 'drive.google.com', 'file', 'd', '1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh', 'view?usp=sharing']

And so, as we want to return the 6th array value, we use x[5].
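
Indexing x[5] depends on the exact URL shape. A slightly more defensive sketch, assuming only the two link shapes that appear in this thread (an ID after a /d/ path segment, or an id query parameter). The name extract_file_id is hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_file_id(url):
    """Return the Drive file ID from a sharing URL, or None if none is found.

    Handles two URL shapes seen in this thread:
      .../file/d/<id>/view?usp=sharing   (ID is the path segment after "d")
      .../open?id=<id> or .../uc?id=<id> (ID is the "id" query parameter)
    """
    parsed = urlparse(url)
    query_ids = parse_qs(parsed.query).get("id")
    if query_ids:
        return query_ids[0]
    segments = [s for s in parsed.path.split("/") if s]
    if "d" in segments:
        idx = segments.index("d")
        if idx + 1 < len(segments):
            return segments[idx + 1]
    return None


print(extract_file_id(
    "https://drive.google.com/file/d/1HV6vf8pB-EYnjcJcH65eGZVMa2v2tcMh/view?usp=sharing"
))
```

Returning None for an unrecognized link (a folder link, for example) lets the caller report the problem instead of failing with an IndexError.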

Tiago Martins Peres
4
import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    response = session.get(URL, params = { 'id' : id , 'confirm': 1 }, stream = True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

if __name__ == "__main__":
    file_id = 'TAKE ID FROM SHAREABLE LINK'
    destination = 'DESTINATION FILE ON YOUR DISK'
    download_file_from_google_drive(file_id, destination)

This just repeats the accepted answer but adds the confirm=1 parameter, so that the file is downloaded even when it is large enough to trigger the confirmation page.

Lin Chi Yu
3

This has also been described above:

   from pydrive.auth import GoogleAuth
   from pydrive.drive import GoogleDrive
   gauth = GoogleAuth()
   gauth.LocalWebserverAuth()
   drive = GoogleDrive(gauth)

This starts its own local web server to do the dirty work of authenticating.

   file_obj = drive.CreateFile({'id': '<Put the file ID here>'})
   file_obj.GetContentFile('Demo.txt')

This downloads the file.

Aidan L
1
# Importing PyDrive OAuth and Drive helpers
import logging
import zipfile

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

logger = logging.getLogger(__name__)
UNZIP_DIR = '../unzipped/'  # assumed extraction directory; adjust as needed

def download_tracking_file_by_id(file_id, download_dir):
    gauth = GoogleAuth(settings_file='../settings.yaml')
    # Try to load saved client credentials
    gauth.LoadCredentialsFile("../credentials.json")
    if gauth.credentials is None:
        # Authenticate if they're not there
        gauth.LocalWebserverAuth()
    elif gauth.access_token_expired:
        # Refresh them if expired
        gauth.Refresh()
    else:
        # Initialize the saved creds
        gauth.Authorize()
    # Save the current credentials to a file
    gauth.SaveCredentialsFile("../credentials.json")

    drive = GoogleDrive(gauth)

    logger.debug("Trying to download file_id " + str(file_id))
    file6 = drive.CreateFile({'id': file_id})
    archive_path = download_dir + 'mapmob.zip'
    file6.GetContentFile(archive_path)
    # Extract the archive that was just downloaded
    zipfile.ZipFile(archive_path).extractall(UNZIP_DIR)
    tracking_data_location = download_dir + 'test.json'
    return tracking_data_location

The above function downloads the file with the given file_id to the specified download folder. The question remains: how do you get the file_id? For URLs of the form ...?id=<file_id>, simply split the URL on id= to get it.

file_id = url.split("id=")[1]
Random Nerd
Shivendra
1

I tried this in Google Colaboratory: https://colab.research.google.com/

Suppose your shareable link is https://docs.google.com/spreadsheets/d/12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu/edit?usp=sharing&ouid=102608702203033509854&rtpof=true&sd=true

All you need is the ID, which is 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu.

Enter this command in a cell:

!gdown 12hiI0NK7M0KEfscMfyBaLT9gxcZMleeu

Run the cell and you will see that the file is downloaded to /content/Amazon_Reviews.xlsx.

Note: this assumes you know how to use Google Colab.

Pavn
0

This example is similar to RayB's, but it keeps the file in memory and is a little simpler. You can paste it into Colab and it works.

import googleapiclient.discovery
import oauth2client.client
from google.colab import auth
auth.authenticate_user()

def download_gdrive(id):
  creds = oauth2client.client.GoogleCredentials.get_application_default()
  service = googleapiclient.discovery.build('drive', 'v3', credentials=creds)
  return service.files().get_media(fileId=id).execute()

a = download_gdrive("1F-yaQB8fdsfsdafm2l8WFjhEiYSHZrCcr")
Att Righ
0

I used the accepted solution for a long time, but Google has since changed the download-warning response, so it no longer works.

I am now using the API, as it is a safer way to ensure it won't suddenly stop working, but I could also make it work by parsing the response HTML for the download URL, as follows:

import requests
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.action = None

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            for name, value in attrs:
                if name == "id" and value == "download-form":
                    for name, value in attrs:
                        if name == "action":
                            self.action = value

DOWNLOAD_URL = 'https://docs.google.com/uc?export=download'
file_id = 'TAKE ID FROM SHAREABLE LINK'  # placeholder
session = requests.Session()
response = session.get(DOWNLOAD_URL, params={'id': file_id}, stream=True)

content_type = response.headers['content-type']
if content_type == 'text/html; charset=utf-8':
    # Google returned the warning page; follow the form's action URL
    parser = MyHTMLParser()
    parser.feed(response.text)
    download_url = parser.action
    response = session.post(download_url, stream=True)
    
file = response.content
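
The parser above reads only the form's action attribute. If the warning page also carries its parameters as hidden input fields inside that form (an assumption about the page layout, which Google may change at any time), they can be collected in the same pass. The class name and the sample HTML below are made up for illustration.

```python
from html.parser import HTMLParser


class DownloadFormParser(HTMLParser):
    """Collect the action URL and hidden inputs of the form with id="download-form"."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "download-form":
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("type") == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


# Made-up sample page, not a captured Google response:
SAMPLE = """
<form id="download-form" action="https://example.invalid/download" method="get">
  <input type="hidden" name="id" value="FILE_ID">
  <input type="hidden" name="confirm" value="t">
</form>
<input type="hidden" name="outside" value="ignored">
"""

parser = DownloadFormParser()
parser.feed(SAMPLE)
print(parser.action)
print(parser.fields)
```

The collected fields could then be passed as params to session.get or as data to session.post, depending on the method the form declares.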
rickh
0

Fixed version for 2023 + generator for tracking progress

import requests


def download_file_from_google_drive(file_id, destination, chunk_size=32768):
    url = "https://docs.google.com/uc?export=download"

    session = requests.Session()
    params = {'id': file_id, 'confirm': 1}
    response = session.get(url, params=params, stream=True)

    for i, chunk_size_ in save_response_content(response, destination, chunk_size):
        yield i, chunk_size_


def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None


def save_response_content(response, destination, chunk_size):
    with open(destination, "wb") as f:
        for i, chunk in enumerate(response.iter_content(chunk_size)):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                yield i, chunk_size


if __name__ == '__main__':
    file_id = '...'
    destination = '...'
    for i, chunk_size in download_file_from_google_drive(file_id, destination):
        print(i, chunk_size)
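
The generator above yields a chunk index and the chunk size, so a caller can turn each pair into a running byte count. A minimal sketch, assuming the total size is something the caller already knows from elsewhere. The helper name format_progress and its total_bytes parameter are hypothetical.

```python
def format_progress(chunk_index, chunk_size, total_bytes=None):
    """Format an approximate progress message for a zero-based chunk index.

    The byte count is an upper bound: the last chunk may be shorter than
    chunk_size. When total_bytes is given, the count is capped at it and a
    percentage is appended.
    """
    done = (chunk_index + 1) * chunk_size
    if total_bytes:
        done = min(done, total_bytes)
        percent = 100.0 * done / total_bytes
        return f"{done} bytes (~{percent:.0f}%)"
    return f"{done} bytes"


# Example with made-up numbers:
print(format_progress(1, 100, total_bytes=400))
```

In the loop above, print(format_progress(i, chunk_size)) would replace the raw print of the two values.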
Jedi Knight
0

For those interested in a link for downloading over HTTP: the Google API and most clients expose a webContentLink field containing it (note that the file's permissions determine who can use it).

-3

You can install https://pypi.org/project/googleDriveFileDownloader/

pip install googleDriveFileDownloader

Then download the file. Here is some sample code:

from googleDriveFileDownloader import googleDriveFileDownloader
a = googleDriveFileDownloader()
a.downloadFile("https://drive.google.com/uc?id=1O4x8rwGJAh8gRo8sjm0kuKFf6vCEm93G&export=download")
Sundeep Pidugu
  • `bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?` – Nouman Apr 10 '20 at 16:57