
I'm trying to download files from my Google Drive folder using Python. I have tried the script below, from How to download specific Google Drive folder using Python?, and it works for me, but my Google Drive folder holds a lot of files and I would like to skip the ones already downloaded to my computer. Is there any way I can do that?

from __future__ import print_function
import io
import os
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']


# To list a folder's children, downloading files and recursing into subfolders
def listfolders(service, filid, des):
    results = service.files().list(
        pageSize=1000, q="'" + filid + "' in parents",
        fields="nextPageToken, files(id, name, mimeType)").execute()
    folder = results.get('files', [])
    for item in folder:
        if item['mimeType'] == 'application/vnd.google-apps.folder':
            if not os.path.isdir(des + "/" + item['name']):
                os.mkdir(path=des + "/" + item['name'])
            print(item['name'])
            listfolders(service, item['id'], des + "/" + item['name'])  # recurse until only files remain
        else:
            downloadfiles(service, item['id'], item['name'], des)
            print(item['name'])
    return folder


# To download a single file's content to dfilespath/name
def downloadfiles(service, dowid, name, dfilespath):
    request = service.files().get_media(fileId=dowid)
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
        print("Download %d%%." % int(status.progress() * 100))
    with open(dfilespath + "/" + name, 'wb') as f:
        f.write(fh.getvalue())


def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)  # credentials.json download from drive API
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('drive', 'v3', credentials=creds)
    # Call the Drive v3 API

    Folder_id = "'PAST YOUR SHARED FOLDER ID'"  # Enter The Downloadable folder ID From Shared Link

    results = service.files().list(
        pageSize=1000, q=Folder_id+" in parents", fields="nextPageToken, files(id, name, mimeType)").execute()
    items = results.get('files', [])
    if not items:
        print('No files found.')
    else:
        print('Files:')
        for item in items:
            if item['mimeType'] == 'application/vnd.google-apps.folder':
                if not os.path.isdir("Folder"):
                    os.mkdir("Folder")
                bfolderpath = os.getcwd()+"/Folder/"
                if not os.path.isdir(bfolderpath+item['name']):
                    os.mkdir(bfolderpath+item['name'])

                folderpath = bfolderpath+item['name']
                listfolders(service, item['id'], folderpath)
            else:
                if not os.path.isdir("Folder"):
                    os.mkdir("Folder")
                # Plain files are saved directly into Folder/ (no per-file directory)
                downloadfiles(service, item['id'], item['name'], os.getcwd() + "/Folder")


if __name__ == '__main__':
    main()
  • Check your hard drive to see whether you already have each file, and download it only if you don't. How else would you go about it? – Linda Lawton - DaImTo Apr 15 '20 at 10:49
  • Usually, a good approach is to divide the logic in two. First, an initial "snapshot" of your drive, where you sync most, if not all, files after scanning for missing ones. Second, "delta changes", where you subscribe to changes and apply only those, without the need to keep rescanning; you delegate the job of "tell me what files I need" to another process. Luckily, the [drive API](https://developers.google.com/drive/api/v3/reference/files/watch) offers just such a mechanism. Note that "files" refers to both regular files and directories in most cases. For a local "watch", there's `inotify`. – edd Apr 15 '20 at 10:51
  • How do you recognize the already-downloaded files? By their name? In that case, you need to collect the names of all the already-downloaded files into an array, and when listing the files on your drive, compare each name against the names in the array to decide whether it has been downloaded already. Pretty complicated and probably not worth it. – ziganotschka Apr 15 '20 at 11:49
  • Then what if I would like to download only the latest files uploaded to my shared drive? Do I need to look at createdTime in Google Drive? – BBBBBBBB Apr 16 '20 at 03:19
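
Following up on the createdTime question in the last comment: the Drive v3 API can filter a listing by creation time and sort the results. Here is a minimal sketch; the folder ID and cutoff timestamp are placeholders, not values from this post:

# Sketch: list only files created after a cutoff, newest first
cutoff = '2020-04-15T00:00:00'  # RFC 3339 timestamp (placeholder)
results = service.files().list(
    q="'FOLDER_ID' in parents and createdTime > '" + cutoff + "'",
    orderBy='createdTime desc',
    fields="files(id, name, createdTime)").execute()
for f in results.get('files', []):
    print(f['createdTime'], f['name'])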

2 Answers


I modified the parts below, and it works! Thank you all for stopping by.

def main():
    """Shows basic usage of the Drive v3 API.
    Prints the names and ids of the first 10 files the user has access to.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)  # credentials.json download from drive API
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('drive', 'v3', credentials=creds)
    # Call the Drive v3 API

    Folder_id = "'PAST YOUR SHARED FOLDER ID'"  # Enter The Downloadable folder ID From Shared Link
    File_Download_Path = "'LOCATION OF THE FILES DOWNLOADED'" 

    results = service.files().list(
        pageSize=1000, q=Folder_id+" in parents", fields="nextPageToken, files(id, name, mimeType)").execute()
    items = results.get('files', [])
    if not items:
        print('No files found.')
    else:
        print('Files:')
        if not os.path.isdir(File_Download_Path):
            os.mkdir(File_Download_Path)
        for item in items:
            if item['name'] not in os.listdir(File_Download_Path):
                downloadfiles(service, item['id'], item['name'], File_Download_Path)
        print("All files are downloaded.")


if __name__ == '__main__':
    main()
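
A note on the skip check above: os.listdir is re-read on every loop iteration, so for folders with many files it may be worth snapshotting the local directory into a set once. A minimal sketch of that variation of the loop (not part of the tested code above):

        # Sketch: snapshot local filenames once; set membership is O(1) per file
        os.makedirs(File_Download_Path, exist_ok=True)
        already_downloaded = set(os.listdir(File_Download_Path))
        for item in items:
            if item['name'] not in already_downloaded:
                downloadfiles(service, item['id'], item['name'], File_Download_Path)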

I tried the above method. If any folders are present inside, it does not download them, and I get the error below:

raise HttpError(resp, content, uri=self._uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://www.googleapis.com/drive/v3/files/1x7gHhVicv_fSI4t76Rh9pK6VOaBDRC6U?alt=media returned "Only files with binary content can be downloaded. Use Export with Docs Editors files.". Details: "[{'domain': 'global', 'reason': 'fileNotDownloadable', 'message': 'Only files with binary content can be downloaded. Use Export with Docs Editors files.', 'locationType': 'parameter', 'location': 'alt'}]">

Note: it downloads only the plain files, not the folders.
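
That 403 is what the Drive API returns for Google Docs Editors files (Docs, Sheets, Slides): they have no binary content, so get_media fails and files().export_media with a target MIME type has to be used instead. A minimal sketch of an export helper in the same style as downloadfiles; exporting to PDF is an example choice, not something the API requires:

# Sketch: export a Docs Editors file instead of fetching its (nonexistent) binary content
def exportfiles(service, dowid, name, dfilespath):
    request = service.files().export_media(
        fileId=dowid, mimeType='application/pdf')  # PDF chosen as an example target
    fh = io.BytesIO()
    downloader = MediaIoBaseDownload(fh, request)
    done = False
    while not done:
        status, done = downloader.next_chunk()
    with open(dfilespath + "/" + name + ".pdf", 'wb') as f:
        f.write(fh.getvalue())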
