
I want to create a Python script to back up Google Drive files as a bit of fun / learning, but I am stuck. My script below works, but every backed-up file on my local drive gets the time of the backup as both its created date and its last-modified date. The original created and modified dates from Google Drive are not preserved.

Here is my script:

from __future__ import print_function
import sys, httplib2, os, datetime, io
from time import gmtime, strftime
from apiclient import discovery
import oauth2client
from oauth2client import client
from oauth2client import tools
from datetime import date

#########################################################################
# Fixing OSX el capitan bug ->AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'urlencode'
os.environ["PYTHONPATH"] = "/Library/Python/2.7/site-packages"
#########################################################################

CLIENT_SECRET_FILE = 'client_secrets.json'
TOKEN_FILE="drive_api_token.json"
SCOPES = 'https://www.googleapis.com/auth/drive'
APPLICATION_NAME = 'Drive File API - Python'
OUTPUT_DIR=str(date.today())+"_drive_backup"

try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

def get_credentials():
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir, TOKEN_FILE)
    store = oauth2client.file.Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else: # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials

def prepDest():
    if not os.path.exists(OUTPUT_DIR):
        os.makedirs(OUTPUT_DIR)
        return True
    return False

def downloadFile(file_name, file_id, file_createdDate, mimeType, service):
    request = service.files().get_media(fileId=file_id)
    if "application/vnd.google-apps" in mimeType:
        if "document" in mimeType:
            request = service.files().export_media(fileId=file_id, mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
            file_name = file_name + ".docx"
        else: 
            request = service.files().export_media(fileId=file_id, mimeType='application/pdf')
            file_name = file_name + ".pdf"
    print("Downloading -- " + file_name)
    response = request.execute()
    with open(os.path.join(OUTPUT_DIR, file_name), "wb") as wer:
        wer.write(response)

def listFiles(service):
    def getPage(pageTok):
        return service.files().list(q="mimeType != 'application/vnd.google-apps.folder'",
               pageSize=1000, pageToken=pageTok, fields="nextPageToken,files(id,name, createdDate, mimeType)").execute()
    pT = ''; files=[]
    while pT is not None:
        results = getPage(pT)
        pT = results.get('nextPageToken')
        files = files + results.get('files', [])
    return files

def main():
        credentials = get_credentials()
        http = credentials.authorize(httplib2.Http())
        service = discovery.build('drive', 'v3', http=http)
        for item in listFiles(service):
            downloadFile(item.get('name'), item.get('id'), item.get('createdDate'), item.get('mimeType'), service)

if __name__ == '__main__':
    main()

To try to get the created date, I added createdDate to the requested fields in the script above, since it appears in the file metadata listed here: https://developers.google.com/drive/v2/reference/files

But I don't know whether I am retrieving that metadata correctly and, if so, how to actually apply it to the downloaded file.

EDIT: Sorry, I forgot to specify an OS. This is for a Mac.

Jimmy


File v2 createdDate renamed in v3 to createdTime

The File reference you linked is for v2, but your code connects to the v3 service. When I ran your code, which uses createdDate from the v2 API, an error occurred (createdDate was an invalid metadata field).

I switched to the v3 File API, which lists the creation time as createdTime, and was able to retrieve the time without error.
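If the `fields` string contains other v2-era names, a small lookup table can translate them before the request is built. This is a hypothetical helper (`V2_TO_V3` and `v3_fields` are names made up for illustration), and the map covers only the renames relevant to this script (`title` to `name`, `createdDate` to `createdTime`, `modifiedDate` to `modifiedTime`); it is not a complete v2-to-v3 mapping.

```python
# Partial map of Drive v2 file field names to their v3 equivalents.
# Only the names relevant to this backup script; not exhaustive.
V2_TO_V3 = {
    "title": "name",
    "createdDate": "createdTime",
    "modifiedDate": "modifiedTime",
}


def v3_fields(field_names):
    """Rename any v2 field names to v3 and build a files.list `fields` value."""
    renamed = [V2_TO_V3.get(name, name) for name in field_names]
    return "nextPageToken,files({})".format(",".join(renamed))


print(v3_fields(["id", "name", "createdDate", "mimeType"]))
# nextPageToken,files(id,name,createdTime,mimeType)
```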

File creation time changeable in Windows only

Python's os module offers no way to set a file's creation time on Linux/Unix (including macOS), but os.utime() can set a file's access and modification times (it takes both, as an (atime, mtime) tuple). The Drive API provides createdTime and modifiedTime but nothing for access time (which probably wouldn't make sense there), so the modification time can double as the access time.

On Windows, the file creation time can be set with win32file.SetFileTime (from the pywin32 package).
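As a minimal sketch of the os.utime() part on its own, the snippet below sets both times on a throwaway temp file and reads them back. The helper name `set_mtime_and_atime` is hypothetical, and 1480248240 is just an example value (2016-11-27T12:04:00Z in seconds since the epoch).

```python
import os
import tempfile


def set_mtime_and_atime(path, mtime_secs):
    """Set access and modification time to mtime_secs (seconds since epoch).

    Drive reports no access time, so the modification time is reused for it.
    """
    os.utime(path, (mtime_secs, mtime_secs))


# Demo on a temporary file rather than a real backup
fd, path = tempfile.mkstemp()
os.close(fd)
set_mtime_and_atime(path, 1480248240)  # 2016-11-27T12:04:00Z
st = os.stat(path)
print(int(st.st_mtime), int(st.st_atime))  # 1480248240 1480248240
os.remove(path)
```

Note that this leaves the creation (birth) time untouched; on macOS, `os.stat()` exposes it as `st_birthtime`, but `os.utime()` does not set it directly.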

Time conversion

Note that the times passed to the timestamp functions above are in seconds since the epoch. The Drive API returns an ISO 8601 string in UTC (note the trailing Z), which we convert to seconds with:

import calendar
dt = datetime.datetime.strptime(dateTime, "%Y-%m-%dT%H:%M:%S.%fZ")
secs = calendar.timegm(dt.timetuple())  # treat dt as UTC; strftime("%s") would assume local time
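A slightly more defensive sketch of the same conversion, assuming Drive timestamps always end in `Z` (UTC) but may or may not carry fractional seconds. The function name `drive_time_to_epoch` is hypothetical. `calendar.timegm` interprets the time tuple as UTC, unlike `time.mktime` or the non-portable `strftime("%s")`, which both assume local time and would shift the result by the machine's UTC offset.

```python
import calendar
import datetime


def drive_time_to_epoch(stamp):
    """Convert a Drive UTC timestamp such as '2016-11-27T12:04:00.000Z'
    to integer seconds since the Unix epoch."""
    # Assumption: stamps end in 'Z'; fractional seconds are optional
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            dt = datetime.datetime.strptime(stamp, fmt)
        except ValueError:
            continue  # try the next format
        # timegm treats the tuple as UTC, so no local-time shift is applied
        return calendar.timegm(dt.timetuple())
    raise ValueError("unrecognised Drive timestamp: %r" % stamp)


print(drive_time_to_epoch("2016-11-27T12:04:00.000Z"))  # 1480248240
```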

Modifications

  1. Replace all instances of createdDate with createdTime.

  2. In listFiles() > getPage(), add modifiedTime to metadata fields:

    def listFiles(service):
        def getPage(pageTok):
            return service.files().list(q="mimeType != 'application/vnd.google-apps.folder'",
                                        pageSize=1000, pageToken=pageTok, fields="nextPageToken,files(id,name, createdTime, modifiedTime, mimeType)").execute()
    
  3. In main()'s for-loop, pass modifiedTime to downloadFile():

    downloadFile(item.get('name'), item.get('id'), item.get('createdTime'), item.get('modifiedTime'), item.get('mimeType'), service)
    
  4. In downloadFile(), add modifiedTime to the parameter list after file_createdTime.

  5. Add these functions to set file timestamps:

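    import calendar  # add alongside the script's other imports
    
    def dateToSeconds(dateTime):
        # Drive times are UTC ("Z" suffix). calendar.timegm treats the tuple as UTC,
        # whereas strftime("%s") assumes local time and is not portable.
        dt = datetime.datetime.strptime(dateTime, "%Y-%m-%dT%H:%M:%S.%fZ")
        return calendar.timegm(dt.timetuple())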
    
    def setFileTimestamps(fname, createdTime, modifiedTime):
        ctime = dateToSeconds(createdTime)
        mtime = dateToSeconds(modifiedTime)
        setFileCreationTime(fname, ctime)
        setFileModificationTime(fname, mtime)
    
    def setFileModificationTime(fname, newtime):
        # Set access time to same value as modified time,
        # since Drive API doesn't provide access time
        os.utime(fname, (newtime, newtime))
    
    def setFileCreationTime(fname, newtime):
        """http://stackoverflow.com/a/4996407/6277151"""
        if os.name != 'nt':
            # file creation time can only be changed in Windows
            return
    
        import pywintypes, win32file, win32con
    
        wintime = pywintypes.Time(newtime)
        winfile = win32file.CreateFile(
            fname, win32con.GENERIC_WRITE,
            win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE | win32con.FILE_SHARE_DELETE,
            None, win32con.OPEN_EXISTING,
            win32con.FILE_ATTRIBUTE_NORMAL, None)
    
        win32file.SetFileTime(winfile, wintime, None, None)
    
        winfile.close()
    
  6. In downloadFile(), call setFileTimestamps() right after writing the file (as the last line of the function):

    def downloadFile(file_name, file_id, file_createdTime, modifiedTime, mimeType, service):
        request = service.files().get_media(fileId=file_id)
        if "application/vnd.google-apps" in mimeType:
            if "document" in mimeType:
                request = service.files().export_media(fileId=file_id, mimeType='application/vnd.openxmlformats-officedocument.wordprocessingml.document')
                file_name = file_name + ".docx"
            else:
                request = service.files().export_media(fileId=file_id, mimeType='application/pdf')
                file_name = file_name + ".pdf"
        print("Downloading -- " + file_name)
        response = request.execute()
        prepDest()
        fname = os.path.join(OUTPUT_DIR, file_name)
        with open(fname, "wb") as wer:
            wer.write(response)
    
        setFileTimestamps(fname, file_createdTime, modifiedTime)
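
To spot-check the result after a run, one option is to print a downloaded file's modification time in the same UTC format Drive uses and compare it by eye with the file's modifiedTime. This is a minimal sketch; `local_mtime_as_drive_string` is a hypothetical helper, not part of the answer's code, and it drops the milliseconds that Drive includes.

```python
import os
import time


def local_mtime_as_drive_string(path):
    """Format a local file's modification time like Drive's modifiedTime
    (UTC, whole seconds, no milliseconds)."""
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(os.stat(path).st_mtime))


if __name__ == "__main__":
    import tempfile

    # Demo on a temporary file with a known modification time
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.utime(path, (1480248240, 1480248240))
    print(local_mtime_as_drive_string(path))  # 2016-11-27T12:04:00Z
    os.remove(path)
```

Pointed at files in OUTPUT_DIR instead, it should print the Drive modifiedTime (minus milliseconds) for every file whose timestamps were restored.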
    

GitHub repo

tony19
  • Thank you so much for this code, it seems to be just the ticket. I'm really sorry, but I realised I didn't specify an OS: I'm on a Mac, and it looks like your code mentions Windows. Is it quick to port this to Mac? – Jimmy Nov 27 '16 at 12:04
  • @Jimmy No problem. The code works in Linux/Unix (including macOS) or Windows. I added the Windows implementation in case you were using it, and the function gracefully exits if you aren't on Windows. Just be aware that the file-creation time can't be edited in macOS. (also note I'm on a mac myself, which is where I tested this code) – tony19 Nov 27 '16 at 12:06
  • Hi Tony. I'm going to mark this as solved, because you have put a lot of work in and it does seem to download some files and save them. The issue is that, for whatever reason, it only downloads 9 files, all from a single folder, and then stops. I'm not quite sure why. I've run it a few times and it always downloads the same 9 files and stops. Can you think of any reason for this? – Jimmy Nov 27 '16 at 12:39
  • Is it something to do with this: `for item in listFiles(service)[:10]:`? – Jimmy Nov 27 '16 at 12:43
  • Ah, yes. In the GitHub repo, I had added a 10-item limit to avoid downloading a bunch of files (my Google Drive is pretty full). I've removed the limit. https://github.com/tony19-sandbox/google-drive-file-timestamps/commit/00dae21ae8b5196e0eeb0230c5140b9dbcd9db1d – tony19 Nov 27 '16 at 12:43