0

So, it's an old thread I know, but all the threads I found are older than 7 years, so it seemed like a good idea to reopen one.

My problem is that I have to delete the duplicate files in my drive because they are taking up space unnecessarily, I found this code from this old thread - How to Find duplicates files -, however it doesn't work, I wanted to ask, can you find the error? Or if you have some other solution to recommend to delete duplicates (no Clean Drive as it is paid)

Thank you

function startProcess() {
    PropertiesService.getScriptProperties().deleteAllProperties();
    try {
        ScriptApp.deleteTrigger(ScriptApp.getProjectTriggers()[0]);
    } catch (e) {}
    var sh = SpreadsheetApp.getActiveSheet();
    sh.getDataRange().clear();
    sh.getRange(1, 1, 1, 4).setValues([
        ['fileName (logged @' + Utilities.formatDate(new Date(), Session.getScriptTimeZone(), 'MMM-dd-yyyy HH:mm') + ')', 'fileSize', 'parent folders tree', 'fileID']
    ]);
    var trig = ScriptApp.newTrigger('getDriveFilesList_').timeBased().everyMinutes(5).create();
    Logger.log(trig.getUniqueId() + '  ' + trig.getHandlerFunction());
    getDriveFilesList_();
}

function getDriveFilesList_() {
    var content = [];
    var startTime = new Date().getTime();
    var sh = SpreadsheetApp.getActiveSheet();
    if (!PropertiesService.getScriptProperties().getProperty('numberOfFiles')) {
        PropertiesService.getScriptProperties().setProperty('numberOfFiles', 0);
    }

    var numberOfFiles = Number(PropertiesService.getScriptProperties().getProperty('numberOfFiles'));
    Logger.log(numberOfFiles);
    var max = numberOfFiles + 10000;
    if (!PropertiesService.getScriptProperties().getProperty('continuationToken')) {
        var files = DriveApp.getFiles();
        // var files = DriveApp.getFolderById('0B3qSFd_____MTFZMDQ').getFiles();// use this line and comment the above if you want to process a single folder
        // use your chozen folder ID of course (available from the browser url , the part after "https://drive.google.com/?authuser=0#folders/")
    } else {
        var files = DriveApp.continueFileIterator(PropertiesService.getScriptProperties().getProperty('continuationToken'))
    }
    while (files.hasNext() && numberOfFiles < (max)) {
        var file = files.next()
        if (file.getSize() > 0) {
            numberOfFiles++;
            var folder = '(shared)';
            if (file.getParents().hasNext()) {
                folder = getTree_(file)
            }
            content.push([file.getName(), file.getSize(), folder, file.getId()])
        }
        if (new Date().getTime() - startTime > 250000) {
            break
        };
    }
    sh.getRange(sh.getLastRow() + 1, 1, content.length, content[0].length).setValues(content);
    if (!files.hasNext()) {
        ScriptApp.deleteTrigger(ScriptApp.getProjectTriggers()[0]);
        Logger.log('done !');
        sh.getRange(sh.getLastRow() + 1, 1).setValue('All files processed (' + numberOfFiles + ' found)')
    };
    var continuationToken = files.getContinuationToken()
    PropertiesService.getScriptProperties().setProperty('numberOfFiles', numberOfFiles);
    PropertiesService.getScriptProperties().setProperty('continuationToken', continuationToken);
}

function markDuplicates() {
    handleDuplicates_(false)
}

function trashDuplicates() {
    handleDuplicates_(true)
}

function handleDuplicates_(trash) {
    var sh = SpreadsheetApp.getActiveSheet();
    sh.setFrozenRows(1);
    sh.sort(1);
    var data = sh.getDataRange().getValues()
    var headers = data.shift()
    var lastComment = data.pop();
    var toDelete = [];
    var item = data[0];
    for (var n = 1; n < data.length; n++) {
        if (data[n][0] == item[0] && data[n][1] == item[1]) {
            toDelete.push('delete ' + n);
        }
        item = data[n];
    }
    var marker = sh.getRange(2, 1, data.length, 1).getBackgrounds();
    for (var n in data) {
        if (!trash) {
            marker.push(['#FFF'])
        };
        if (toDelete.indexOf('delete ' + n) > -1 && !trash) {
            marker[n][0] = '#F99';
        }
        if (toDelete.indexOf('delete ' + n) > -1 && trash) {
            if (marker[n][0] == '#ff9999') {
                try {
                    DriveApp.getFileById(data[n][3]).setTrashed(trash);
                    marker[n][0] = '#F33';
                } catch (err) {
                    Logger.log(err)
                }
            } else {
                marker[n][0] = '#FF9';
            }
        }
    }
    sh.getRange(2, 1, marker.length, 1).setBackgrounds(marker);
}

function getTree_(file) {
    var tree = [];
    var folderP = file.getParents()
    while (folderP.hasNext()) {
        var folder = folderP.next();
        folderP = folder.getParents();
        tree.push(folder.getName());
    }
    return tree.reverse().join('/');
}
Linda Lawton - DaImTo
  • 106,405
  • 32
  • 180
  • 449
user16205485
  • 1
  • 1
  • 2
  • I have to apologize for my poor English skill. Unfortunately, I cannot understand your question. Can I ask you about the detail of `however it doesn't work`? And, I cannot understand `delete the duplicate files in my drive`. Can I ask you about the detail of your goal? – Tanaike Jan 25 '22 at 00:34
  • 1
    please define doesn't work what's not working. – Linda Lawton - DaImTo Jan 25 '22 at 10:27
  • Sorry for my english, i used google translator, i'll try to explain me, i have found this code to find and delete duplicated files in my Google Drive folders, because i need to free up space i want to delete all of the copies. To do that i have found that code to do that job for me, but when i try to run it it doesn't work, i receive `TypeError: Cannot read property 'getDataRange' of null startProcess @ Codice.gs:7` so, because i don't know this language i'm asking help @DaImTo @Tanaike – user16205485 Jan 25 '22 at 13:37

1 Answers1

1

Since at the end of your question you ask about other free solutions I'd like to point you at a simple mobile app that I recently made because I had the exact same need. It's called DeDuplicate. It supports Google Drive, Dropbox, OneDrive and more.

I'd like to further expand my answer by giving some advice.

It seems like your code makes use of "Google Apps Script", which I didn't even know about. I used a different approach, calling the Google Drive API directly.

These pages from the documentation proved to be extremely helpful:

  • https://developers.google.com/workspace/guides/auth-overview for authentication. I basically had to create a new project and app on Google Cloud, and configure it properly, with the needed authorization scopes, especially https://www.googleapis.com/auth/drive of course. Then you can login to your account with OAuth (in my mobile app I use the expo-auth-session library to do that) and get an access token.

    Tip: I found out that you need to add an extra parameter (access_type: 'offline') to the auth request in order to get the refresh token in the response. It wasn't obvious to me.

  • https://developers.google.com/drive/api/v3/reference that is the actual API of Google Drive. You can even try calls directly, it's very handy.

Provided with your valid access token, you can send simple fetch requests to the API, without resorting to any specific library. You just have to read the documentation and (if you're like me) do a lot of trial and error, observing the server responses. It depends on what you want to do.

Here's an adapted example from my actual code. This is a request that gets a list of all the files on your drive (actually, the first "page"). The documentation that explains this method is here: https://developers.google.com/drive/api/v3/reference/files/list

import queryString from 'query-string';

async function sendRequest() {
    const token = await getToken(); // we get the token somehow

    const headers: HeadersInit = {
        'Authorization': `Bearer ${token}`,
        'Accept': `application/json`,
    };

    const fields = 'files(id,name,mimeType,parents,webViewLink,createdTime,modifiedTime,md5Checksum,size,hasThumbnail,thumbnailLink,imageMediaMetadata,videoMediaMetadata),nextPageToken';

    const response = await fetch(queryString.stringifyUrl({
        url: 'https://www.googleapis.com/drive/v3/files',
        query: {
            corpora: 'user',
            fields,
            q: [
                `'me' in owners`,
                `trashed = false`,
            ].join(' and '),
            // pageSize: 200
        }
    }), {
        method: 'GET',
        headers
    });

    const output = await response.json();
    return output;
}

The md5Checksum field is important for identifying duplicates.