4

Using the basic tutorial code from Google works fine for removing duplicates in GoogleSheets. However, since the maximum execution time is 6minutes, I'm running into timedout issues with larger spreadsheets that have thousands of rows. How would I be able to modify this code to work with larger spreadsheets? For example, how would I be able to set a maximum amount of rows to iterate, starting from the bottom?

This is the code:

function removeDuplicates() {
  var sheet = SpreadsheetApp.getActiveSheet();
  var data = sheet.getDataRange().getValues();
  var newData = [];
  for (var i in data) {
    var row = data[i];
    var duplicate = false;
    for (var j in newData) {
      if (row.join() == newData[j].join()) {
        duplicate = true;
      }
    }
    if (!duplicate) {
      newData.push(row);
    }
  }
  sheet.clearContents();
  sheet.getRange(1, 1, newData.length, newData[0].length).setValues(newData);
}
Michael Hawk
  • 65
  • 1
  • 3
  • Does this answer your question? [How to remove duplicates from a two-dimensional array?](https://stackoverflow.com/questions/20339466/how-to-remove-duplicates-from-a-two-dimensional-array) – CMB Apr 21 '21 at 21:52
  • Although I'm not sure whether I could correctly understand about your situation, I proposed an answer. Could you please confirm it? If that was not the result you expect, I apologize. – Tanaike Apr 22 '21 at 01:51

3 Answers3

3

I believe your goal as follows.

  • You want to delete the duplicated rows.
  • You want to achieve this using Google Apps Script.
  • You want to reduce the process cost of the script.

In this case, I thought that removeDuplicates() might be used for achieving your goal. The sample script is as follows.

Sample script:

function removeDuplicates() {
  SpreadsheetApp.getActiveSheet().getDataRange().removeDuplicates();
}

Or, in this case, Sheets API can be also used as follows. When above script was not useful for your situation, please test the following script. When Sheets API is used, the process cost can be reduced a little. Ref Before you use this, please enable Sheets API at Advanced Gooele services.

function removeDuplicates2() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = ss.getActiveSheet();
  const resource = { requests: [{ deleteDuplicates: { range: { sheetId: sheet.getSheetId() } } }] };
  Sheets.Spreadsheets.batchUpdate(resource, ss.getId());
}

References:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • This works well! Thank you. Do you know how i'd be able to make it only compare a certain column (in this case, G) and remove the rows if the columns are dupes? For example, this would work like the builtin Remove Duplicates under Data, if you select header row and select one column. – Michael Hawk Apr 22 '21 at 03:39
  • @Michael Hawk Thank you for replying. I'm glad your issue was resolved. About your new question of `Do you know how i'd be able to make it only compare a certain column (in this case, G) and remove the rows if the columns are dupes? For example, this would work like the builtin Remove Duplicates under Data, if you select header row and select one column.`, when the pattern 1 is used, how about `SpreadsheetApp.getActiveSheet().getDataRange().removeDuplicates([7])`? If I misunderstood your new question, I apologize. – Tanaike Apr 22 '21 at 04:20
  • This is perfect! Thank you so very much. I have one last question if you don't mind. How would you get it to remove dupes for every sheet and not just the current active sheet? – Michael Hawk Apr 22 '21 at 05:59
  • @Michael Hawk Thank you for replying. I'm glad your 2 issues were resolved. I would like to support you. But the issue of replying is new issue, and that is different from your question. So can you post it as new question? Because when your initial question is changed by comment, other users who see your question are confused. By posting it as new question, users including me can think of it. If you can cooperate to resolve your new issue, I'm glad. Can you cooperate to resolve your new question? – Tanaike Apr 22 '21 at 06:02
  • Of course! Thanks. https://stackoverflow.com/questions/67207896/google-apps-script-remove-duplicates-in-all-sheets-with-one-function – Michael Hawk Apr 22 '21 at 06:30
  • @Michael Hawk Thank you for your response. I could confirm it and posted an answer. Could you please confirm it? If that was not the result you expect, I apologize. – Tanaike Apr 22 '21 at 06:45
0

If you want an alternative to removeDuplicates(), and have an array variable to use as you need, I've found this function useful:

 function removeDuplicates() {
  const sheet = SpreadsheetApp.getActiveSheet(),
        data = sheet.getDataRange().getValues();

  let newValue = data.map(JSON.stringify),
      newData = [...new Set(newValue)],
      unique = Array.from(newData, JSON.parse);

   sheet.getRange(1, 1, unique.length, unique[0].length).setValues(unique);
 }
  • To handle cells with date format you could use a reviver: ```unique = Array.from(newData, x => JSON.parse(x, dateTimeReviver)); // Based on https://stackoverflow.com/a/14509447/1027723 const dateTimeReviver = (key, value) => { if (typeof value === 'string') { const a = /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z)$/.exec(value); if (a) return new Date(a[1]); } return value; }``` – mhawksey May 24 '23 at 22:25
  • I simplified the code, but I don't know if it's a good best practice to put the function inside the code. The regex also suffices like this: /(\d{4}-\d{2}-\d{2}).*/ : dates = (k, value)=> { if (typeof value === 'string') { var a = /(\d{4}-\d{2}-\d{2}).*/.exec(value); if (a) return new Date(a[0]) } return value; } unique = Array.from(newData, v => JSON.parse(v, dates)); sDest.getRange(1, 1, unique.length, unique[0].length).setValues(unique); – Andrea Guerri May 30 '23 at 13:00
0

@Ze Za just do it like this:

function removeDuplicates1() {
    const sheet = SpreadsheetApp.getActiveSheet();
    sheet.getDataRange().removeDuplicates([3]);
}

I don't understand then your goal with row 4 and 5 from your code