4

I am using the below script to delete duplicate rows from the google spreadsheet. The script was working good but as the data in the spreadsheet is being added daily, now the script is throwing "Exceeded maximum execution time" error. As I am new to scripting I don't understand what is my problem.

Could someone help me in solving this problem of mine.

    function Deleteduplicates() {
  var SpreadSheetKey = "My key";
  var sheetD = SpreadsheetApp.openById(SpreadSheetKey).getSheetByName("Daily");
  var sheetW = SpreadsheetApp.openById(SpreadSheetKey).getSheetByName("Weekly");
  var dataD = sheetD.getDataRange().getValues();
  var dataW = sheetW.getDataRange().getValues();
  //Daily
  var newDataD = new Array();
  for(i in dataD){
    var row = dataD[i];
    var duplicate = false;
    for(j in newDataD){
      if(row.join() == newDataD[j].join()){
        duplicate = true;
      }
    }
    if(!duplicate){
      newDataD.push(row);
    }
  }
  //weekly
  var newDataW = new Array();
  for(i in dataW){
    var row = dataW[i];
    var duplicate = false;
    for(j in newDataW){
      if(row.join() == newDataW[j].join()){
        duplicate = true;
      }
    }
    if(!duplicate){
      newDataW.push(row);
    }
  }
  sheetD.clearContents();
  sheetW.clearContents();
  sheetD.getRange(1, 1, newDataD.length, newDataD[0].length).setValues(newDataD);
  sheetW.getRange(1, 1, newDataW.length, newDataW[0].length).setValues(newDataW);
}
Ashwaq
  • 431
  • 7
  • 17
  • From the VIEW, EXECUTION TRANSCRIPT menu, at the bottom of the list, there is a time given for how long it takes the script to run. How long does it take? – Alan Wells May 07 '15 at 14:28
  • Execution failed: Exceeded maximum execution time [363.923 seconds total runtime] – Ashwaq May 07 '15 at 14:29
  • Maybe a post like this can help: [Exceeded Max Exec Time](http://stackoverflow.com/questions/7854573/exceeded-maximum-execution-time-in-google-apps-script?rq=1) – Alan Wells May 07 '15 at 14:31

2 Answers2

3

Conceptually, this should be quite a bit faster. I have not tried it on a large data set. The first version will leave the rows sorted as they were originally. The second version will be faster but will leave the rows sorted according to the columns from first to last on first text.

function Deleteduplicates() {
  var SpreadSheetKey = "My key";
  var ss = SpreadsheetApp.openById(SpreadSheetKey);
  var sheetD = ss.getSheetByName("Daily");
  var sheetW = ss.getSheetByName("Weekly");
  var sheets = [sheetD, sheetW];
  var toSs = {};
  for(s in sheets) {
    var data = sheets[s].getDataRange().getValues();
    for(i in data){
      // EDIT: remove commas from join("") for blank test
      data[i].unshift(data[i].join(""),(1000000 + i).toString());
      }
    data.sort();
    // remove blank rows -- Edit
    var blank = 0;
    while(data[blank][0].trim().length == 0) {blank++};
    if(blank > 0) data.splice(0, blank);
    // end Edit
    var len = data.length - 1;
    for(var x = len; x > 0; x-- ) {
      if(data[x][0] == data[x-1][0]) {
        data.splice(x, 1);
        };
      };
    for(i in data) {
      data[i].splice( 0, 1);
      };
    data.sort();
    for(i in data) {
      data[i].splice(0, 1);
      };
    toSs[sheets[s].getSheetName()] = data;
  };
  for(s in sheets) {
    var data = toSs[sheets[s].getSheetName()];
    sheets[s].clearContents();
    sheets[s].getRange(1, 1, data.length, data[0].length).setValues(data);
  }
}

Faster leaving rows sorted by join() created to test for duplicates

function Deleteduplicates() {
  var SpreadSheetKey = "My key";
  var ss = SpreadsheetApp.openById(SpreadSheetKey);
  var sheetD = ss.getSheetByName("Daily");
  var sheetW = ss.getSheetByName("Weekly");
  var sheets = [sheetD, sheetW];
  var toSs = {};
  for(s in sheets) {
    var data = sheets[s].getDataRange().getValues();
    for(i in data){
      // EDIT: remove commas from join("") for blank test
      data[i].unshift(data[i].join(""));
      }
    data.sort();
    // remove blank rows -- Edit
    var blank = 0;
    while(data[blank][0].trim().length == 0) {blank++};
    if(blank > 0) data.splice(0, blank);
    // end Edit
    var len = data.length - 1;
    for(var x = len; x > 0; x-- ) {
      if(data[x][0] == data[x-1][0]) {
        data.splice(x, 1);
        };
      };
    for(i in data) {
      data[i].splice( 0, 1);
      };
    toSs[sheets[s].getSheetName()] = data;
    };
  for(s in sheets) {
    var data = toSs[sheets[s].getSheetName()];
    sheets[s].clearContents();
    sheets[s].getRange(1, 1, data.length, data[0].length).setValues(data);
  }
}

Edited per Henrique's comment.

Edited 5/8: Remove blank rows(2 edited areas marked)

ScampMichael
  • 3,688
  • 2
  • 16
  • 23
  • There's one important Apps Script enhancement that can be applied here, which is to remove `setValues` from the loop and do it afterwards (in a separate loop). Because if you `setValues` and then call `getValues^ (in the next iteration), it will force a spreadsheet flush and recalculation. Since the values in the 2nd get may have changed due to the 1st set call. So, general tip, get everything first, set everything in the end. Do not call set and get in a mixed order. – Henrique G. Abreu May 07 '15 at 20:56
  • Thank you Henrique. It's always a pleasure and enlightening to hear from you. Have a good one. – ScampMichael May 07 '15 at 21:09
  • Thank you for the script @ScampMichael, it works good except for two things _1. a new blank line at the top of table is being created and 2. the data is not being sorted according to date, which is the first column but it is by second column i.e Name._ Thank you Henrique, your tip really made the difference in execution time – Ashwaq May 08 '15 at 14:00
  • if the data was sorted by date to begin with the first version of the code should leave it so. See 2 edited areas in code to solve blank row. – ScampMichael May 08 '15 at 16:56
  • The edited code has solved the problem of blank row, but the data is still not being sorted by date. @ScampMichael – Ashwaq May 11 '15 at 13:26
  • If this code reduced the execution time so that your script runs without timing out then I would consider it answered and that sorting by date is a separate issue that should be addressed in a separate question with your revised code. There was nothing about sorting in the original question – ScampMichael May 11 '15 at 13:40
  • @ScampMichael: you are right , I will ask the sorting issue in another post. Thank you very much, you are really helpful. Could you also look into another question of mine? [link]http://stackoverflow.com/q/30168712/4583071 – Ashwaq May 12 '15 at 12:12
2

There is no problem with your script. It is just exceeding the "maximum execution time" allowed for any script (which is currently 6 minutes).

To workaround this problem you'll have to split your problem into "less than 6 minutes" parts.

For example, in your code you're clearing duplicates from 2 sheets. Trying creating two functions, one for each, and run them separately.

Also, there could be some performance enhancements that could make the script run under 6 minutes. For example, I'm not sure joining each row is the best way (performance-wise) to do an array comparison.

Creating a new array to re-set the data might not be optimal either, I'd probably go with a map verification, which is constant-time, instead of O(n^2) double array checking you're doing.

Bottom line, this is a limitation you have to live with in Apps Script. And any solution anyone proposes is just a workaround, that will also eventually fail if your data gets overly big.

Henrique G. Abreu
  • 17,406
  • 3
  • 56
  • 65
  • Great answer with O() and all – Zig Mandel May 07 '15 at 16:03
  • to be picky, using a map is not O(1) its O(log n) per lookup – Zig Mandel May 07 '15 at 16:05
  • I always thought it was O(1), but I never searched it. Anyway, I did it now and [it seems](http://stackoverflow.com/questions/12241676/javascript-objects-as-hashes-is-the-complexity-greater-than-o1) browsers's implementation are constant. Have you tested Apps Script implementation? (thanks for bringing it up) – Henrique G. Abreu May 07 '15 at 16:41
  • Those answers are not strictly correct. Yes you can get O(1) if using enough memory (much more than the entire object space) but only up to a certain point. Then it has rocgo to log(n). Theres no free lunch, just that log(n) needs a really huge n value to notice that its not a constant. – Zig Mandel May 07 '15 at 18:35
  • 1
    These other one is more complete. http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1 one easy way to see where log n is introduced is by looking at the key itself. To compare keys or produce hashes its proportional to the key string length which is basically log n. The trick people use to claim o(1) is by saying "few collitions" but that implies a certain key length thus log n is there. – Zig Mandel May 07 '15 at 19:33