12

Let's say I have a a script that iterates over a list of 400 objects. Each object has anywhere from 1 to 10 properties. Each property is a reasonable size string or a somewhat large integer.

Is there a significant difference in performance of saving these objects into ScriptDB vs saving them into Spreadsheet(w/o doing it in one bulk operation).

Mogsdad
  • 44,709
  • 21
  • 151
  • 275
Anton Soradoi
  • 1,841
  • 1
  • 22
  • 36

2 Answers2

30

Executive Summary

Yes, there is a significant difference! Huge! And I have to admit that this experiment didn't turn out the way I expected.

With this amount of data, writing to a spreadsheet was always much faster than using ScriptDB.

These experiments support the assertions regarding bulk operations in the Google Apps Script Best Practices. Saving data in a spreadsheet using a single setValues() call was 75% faster than line-by-line, and two orders of magnitude faster than cell-by-cell.

On the other hand, recommendations to use Spreadsheet.flush() should be considered carefully, due to the performance impact. In these experiments, a single write of a 4000-cell spreadsheet took less than 50ms, and adding a call to flush() increased that to 610ms - still less than a second, but an order of magnitude tax seems ludicrous. Calling flush() for each of the 400 rows in the sample spreadsheet made the operation take almost 12 seconds, when it took just 164 ms without it. If you've been experiencing Exceeded maximum execution time errors, you may benefit from both optimizing your code AND removing calls to flush().

Experimental Results

All timings were derived following the technique described in How to measure time taken by a function to execute. Times are expressed in milliseconds.

Here are the results from a single pass of five different approaches, two using ScriptDB, three writing to Spreadsheets, all with the same source data. (400 objects with 5 String & 5 Number attributes)

Experiment 1

  • Elapsed time for ScriptDB/Object test: 53529
  • Elapsed time for ScriptDB/Batch test: 37700
  • Elapsed time for Spreadsheet/Object test: 145
  • Elapsed time for Spreadsheet/Attribute test: 4045
  • Elapsed time for Spreadsheet/Bulk test: 32

Effect of Spreadsheet.flush()

Experiment 2

In this experiment, the only difference from Experiment 1 was that we called Spreadsheet.flush() after every setValue/s call. The cost of doing so is dramatic, (around 700%) but does not change the recommendation to use a spreadsheet over ScriptDB for speed reasons, because writing to spreadsheets is still faster.

  • Elapsed time for ScriptDB/Object test: 55282
  • Elapsed time for ScriptDB/Batch test: 37370
  • Elapsed time for Spreadsheet/Object test: 11888
  • Elapsed time for Spreadsheet/Attribute test: 117388
  • Elapsed time for Spreadsheet/Bulk test: 610

Note: This experiment was often killed with Exceeded maximum execution time.

Caveat Emptor

You're reading this on the interwebs, so it must be true! But take it with a grain of salt.

  • These are results from very small sample sizes, and may not be completely reproducible.
  • These results are measuring something that changes constantly - while they were observed on Feb 28 2013, the system they measured could be completely different when you read this.
  • The efficiency of these operations is affected by many factors that are not controlled in these experiments; caching of instructions & intermediate results and server load, for example.
  • Maybe, just maybe, someone at Google will read this, and improve the efficiency of ScriptDB!

The Code

If you want to perform (or better yet, improve) these experiments, create a blank spreadsheet, and copy this into a new script within it. This is also available as a gist.

/**
 * Run experiments to measure speed of various approaches to saving data in
 * Google App Script (GAS).
 */
function testSpeed() {
  var numObj = 400;
  var numAttr = 10;
  var doFlush = false;  // Set true to activate calls to SpreadsheetApp.flush()

  var arr = buildArray(numObj,numAttr);
  var start, stop;  // time catchers
  var db = ScriptDb.getMyDb();
  var sheet;

  // Save into ScriptDB, Object at a time
  deleteAll(); // Clear ScriptDB
  start = new Date().getTime();
    for (var i=1; i<=numObj; i++) {
      db.save({type: "myObj", data:arr[i]});
    }
  stop = new Date().getTime();
  Logger.log("Elapsed time for ScriptDB/Object test: " + (stop - start));

  // Save into ScriptDB, Batch
  var items = [];
  // Restructure data - this is done outside the timed loop, assuming that
  // the data would not be in an array if we were using this approach.
  for (var obj=1; obj<=numObj; obj++) {
    var thisObj = new Object();
    for (var attr=0; attr < numAttr; attr++) {
      thisObj[arr[0][attr]] = arr[obj][attr];
    }
    items.push(thisObj);
  }
  deleteAll(); // Clear ScriptDB
  start = new Date().getTime();
    db.saveBatch(items, false);
  stop = new Date().getTime();
  Logger.log("Elapsed time for ScriptDB/Batch test: " + (stop - start));

  // Save into Spreadsheet, Object at a time
  sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
  start = new Date().getTime();
    for (var row=0; row<=numObj; row++) {
      var values = [];
      values.push(arr[row]);
      sheet.getRange(row+1, 1, 1, numAttr).setValues(values);
      if (doFlush) SpreadsheetApp.flush();
    }
  stop = new Date().getTime();
  Logger.log("Elapsed time for Spreadsheet/Object test: " + (stop - start));

  // Save into Spreadsheet, Attribute at a time
  sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
  start = new Date().getTime();
    for (var row=0; row<=numObj; row++) {
      for (var cell=0; cell<numAttr; cell++) {
        sheet.getRange(row+1, cell+1, 1, 1).setValue(arr[row][cell]);
        if (doFlush) SpreadsheetApp.flush();
      }
    }
  stop = new Date().getTime();
  Logger.log("Elapsed time for Spreadsheet/Attribute test: " + (stop - start));

  // Save into Spreadsheet, Bulk
  sheet = SpreadsheetApp.getActive().getActiveSheet().clear();
  start = new Date().getTime();
    sheet.getRange(1, 1, numObj+1, numAttr).setValues(arr);
    if (doFlush) SpreadsheetApp.flush();
  stop = new Date().getTime();
  Logger.log("Elapsed time for Spreadsheet/Bulk test: " + (stop - start));
}

/**
 * Create a two-dimensional array populated with 'numObj' rows of 'numAttr' cells.
 */
function buildArray(numObj,numAttr) {
  numObj = numObj | 400;
  numAttr = numAttr | 10;
  var array = [];
  for (var obj = 0; obj <= numObj; obj++) {
    array[obj] = [];
    for (var attr = 0; attr < numAttr; attr++) {
      var value;
      if (obj == 0) {
        // Define attribute names / column headers
        value = "Attr"+attr;
      }
      else {
        value = ((attr % 2) == 0) ? "This is a reasonable sized string for testing purposes, not too long, not too short." : Number.MAX_VALUE;
      }
      array[obj].push(value);
    }
  }
  return array
}

function deleteAll() {
  var db = ScriptDb.getMyDb();
  while (true) {
    var result = db.query({}); // get everything, up to limit
    if (result.getSize() == 0) {
      break;
    }
    while (result.hasNext()) {
      var item = result.next()
      db.remove(item);
    }
  }
}
Community
  • 1
  • 1
Mogsdad
  • 44,709
  • 21
  • 151
  • 275
  • wow, just wow... this confirms what I was feeling. The spreadsheet service has been improved (quite) recently, I noticed that on some scripts that I wrote 2 years ago and that run much faster that when I initially wrote them. There was also another serie of experiments that have interesting results [here](http://stackoverflow.com/questions/11854575/gas-performance-slower-than-other-server-side-javascript) by [megabyte1024](http://stackoverflow.com/users/1227895/megabyte1024) – Serge insas Mar 01 '13 at 06:41
  • @Serge Thanks for pointing that Question out! Have you tried those benchmarks again? I wonder if they've improved. Eric_Koleda's answer said he'd report updates on any progress - maybe there has been some he didn't know about? – Mogsdad Mar 01 '13 at 10:35
  • Following your suggestion I re-run the tests and get better results than those previously obtained... about 2 times faster for the fastest (265 complete test, 13mS for the sort) but still a couple of 700mS from time to time (1 of 10 maybe?). I don't know really if I'm lucky today or if the server engine has been improved, only a Google guy could tell that ;-) but anyway both are good news (always nice to be lucky ^^) - still, I think I've read somewhere that they had improved gas performance recently but I just can't remember where or exactly when... sorry. – Serge insas Mar 01 '13 at 12:28
  • This is a great answer, just some notes: (1) Using `.flush()` forces the current operation on the spreadsheet to finish before continuing. This is only *necessary* if you have [conflicting operations](http://www.celticwolf.com/blog/2010/04/27/what-is-a-race-condition/comment-page-1/). So while this experiment shows that it reduces performance drastically, it may be necessary at times. – Phil Bozak Mar 06 '13 at 17:43
  • (2) In terms of lookups (reading from DB), ScriptDb may provide better results for very large datasets. It's built to handle that. Spreadsheets would require you to load all of that data into an array to read through it (unless there is an efficient way of querying a spreadsheet that I'm unaware of). – Phil Bozak Mar 06 '13 at 17:43
  • @PhilBozak WRT `flush()`, just as I recommended, consider carefully whether you need to use it. Conflicting ops may be a good reason, if you've got multiple scripts or users manipulating a sheet. WRT efficient query... after this little experiment, I'm no longer willing to accept that a service is good at something because it was built for it! I figured saving in ScriptDb would be much faster than to a spreadsheet, and was disappointed. It might be faster to load the spreadsheet into an array and just do a linear search! You'd have to measure to be sure. – Mogsdad Mar 06 '13 at 18:18
  • 2
    I'm just mentioning that this experiment didn't explore the lookup aspect of databases. Suppose you have a DB of a million items or so and you want to grab a single specific item, I feel that ScriptDb would perform better at the lookup because you don't have to load in the whole dataset into an array. But like you said, you won't know until you test. – Phil Bozak Mar 06 '13 at 19:03
  • great post --- Cache: a Cache for sheet read/write operations was added some time ago - (This is mentioned somewhere by one of the developers. where?..was a while ago..). This means groups of read operations will be merged into one read, also groups of write operations will be merged into one write. – eddyparkinson Mar 08 '13 at 01:38
  • 2
    Another +1 for an excellent answer. Here are some more average speeds... for querying one attribute and returning the first record in the result (Spreadsheet: using `getDataRange().getValues()` and iterating through that) - Spreadsheet 179ms, ScriptDb 87ms (note that for ScriptDb, `result = db.query({query obj})` takes 0ms (on average!) and `firstRecord = result.next()` takes 87ms). For writing one record to the database (Spreadsheet: using `appendRow()`) - Spreadsheet 111ms, ScriptDb 164ms. – AdamL Apr 13 '13 at 03:33
  • obviously spreadsheets can't handle arbitrarily nested objects like scriptDB can. – JSDBroughton Jan 02 '14 at 20:42
4

ScriptDB has been deprecated. Do not use.

ScampMichael
  • 3,688
  • 2
  • 16
  • 23