5

GAS is quite powerful and you could write a full fledged web-app using a Google Sheet as the DB back-end. There are many reasons not to do this but I figure in some cases it is okay.

I think the biggest issue will be performance issues when looking for rows based on some criteria in a sheet with a lot of rows. I know there are many ways to "query" a sheet but I can't find reliable information on which is the fastest.

One of the complexities is that many people can edit a sheet which means there are a variable number of situations you'd have to account for. For the sake of simplicity, I want to assume the sheet:

  • Is locked down so only one person can see it
  • The first column has the row number (=row())

The most basic query is finding a row where a specific column equals some value.

Which method would be the fastest?

halfer
  • 19,824
  • 17
  • 99
  • 186
IMTheNachoMan
  • 5,343
  • 5
  • 40
  • 89
  • I'm voting to close this question as off-topic because this is not a question. – Cooper Jun 24 '19 at 15:23
  • 1
    @Cooper Why is it off-topic? It's a programming question about GAS. And it is a question -- which method is the fastest. Granted I already have the answer but I see merit in others being able to see the answer. Can you please help me understand your concern so I know for next time? – IMTheNachoMan Jun 24 '19 at 15:30

1 Answers1

13

I have a sheet with ~19k rows and ~38 columns, filled with all sorts of unsorted real-world data. That is almost 700k rows so I figured it would be a good sheet to time a few methods and see which is the fastest.

  • method 1: get sheet as a 2D array then go through each row
  • method 2: get sheet as a 2D array, sort it, then using a binary search algorithm to find the row
  • method 3: make a UrlFetch call to Google visualization query and don't provide last row
  • method 4: make a UrlFetch call to Google visualization query and provide last row

Here are the my query functions.

function method1(spreadsheetID, sheetName, columnIndex, query)
{
    // get the sheet values excluding header, 
    var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);

    // loop through each row
    for(var i = 0, numRows = rowValues.length; i < numRows; ++i)
    {
        // return it if found
        if(rowValues[i][columnIndex] == query) return rowValues[i]
    }

    return false;
}

function method2(spreadsheetID, sheetName, columnIndex, query)
{
    // get the sheet values excluding header
    var rowValues = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getSheetValues(2, 1, -1, -1);

    // sort it
    rowValues.sort(function(a, b){
        if(a[columnIndex] < b[columnIndex]) return -1;
        if(a[columnIndex] > b[columnIndex]) return 1;
        return 0;
    });

    // search using binary search
    var foundRow = matrixBinarySearch(rowValues, columnIndex, query, 0, rowValues.length - 1);

    // return if found
    if(foundRow != -1)
    {
        return rowValues[foundRow];
    }

    return false;
}

function method3(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
    // SQL like query
    myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";

    // the query URL
    // don't provide last row in range selection
    var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + ":" + queryColumnLetterEnd + '&tq=' + encodeURIComponent(myQuery);

    // fetch the data
    var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();

    // remove some crap from the return string
    return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}

function method4(spreadsheetID, sheetName, queryColumnLetterStart, queryColumnLetterEnd, queryColumnLetterSearch, query)
{
    // find the last row in the sheet
    var lastRow = SpreadsheetApp.openById(spreadsheetID).getSheetByName(sheetName).getLastRow();

    // SQL like query
    myQuery = "SELECT * WHERE " + queryColumnLetterSearch + " = '" + query + "'";

    // the query URL
    var qvizURL = 'https://docs.google.com/spreadsheets/d/' + spreadsheetID + '/gviz/tq?tqx=out:json&headers=1&sheet=' + sheetName + '&range=' + queryColumnLetterStart + "1:" + queryColumnLetterEnd + lastRow + '&tq=' + encodeURIComponent(myQuery);

    // fetch the data
    var ret = UrlFetchApp.fetch(qvizURL, {headers: {Authorization: 'Bearer ' + ScriptApp.getOAuthToken()}}).getContentText();

    // remove some crap from the return string
    return JSON.parse(ret.replace("/*O_o*/", "").replace("google.visualization.Query.setResponse(", "").slice(0, -2));
}

My binary search algorithm:

function matrixBinarySearch(matrix, columnIndex, query, firstIndex, lastIndex)
{
    // find the value using binary search
    // https://www.w3resource.com/javascript-exercises/javascript-array-exercise-18.php

    // first make sure the query string is valid
    // if it is less than the smallest value
    // or larger than the largest value
    // it is not valid
    if(query < matrix[firstIndex][columnIndex] || query > matrix[lastIndex][columnIndex]) return -1;

    // if its the first row
    if(query == matrix[firstIndex][columnIndex]) return firstIndex;

    // if its the last row
    if(query == matrix[lastIndex][columnIndex]) return lastIndex;

    // now start doing binary search
    var middleIndex = Math.floor((lastIndex + firstIndex)/2);

    while(matrix[middleIndex][columnIndex] != query && firstIndex < lastIndex)
    {
        if(query < matrix[middleIndex][columnIndex])
        {
            lastIndex = middleIndex - 1;
        }
        else if(query > matrix[middleIndex][columnIndex])
        {
            firstIndex = middleIndex + 1;
        }

        middleIndex = Math.floor((lastIndex + firstIndex)/2);
    }

    return matrix[middleIndex][columnIndex] == query ? middleIndex : -1;
}

This is the function I used to test them all:

// each time this function is called it will try one method
// the first time it is called it will try method1
// then method2, then method3, then method4
// after it does method4 it will start back at method1
// we will use script properties to save which method is next
// we also want to use the same query string for each batch so we'll save that in script properties too
function testIt()
{
    // get the sheet where we're staving run times
    var runTimesSheet = SpreadsheetApp.openById("...").getSheetByName("times");

    // we want to see true speed tests and don't want server side caching so we a copy of our data sheet
    // make a copy of our data sheet and get its ID
    var tempSheetID = SpreadsheetApp.openById("...").copy("temp sheet").getId();

    // get script properties
    var scriptProperties = PropertiesService.getScriptProperties();

    // the counter
    var searchCounter = Number(scriptProperties.getProperty("searchCounter"));

    // index of search list we want to query for
    var searchListIndex = Number(scriptProperties.getProperty("searchListIndex"));

    // if we're at 0 then we need to get the index of the query string
    if(searchCounter == 0)
    {
        searchListIndex = Math.floor(Math.random() * searchList.length);
        scriptProperties.setProperty("searchListIndex", searchListIndex);
    }

    // query string
    var query = searchList[searchListIndex];

    // save relevant data
    var timerRow = ["method" + (searchCounter + 1), searchListIndex, query, 0, "", "", "", ""];

    // run the appropriate method
    switch(searchCounter)
    {
        case 0:
            // start time
            var start = (new Date()).getTime();

            // run the query
            var ret = method1(tempSheetID, "Extract", 1, query);

            // end time
            timerRow[3] = ((new Date()).getTime() - start) / 1000;

            // if we found the row save its values in the timer output so we can confirm it was found
            if(ret)
            {
                timerRow[4] = ret[0];
                timerRow[5] = ret[1];
                timerRow[6] = ret[2]; 
                timerRow[7] = ret[3];
            }
            break;
        case 1:
            var start = (new Date()).getTime();
            var ret = method2(tempSheetID, "Extract", 1, query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret)
            {
                timerRow[4] = ret[0];
                timerRow[5] = ret[1];
                timerRow[6] = ret[2]; 
                timerRow[7] = ret[3];
            }
            break;
        case 2:
            var start = (new Date()).getTime();
            var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret.table.rows.length)
            {
                timerRow[4] = ret.table.rows[0].c[0].v;
                timerRow[5] = ret.table.rows[0].c[1].v;
                timerRow[6] = ret.table.rows[0].c[2].v;
                timerRow[7] = ret.table.rows[0].c[3].v;
            }
            break;
        case 3:
            var start = (new Date()).getTime();
            var ret = method3(tempSheetID, "Extract", "A", "AL", "B", query);
            timerRow[3] = ((new Date()).getTime() - start) / 1000;
            if(ret.table.rows.length)
            {
                timerRow[4] = ret.table.rows[0].c[0].v;
                timerRow[5] = ret.table.rows[0].c[1].v;
                timerRow[6] = ret.table.rows[0].c[2].v;
                timerRow[7] = ret.table.rows[0].c[3].v;
            }
            break;
    }

    // delete the temp file
    DriveApp.getFileById(tempSheetID).setTrashed(true);

    // save run times
    runTimesSheet.appendRow(timerRow);

    // start back at 0 if we're the end
    if(++searchCounter == 4) searchCounter = 0;

    // save the search counter
    scriptProperties.setProperty("searchCounter", searchCounter);
}

I have a global variable searchList that is an array of various query strings -- some are in the sheet, some are not.

I ran testit on a trigger to run every minute. After 152 iterations I had 38 batches. Looking at the result, this is what I see for each method:

| Method  | Minimum Seconds | Maximum Seconds | Average Seconds |
|---------|-----------------|-----------------|-----------------|
| method1 |            8.24 |           36.94 |           11.86 |
| method2 |            9.93 |           23.38 |           14.09 |
| method3 |            1.92 |            5.48 |            3.06 |
| method4 |            2.20 |           11.14 |            3.36 |

So it appears that, at least for my data-set, is using Google visualization query is the fastest.

IMTheNachoMan
  • 5,343
  • 5
  • 40
  • 89
  • if I get time I want to do a few other test-cases like getting all rows matching a single query and getting all rows matching multiple queries. – IMTheNachoMan Jun 24 '19 at 13:42
  • It would be interesting to know that If the data is already sorted, by how much would method2 be faster. – TheMaster Jun 24 '19 at 16:09
  • @TheMaster That is next test on list. – IMTheNachoMan Jun 24 '19 at 16:11
  • 2
    Also in my environment, the search using the Query Language was the fastest of all. When the values are searched from Spreadsheet, the cost for getting all values from Spreadsheet is the large bottleneck. So the search using the Query Language and [TheMaster's proposal](https://stackoverflow.com/questions/56737713/fastest-way-to-search-for-a-row-in-a-large-google-sheet-using-in-google-apps-scr#comment100039534_56737759) are suitable for this situation. And also, if you want to be simple script, I think that TextFinder added recently can be used. https://stackoverflow.com/a/56663884 – Tanaike Jun 24 '19 at 22:55
  • @Tanaike I will add a TextFinder test too. – IMTheNachoMan Jun 24 '19 at 23:52
  • Hi @IMTheNachoMan, did you get to test TextFinder? – grreeenn Mar 28 '21 at 14:35
  • @grreeenn No. Never got around to it. – IMTheNachoMan Mar 29 '21 at 00:03
  • 1
    For what it's worth, I found TextFinder was much faster than loading the whole sheet with .getValues() @grreeenn – Josh May 23 '23 at 20:10