
I am trying to fetch a large CSV from an external URL in Apps Script. For reference, the CSV has 124653 rows and 52 columns, and the file size is 72 MB.

When I download the CSV manually via the URL in a browser, it has all the records (no. of records = 124653). But when I fetch the same file in Apps Script using UrlFetchApp.fetch(URL), not all records are returned (no. of records = 98701).
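
For reference, the fetch and parse look roughly like this (`URL` stands for the external CSV endpoint):

  // Fetch the CSV in one call and parse it
  const response = UrlFetchApp.fetch(URL);
  const records = Utilities.parseCsv(response.getContentText());
  Logger.log(records.length); // logs ~98701 instead of the expected 124653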

I guess this is because of the response size limit of UrlFetchApp.fetch, which is 50 MB per call as per the documentation: https://developers.google.com/apps-script/guides/services/quotas#current_limitations

Is there any alternative to this? Ideally the alternative should be limited to using Google Apps Script itself, but I'm open to exploring other options too if that's not possible. Once the file is parsed, the data will be stored in BigQuery.

  • You will probably have to use a nextpagetoken and loop through several pages to get all of your data and you may have to do it in batches – Cooper Sep 12 '22 at 14:01
  • This is a duplicate of https://stackoverflow.com/q/73684443/1595451 – Rubén Sep 12 '22 at 16:36
  • What is the source of the CSV file? Should the alternative that you are looking for be limited to using Google Apps Script built-in services, or could you upload the file, i.e., to BigQuery? What will you do with the file once it's parsed? – Rubén Sep 12 '22 at 16:38
  • 1
    While it is related, I reopened this because the underlying issue is drastically different from the previous question. This deals mainly with getting around `urlfetchapp` limitation of 50MB/call. The csv truncation would be a byproduct and not the main issue. – TheMaster Sep 12 '22 at 19:03
  • 1
    @Rubén The source of the CSV file is an external URL. The alternative should be ideally limited to using Google Apps Script itself but open to explore other options too if its not possible. Once the file is parsed, storing the data to BigQuery. – abitgcp01 Sep 13 '22 at 06:05
  • 2
    Have you checked whether this URL allows for fetching the CSV partially? Otherwise, I'd consider uploading the CSV somewhere else so that it is more easily accessible within Apps Script (e.g. Drive). – Iamblichus Sep 13 '22 at 10:38
  • @abitgcp01 Thanks for your reply. I'm afraid the file is too big to be handled by the Google Apps Script Fetch Service and to be uploaded directly into BigQuery (see https://stackoverflow.com/q/23762006/1595451, https://stackoverflow.com/q/51842509/1595451). P.S. This question looks to be turning into a request for recommendations, which is off-topic. Try [softwarerecs.se] (before posting a question there, check their "How to ask" guidelines). – Rubén Sep 13 '22 at 14:51
  • 1
    @Rubén - You mean the file is too big to be handled by Apps Script fetch service. I get that and I am testing out the partial download option currently. But I think Big Query does not have any such limitation of 50 MB for its load job. Please correct me if you meant that it is true for big query upload as well. – abitgcp01 Sep 14 '22 at 13:07
  • By "directly" I mean without changing the file prior sending it to BigQuery using Google Apps Script. – Rubén Sep 14 '22 at 13:11
  • 3
    Kindly add a answer with your code and explanations of my suggestions using `Range` header, so that others with the same issue may receive help from your experience. – TheMaster Sep 14 '22 at 14:05
  • Done, thanks for your suggestion which helped in solving this issue. – abitgcp01 Sep 17 '22 at 14:03

1 Answer


This is possible using the `Range` HTTP header, provided your server supports partial downloads.

  // redirectURL is the external CSV URL; each range stays at or below the 50 MB response limit
  let csvStr = '';
  const ranges = ["bytes=0-49999999", "bytes=50000000-72074760"];
  for (const range of ranges) {
    const options = {
      "headers": {
        "Range": range
      }
    };
    const response = UrlFetchApp.fetch(redirectURL, options);
    const responseCode = response.getResponseCode();
    // 206 (Partial Content) means the server honoured the Range header
    if (responseCode != 206) { throw 'Error: ' + responseCode + " - " + response; }
    csvStr += response.getContentText();
  }
  const arr = Utilities.parseCsv(csvStr);
  ...

To check whether your server supports partial downloads, print the response headers (`response.getHeaders()`) and check that the `Accept-Ranges` property does not have the value `none`.
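
For example, a cheap check is to request a single byte and inspect the ranged response (a sketch; the exact header key casing can vary by server):

  // Request one byte and inspect the response to confirm byte-range support
  const check = UrlFetchApp.fetch(redirectURL, { "headers": { "Range": "bytes=0-0" } });
  Logger.log(check.getResponseCode());             // 206 if the Range header was honoured, 200 otherwise
  Logger.log(check.getHeaders()['Accept-Ranges']); // should be e.g. 'bytes', not 'none'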

My CSV file size was 72 MB. I have hardcoded the end of the last range in the example code for simplicity, but the exact size of the file can be found through the response headers (in the `Content-Length` property when the API is called with a single range or no `Range` header at all), and the ranges can then be set dynamically.
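
One way to build the ranges dynamically is to read the total size from the `Content-Range` header of a one-byte ranged probe instead of `Content-Length` (a sketch, assuming the server returns that header in the usual `bytes start-end/total` form and that the header key casing matches):

  // Build the byte ranges dynamically instead of hardcoding them
  const CHUNK = 50000000; // keep each chunk at or below the 50 MB UrlFetchApp response limit
  const probe = UrlFetchApp.fetch(redirectURL, { "headers": { "Range": "bytes=0-0" } });
  const total = Number(probe.getHeaders()['Content-Range'].split('/')[1]); // total file size in bytes
  const ranges = [];
  for (let start = 0; start < total; start += CHUNK) {
    ranges.push(`bytes=${start}-${Math.min(start + CHUNK - 1, total - 1)}`);
  }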

Thank you @TheMaster for the suggestion in another post which helped in finding this answer!
