2

I've been using ImportHTML to draw MLB stats from this link without issue: "https://widgets.sports-reference.com/wg.fcgi?css=1&site=br&url=%2Fleagues%2FMLB%2F2021-standard-batting.shtml&div=div_players_standard_batting". I was simply reproducing the table on a Google Sheet.

However, the source table has now grown so large that I'm getting this error in my Sheet: "Resource at url contents exceeded maximum size."

I know I need to pivot to a Google Apps Script and URLFetchApp, but my amateur approach produces nothing but empty cells. Is it possible for me to reproduce the table in my link using this approach?

Thanks for any guidance and your time.

Rubén
  • 34,714
  • 9
  • 70
  • 166
Joe M
  • 23
  • 3
  • About `I've been using ImportHTML to draw MLB stats from this link without issue:`, can you provide your current formula? And also, about `I know I need to pivot to a Google Apps Script and URLFetchApp, but my amateur approach produces nothing but empty cells.`, can you provide your current script? – Tanaike Jun 01 '21 at 22:57
  • @Tanaike ... do you have an updated version of https://tanaikech.github.io/2019/08/01/parsing-html-using-google-apps-script/ ? I am working on a splitted version of 42 tables from the global one and look for parsing the tables with app script. Thx. – Mike Steelson Jun 02 '21 at 01:41
  • @Mike Steelson I have to apologize for my poor English skill. Unfortunately, I cannot understand about your comment. But, if my sample script was not useful for your situation, I have to apologize. – Tanaike Jun 02 '21 at 06:20
  • 1
    @Mike Steelson Thank you for replying. About `the script uses 'Drive.Files' whici is now deprecated.`, can you provide the official document for this? I had thought that the method of "Files: insert" in Drive API v2 can be still used. [Ref](https://developers.google.com/drive/api/v2/reference/files/insert) So I would like to confirm whether such information (`the script uses 'Drive.Files' whici is now deprecated.`) might have already been published. By the way, when I could correctly understand about your current issue and your goal, I might be able to think of the workaround and/or solution. – Tanaike Jun 02 '21 at 08:43
  • @Tanaike, thanks for your interest in helping! I was digging back through my current script before I replied to you, then saw Mike had suggested a revision to your sample script. I hope you can both understand that I am a beginner, though I have learned so much from reading this community. – Joe M Jun 02 '21 at 18:39
  • @MikeSteelson Brilliant! Your "better way" is precisely the solution. Thank you so much for your effort and concern. – Joe M Jun 02 '21 at 18:40

1 Answers1

1

A better way to retrieve data from table :

function importTableHTML(url) {
  var html = '<table' + UrlFetchApp.fetch(url).getContentText().match(/(?<=\<table).*(?=\<\/table)/g) + '</table>';
  var trs = [...html.matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
  var data = [];
  for (var i=0;i<trs.length;i++){
    var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
    var prov = [];
    for (var j=0;j<tds.length;j++){
      donnee=tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0].replace(/&nbsp;/g,' ');
      if(donnee.indexOf("</a>")>-1){
        prov.push(donnee.match(/(?<=\>).*(?=\<\/)/g)[0]);
      }else{
        prov.push(donnee);
      }
    }
    data.push(prov);
  }
  return(data);
}
Mike Steelson
  • 14,650
  • 2
  • 5
  • 20