1

I am just starting out in Google Apps Script. Since best coding practices recommend using as few sheet formulas as possible I am trying to do my web scraping with GAS Parser then push the data over to my spreadsheet.

Within my sheet using the below formula returns a table of data which is exactly what I am looking for from GAS.

=IMPORTHTML("https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L", "table", 1)

The two questions here & here are similar but trying those methods also fail. It almost seems like I am not getting the full page content since when I view data in Logger.log() after the code below I am not getting anything that resembles the page I need.

UrlFetchApp.fetch(url).getContentText();

Since running the formula seems to get the data perfectly I can only assume the problems with my own code but can't figure where. Here is the code I have tried thus far;

function scrapeData() {
var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
var fromText = '<td class="Py(10px) Ta(start) Pend(10px)"><span>';
var toText = '</span></td>';
var content = UrlFetchApp.fetch(url).getContentText();
var scraped = Parser
          .data(content)
          .from(fromText)
          .to(toText)
          .iterate();

Logger.log(scraped)
}

Any guidance much appreciated.

Rubén
  • 34,714
  • 9
  • 70
  • 166
redbaron1981
  • 407
  • 3
  • 9
  • 1
    Why not just use =GOOGLEFINANCE()? – TheWizEd Oct 27 '18 at 21:57
  • As per the question, I want to use as little spreadsheet functions as possible. – redbaron1981 Oct 27 '18 at 22:58
  • `Parser` isn't a built-in Google Apps Script class. Are you using a library? – Rubén Oct 27 '18 at 23:08
  • 1
    Best coding practices? Why use js code at all, when you can do that with a single formula-code? – TheMaster Oct 27 '18 at 23:14
  • I used to use java to get historical data from Yahoo finance. Then in 2017 when it was bought by Verizon nothing worked anymore. And I couldn't figure out a way to get it to work so I started using GOOGLEFINANCE. I'm happy as a clam now. – TheWizEd Oct 27 '18 at 23:29
  • Since many of you have suggested using built in functions, since this I intend on looking at many instruments using 100+ googlefinance calls is to slow even though it is an option. – redbaron1981 Oct 28 '18 at 17:28

1 Answers1

1
  • You want to retrieve and put the values from the URL to Spreadsheet using Google Apps Script.

If my understanding is correct, how about this modification? I think that there are several answers for your situation. So please think of this as one of them.

Modification points:

  • In order to retrieve the table, I used Parser and XmlService.
    1. Retrieve the table as the string value using Parser.
    2. Parse the table using XmlService. I think that XmlService makes us easily parse the table.

XmlService is the strong parsing tool of XML. So when this can be used to HTML, it makes us retrieve the values from HTML more easily. However, recently, the most HTML cannot be directly parsed by XmlService. So I always use this flow.

Modified script:

function scrapeData() {
  // Retrieve table as a string using Parser.
  var url = "https://finance.yahoo.com/quote/BOO.L/history?p=BOO.L";
  // var url = "https://finance.yahoo.com/quote/BARC.L/history?p=BARC.L";
  var fromText = '<div class="Pb(10px) Ovx(a) W(100%)" data-reactid="30">';
  var toText = '<div class="Mstart(30px) Pt(10px)"';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser.data(content).from(fromText).to(toText).build();

  // Parse table using XmlService.
  var root = XmlService.parse(scraped).getRootElement();
  // Retrieve header
  var headerTr = root.getChild("thead").getChildren();
  var res = headerTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})});
  var len = res[0].length;
  // Retrieve values
  var valuesTr = root.getChild("tbody").getChildren();
  var values = valuesTr.map(function(e) {return e.getChildren().map(function(f) {return f.getValue()})})
  .map(function(e) {return e.length == len ? e : e.concat(Array.apply(null, new Array(len - e.length)).map(String.prototype.valueOf,""))});
  Array.prototype.push.apply(res, values);

  // Put the result to the active spreadsheet.
  var ss = SpreadsheetApp.getActiveSheet();
  ss.getRange(1, 1, res.length, res[0].length).setValues(res);
}

Note:

  • Before you run this modified script, please install the GAS library of Parser.
  • In my environment, I could confirmed that the modified script works for both p=BOO.L and p=BARC.L. I couldn't confirm others. So when you tried others, if an error occurs, please modify the script.

Reference:

If this was not what you want, I'm sorry.

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • This works perfectly the Parser - XmlService certainly makes a strong combination. As you mentioned there are several answers to this question. In other languages I have used(Python & R) there are built in functions for sealing directly with html tables. Do you know if anything similar exists in GAS or JS? Many thanks for the help. – redbaron1981 Oct 29 '18 at 19:15
  • @redbaron1981 Thank you for replying. I'm glad your issue was resolved. At GAS, there are no methods for this situation. So I used Parser and XmlService. At Javascript, how about this? https://developer.mozilla.org/en-US/docs/Web/API/DOMParser If there are not what you want, I'm sorry. – Tanaike Oct 29 '18 at 22:46
  • I am also trying to scrape data in a similar fashion from here; https://stooq.com/q/d/?s=boo.uk. Do you have any idea why I am getting "Error on line 1: Content is not allowed in prolog. "? Thanks. – redbaron1981 Oct 30 '18 at 16:10
  • @redbaron1981 I think that it is new question. And because the URL is different from this question, this script cannot be used for it. Also I'm not sure about the result what you want. So can you post it as new question? When you post it, please include the detail information. By this, it will help users including me think of your solutions. – Tanaike Oct 30 '18 at 22:16
  • I have taken your advice and posted a new question here. https://stackoverflow.com/questions/53081241/error-on-line-1-content-is-not-allowed-in-prolog – redbaron1981 Oct 31 '18 at 10:31