0

I am following the following tutorial to import stock options data into a Google sheet.

https://www.youtube.com/watch?v=Be7z9YeeVY0&ab_channel=daneshj

The following formula will import data from yahoo finance into the sheet:

=iferror(TRANSPOSE(IMPORTXML(CONCATENATE("https://finance.yahoo.com/quote/",A2,"?p=",A2),"//tr")),"You have to add a contract name in column A")

At first glance, everything looks fine, as it seems to be pulling data back from the webpage; however, all the values are incorrect.

The URL it is pulling data from in this example is below. Note that that the data changes frequently.

https://finance.yahoo.com/quote/NKLA220121C00002500?p=NKLA220121C00002500

The numbers are not only wrong in this specific example, they are wrong every time, and with a great enough margin of error that I don't believe it is due to IMPORTXML caching the page. I have searched through the HTML source of the webpage and can't find the values from IMPORTXML anywhere.

enter image description here

enter image description here

Rubén
  • 34,714
  • 9
  • 70
  • 166
Evan Hessler
  • 297
  • 3
  • 19
  • I have a question. In your tags, `google-apps-script` is included. By this, I proposed a workaround using Google Apps Script as an answer. Could you please confirm it? If that was not the same with your direction, I apologize. – Tanaike Oct 20 '20 at 03:34

1 Answers1

5

Issue and workaround:

  • When I tested your sample formula of =iferror(TRANSPOSE(IMPORTXML("https://finance.yahoo.com/quote/NKLA220121C00002500?p=NKLA220121C00002500","//tr")),"You have to add a contract name in column A"), I could confirm the same situation of your issue.
  • On the other hand, when I retrieve the HTML data from the URL, I could confirm that the same values with the page which is seen with my browser. It seems that this is the same with your bottom image.
  • Unfortunately, I cannot understand about the clear reason of this difference.

From these situation, in this answer, I would like to propose to use Google Apps Script.

Sample script:

Please copy and paste the following script to the container-bound script of Google Spreadsheet and save it. And please put =SAMPLE("https://finance.yahoo.com/quote/NKLA220121C00002500?p=NKLA220121C00002500") to a cell. By this, the result is returned. In this case, the Google Apps Script is used as a custom function.

function SAMPLE(url) {
  const res = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  const tables = [...res.getContentText().matchAll(/(<table[\w\s\S]+?<\/table>)/g)];
  if (tables.length < 2) return "No tables. Please confirm URL again.";
  const values = tables.reduce((ar, [,table]) => {
    if (table) {
      const root = XmlService.parse(table).getRootElement();
      const temp = root.getChild("tbody", root.getNamespace()).getChildren().map(e => e.getChildren().map(f => isNaN(f.getValue()) ? f.getValue() : Number(f.getValue())));
      ar = ar.concat(temp);
    }
    return ar;
  }, []);
  return values[0].map((_, i) => values.map(r => r[i]));
}

Result:

enter image description here

Note:

I tested this sample script for the URL of https://finance.yahoo.com/quote/NKLA220121C00002500?p=NKLA220121C00002500. When the URL is changed, the script might not be able to be used. So please be careful this. At that time, please analyze each HTML data and modify the script.

References:

halfer
  • 19,824
  • 17
  • 99
  • 186
Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • Very strange issue indeed. Thank you for the workaround. – Evan Hessler Oct 20 '20 at 04:24
  • @Evan Hessler Thank you for replying. I'm glad your issue was resolved. At first, I thought that the result value might depend on the access area. But I couldn't find the query parameter for giving the access area. So I proposed to use Google Apps Script. – Tanaike Oct 20 '20 at 04:31
  • 1
    Smart idea! Thanks again :) – Evan Hessler Oct 20 '20 at 04:48
  • This worked for me. Is there a way to remove the heading fields so only the values are returned and populate the cells? essentially I am looking to use this on a number of different tickers in column A, to create rows for each, and I do not need the header row, but could not figure out how to remove it from the script. – mdkb Dec 02 '21 at 02:22
  • 1
    @mdkb Thank you for your comment. About your question, in that case, how about modifying from `return values[0].map((_, i) => values.map(r => r[i]));` to `const [,...v] = values[0].map((_, i) => values.map(r => r[i])); return v;`? By this, I think that the header row can be removed. If I misunderstood your question, I apologize. – Tanaike Dec 02 '21 at 07:59