1

I'm attempting to scrape options pricing data from Yahoo Finance in Google Sheets. Although I'm able to pull the options chain just fine, i.e.

=IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2)

I find that it's returning results that don't completely match what's actually shown on Yahoo Finance. Specifically, the scraped results are incomplete - they're missing some strikes. i.e. the first 5 rows of the chart may match, but then it will start returning only every other strike (aka skipping every other strike).

Why would IMPORTHTML be returning "abbreviated" results, which don't match what's actually shown on the page? And more importantly, is there some way to scrape complete data (i.e. that doesn't skip some portion of the available strikes)?

TheMaster
  • 45,448
  • 6
  • 62
  • 85
J23
  • 3,061
  • 6
  • 41
  • 52

3 Answers3

4

In Yahoo finance, all data are available in a big json called root.App.main. So to get the complete set of data, proceed as following

  var source = UrlFetchApp.fetch(url).getContentText()
  var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}'
  var data = JSON.parse(jsonString)

You can then choose to fetch the informations you need. Take a copy of this example https://docs.google.com/spreadsheets/d/1sTA71PhpxI_QdGKXVAtb0Rc3cmvPLgzvXKXXTmiec7k/copy

edit

if you want to get a full list of available data, you can retrieve it by this simple script

// mike.steelson
let result = []; 
function getAllDataJSON(url = 'https://finance.yahoo.com/quote/TCOM/options?date=1610668800') {
  var source = UrlFetchApp.fetch(url).getContentText()
  var jsonString = source.match(/(?<=root.App.main = ).*(?=}}}})/g) + '}}}}'
  var data = JSON.parse(jsonString)
  getAllData(eval(data),'data')
  var sh = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet()
  sh.getRange(1, 1, result.length, result[0].length).setValues(result);
}
function getAllData(obj,id) {
  const regex = new RegExp('[^0-9]+');
  for (let p in obj) {
    var newid = (regex.test(p)) ? id + '["' + p + '"]' : id + '[' + p + ']';
    if (obj[p]!=null){
      if (typeof obj[p] != 'object' && typeof obj[p] != 'function'){
        result.push([newid, obj[p]]);
      }
      if (typeof obj[p] == 'object') {
        getAllData(obj[p], newid );
      }
    }
  }
}
Mike Steelson
  • 14,650
  • 2
  • 5
  • 20
  • this works, but does not show any of the Options data the OP requested. Where can I find a full list of data matches possible like the limited list used in column B of the spreadsheet, but that will provide Options data for the relevant ticker? – mdkb Dec 02 '21 at 02:57
  • 1
    I have updated my answer : this will give you the 12.000+ available data – Mike Steelson Dec 02 '21 at 07:36
3

Here's a simpler way to get the last market price of a given option. Add this function to you Google Sheets Script Editor.

function OPTION(ticker) {
    var ticker = ticker+"";
    var URL = "finance.yahoo.com/quote/"+ticker;
    var html = UrlFetchApp.fetch(URL).getContentText();
    var count = (html.match(/regularMarketPrice/g) || []).length;
    var query = "regularMarketPrice";
    var loc = 0;
    var n = parseInt(count)-2;
    for(i = 0; i<n; i++) {
        loc = html.indexOf(query,loc+1);
    }

    var value = html.substring(loc+query.length+9, html.indexOf(",", loc+query.length+9));
    return value*100;
}

In your google sheets input the Yahoo Finance option ticker like below

This is the option ticker

=OPTION("AAPL210430C00060000")
2

I believe your goal as follows.

  • You want to retrieve the complete table from the URL of https://finance.yahoo.com/quote/TCOM/options?date=1610668800, and want to put it to the Spreadsheet.

Issue and workaround:

I could replicate your issue. When I saw the HTML data, unfortunately, I couldn't find the difference of HTML between the showing rows and the not showing rows. And also, I could confirm that the complete table is included in the HTML data. By the way, when I tested it using =IMPORTXML(A1,"//section[2]//tr"), the same result of IMPORTHTML occurs. So I thought that in this case, IMPORTHTML and IMPORTXML might not be able to retrieve the complete table.

So, in this answer, as a workaround, I would like to propose to put the complete table parsed using Sheets API. In this case, Google Apps Script is used. By this, I could confirm that the complete table can be retrieved by parsing the HTML table with Sheet API.

Sample script:

Please copy and paste the following script to the script editor of Spreadsheet, and please enable Sheets API at Advanced Google services. And, please run the function of myFunction at the script editor. By this, the retrieved table is put to the sheet of sheetName.

function myFunction() {
  // Please set the following variables.
  const url ="https://finance.yahoo.com/quote/TCOM/options?date=1610668800";
  const sheetName = "Sheet1";  // Please set the destination sheet name.
  const sessionNumber = 2;  // Please set the number of session. In this case, the table of 2nd session is retrieved.

  const html = UrlFetchApp.fetch(url).getContentText();
  const section = [...html.matchAll(/<section[\s\S\w]+?<\/section>/g)];
  if (section.length >= sessionNumber) {
    if (section[sessionNumber].length == 1) {
      const table = section[sessionNumber][0].match(/<table[\s\S\w]+?<\/table>/);
      if (table) {
        const ss = SpreadsheetApp.getActiveSpreadsheet();
        const body = {requests: [{pasteData: {html: true, data: table[0], coordinate: {sheetId: ss.getSheetByName(sheetName).getSheetId()}}}]};
        Sheets.Spreadsheets.batchUpdate(body, ss.getId());
      }
    } else {
      throw new Error("No table.");
    }
  } else {
    throw new Error("No table.");
  }
}
  • const sessionNumber = 2; means that 2 of =IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2).

References:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • Great solution, it works! Marked as accepted :) Question: since you were so quick to come up with this, do you have any tricks up your sleeve that might work on https://seekingalpha.com/symbol/EBAY/growth (which fails to scrape entirely)? :) – J23 Dec 22 '20 at 04:31
  • @Metal450 Thank you for replying. I'm glad your issue was resolved. About your new question, I would like to support you. But I have to apologize for my poor English skill. Unfortunately, I cannot understand about your new question. But when I could correctly understand about it, I would like to think of the solution. I deeply apologize that I cannot replicate your new question soon. I would like to study more and more. – Tanaike Dec 22 '20 at 04:55
  • 英語はもう上ですよ!私の日本語より ;) I meant that I cannot import any values from Seekingalpha.com into Google Sheets - probably because they are populating their page with javascript. You can choose any value on seekingalpha.com, but find that IMPORTHTML fail to retrieve it. – J23 Dec 22 '20 at 19:06
  • @Metal450 Thank you for replying. When I could correctly understand about your new question and find the solution, I would like to tell you. I deeply apologize that I cannot resolve your new question soon. This is due to my poor skill. I deeply apologize for this again. – Tanaike Dec 22 '20 at 23:09
  • No problem...if you come up with anything, please let me know! I'd also be happy to post it as a separate question so you can give an "official" answer & receive credit :) – J23 Dec 23 '20 at 02:04