
I have a script that scrapes historical data from Yahoo Finance, but it looks like the decryption step has stopped working.

function Scrapeyahoo(symbol) {
  // Modified on 27/1/23 by Tanaike
  // https://stackoverflow.com/questions/75250562/google-apps-script-stopped-scraping-data-from-yahoo-finance/75253348#75253348

  const s = encodeURI(symbol);
  const url = 'https://finance.yahoo.com/quote/' +s +'/history?p=' +s;

  var html = UrlFetchApp.fetch(url).getContentText().match(/root.App.main = ([\s\S\w]+?);\n/);
  if (!html || html.length == 1) return;
  var obj = JSON.parse(html[1].trim());
  var key = [...new Map(Object.entries(obj).filter(([k]) => !["context", "plugins"].includes(k)).splice(-4)).values()].join("");
  if (!key) return;
  const cdnjs = "https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js";
  eval(UrlFetchApp.fetch(cdnjs).getContentText());
  const obj1 = JSON.parse(CryptoJS.enc.Utf8.stringify(CryptoJS.AES.decrypt(obj.context.dispatcher.stores, key)));
  const header = ["date", "open", "high", "low", "close", "adjclose", "volume"];
  const ar = obj1.HistoricalPriceStore.prices.map(o => header.map(h => h == "date" ? new Date(o[h] * 1000) : (o[h] || "")));
  
  return ar
}

I get the error "Malformed UTF-8 data" from stringify on this line:

JSON.parse(CryptoJS.enc.Utf8.stringify(CryptoJS.AES.decrypt(obj.context.dispatcher.stores, key)));
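This error most likely means the key is wrong rather than that the data is corrupted. As far as I know, crypto-js 4.x implements Utf8.stringify by turning the decrypted bytes into a Latin-1 string and passing it through decodeURIComponent(escape(...)), rethrowing any failure as "Malformed UTF-8 data". Decrypting with a wrong key yields essentially random bytes, which are almost never valid UTF-8. The sketch below is a simplified re-creation of that decoding step in plain JavaScript; the name bytesToUtf8 is hypothetical and not part of crypto-js.

```javascript
// Simplified re-creation of how crypto-js 4.x's Utf8.stringify decodes bytes
// (bytesToUtf8 is a hypothetical name, not a crypto-js function).
function bytesToUtf8(bytes) {
  // Map each byte 0-255 to the Latin-1 character with the same code.
  const latin1 = String.fromCharCode(...bytes);
  try {
    // escape() turns bytes >= 0x80 into %XX sequences; decodeURIComponent then
    // reads them as UTF-8 and throws a URIError if they are not valid UTF-8.
    return decodeURIComponent(escape(latin1));
  } catch (e) {
    throw new Error("Malformed UTF-8 data");
  }
}

console.log(bytesToUtf8([0x63, 0x61, 0x66, 0xc3, 0xa9])); // "café" (valid UTF-8)

try {
  bytesToUtf8([0xff, 0xfe, 0x41]); // 0xFF can never appear in UTF-8
} catch (e) {
  console.log(e.message); // "Malformed UTF-8 data"
}
```

So when the error appears, the first suspect is the key passed to CryptoJS.AES.decrypt, not the stores string itself.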

A few weeks ago, @Tanaike solved a similar issue here, but it looks like there have been new changes.

I would appreciate any help with this problem. Thanks in advance.

Asked by ejooroo (edited by Tanaike)

1 Answer


It seems that the way the key is derived has changed. As a result, var key = [...new Map(Object.entries(obj).filter(([k]) => !["context", "plugins"].includes(k)).splice(-4)).values()].join(""); no longer returns the correct key. The logic for deriving a valid key also seems to have changed, and unfortunately I have not yet been able to find it. So, in this answer, I refer to this thread, in which the valid keys are listed in a text file. Reflecting this in your script gives the following.
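To see why that line is fragile, here is the same expression pulled out into a standalone function (deriveLegacyKey is a hypothetical name). It takes the last four top-level properties of root.App.main other than context and plugins and concatenates their values, assuming, as the original line implies, that those properties hold string fragments of the key. Any change to the number or order of those properties on Yahoo's side silently yields a different key. The sketch uses slice instead of splice and drops the Map; neither changes the result here, since the filtered array is temporary and Object.entries never yields duplicate names.

```javascript
// The key expression from the original script as a standalone function.
// deriveLegacyKey is a hypothetical name used only for illustration.
function deriveLegacyKey(appMain) {
  return Object.entries(appMain)
    .filter(([name]) => !["context", "plugins"].includes(name)) // skip the two known stores
    .slice(-4)                                                  // keep the last four properties
    .map(([, value]) => value)
    .join("");                                                  // concatenate their values
}

// Made-up sample shaped like root.App.main (not real Yahoo data).
const sample = { context: {}, plugins: {}, a: "x", b: "1", c: "2", d: "3", e: "4" };
console.log(deriveLegacyKey(sample)); // "1234"
```

In the sample, a is dropped only because it is not among the last four remaining properties; adding one more property at the end would shift the window and change the key.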

Modified script:

function Scrapeyahoo(symbol) {
  const s = encodeURI(symbol);
  const url = 'https://finance.yahoo.com/quote/' + s + '/history?p=' + s;

  // Extract the JSON object assigned to root.App.main from the page source.
  const html = UrlFetchApp.fetch(url).getContentText().match(/root.App.main = ([\s\S\w]+?);\n/);
  if (!html || html.length == 1) return;
  const obj = JSON.parse(html[1].trim());

  // Load crypto-js (defines the global CryptoJS object).
  const cdnjs = "https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js";
  eval(UrlFetchApp.fetch(cdnjs).getContentText());

  // Fetch the list of candidate keys and try each one. A wrong key makes
  // Utf8.stringify or JSON.parse throw, so only successful decryptions are kept.
  const keyFile = "https://github.com/ranaroussi/yfinance/raw/main/yfinance/scrapers/yahoo-keys.txt";
  const res = UrlFetchApp.fetch(keyFile);
  const keys = res.getContentText().split("\n").filter(String);
  let obj1 = keys.reduce((ar, key) => {
    try {
      const o = JSON.parse(CryptoJS.enc.Utf8.stringify(CryptoJS.AES.decrypt(obj.context.dispatcher.stores, key.trim())));
      ar.push(o);
    } catch (e) {
      // console.log(e.message)
    }
    return ar;
  }, []);
  if (obj1.length == 0) {
    throw new Error("The server-side specification might have changed. Please check it.");
  }
  obj1 = obj1[0];

  // One row per price entry; "date" is a Unix timestamp in seconds.
  // Missing fields become "", but a value of 0 (e.g. zero volume) is kept.
  const header = ["date", "open", "high", "low", "close", "adjclose", "volume"];
  const ar = obj1.HistoricalPriceStore.prices.map(o => header.map(h => h == "date" ? new Date(o[h] * 1000) : (o[h] == null ? "" : o[h])));
  return ar;
}
  • When I tested this script with CL=F as the symbol, I confirmed that it worked.
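The reduce in the modified script tries every key and then keeps the first successful result. A minimal sketch of an alternative, assuming the same key-file format of one key per line: parse the file while tolerating Windows line endings and blank lines, then stop at the first key that decrypts. Both function names are hypothetical, and the decryption is passed in as a callback so the helpers run without Apps Script or crypto-js.

```javascript
// Hypothetical helpers, not part of the original answer.

// Split the key file into trimmed, non-empty lines (handles \n and \r\n).
function parseKeyFile(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Try each key in order and return the first successful result.
// tryDecrypt(key) must throw when the key is wrong.
function decryptWithFirstWorkingKey(keys, tryDecrypt) {
  for (const key of keys) {
    try {
      return { key, value: tryDecrypt(key) };
    } catch (e) {
      // Wrong key: decoding or JSON.parse failed, so move on to the next one.
    }
  }
  throw new Error("None of the keys could decrypt the data.");
}

// Example with a stand-in decrypt function (no crypto-js needed).
const keys = parseKeyFile("wrong-key\r\n\n  right-key  \n");
const result = decryptWithFirstWorkingKey(keys, (key) => {
  if (key !== "right-key") throw new Error("Malformed UTF-8 data");
  return { ok: true };
});
console.log(result.key); // "right-key"
```

In the Apps Script version, res.getContentText() would be passed to parseKeyFile, and the callback would be key => JSON.parse(CryptoJS.enc.Utf8.stringify(CryptoJS.AES.decrypt(obj.context.dispatcher.stores, key))). Returning the key alongside the value also makes it easy to log which key worked.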

Note:

  • In this sample, crypto-js is loaded with eval(UrlFetchApp.fetch(cdnjs).getContentText()). If you don't want to use eval, you can instead copy the contents of https://cdnjs.cloudflare.com/ajax/libs/crypto-js/4.1.1/crypto-js.min.js and paste them into the Google Apps Script editor. This also reduces the processing cost, because the library no longer has to be fetched on every run.

  • I can confirm that this method works as of February 15, 2023. However, if the data or HTML format on the server side changes in a future update, this script might stop working. Please keep this in mind.

Reference:

Answered by Tanaike