1

Up to recently with the following codes in Javascript(Google Apps Script) I had been able to get data from https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0. But all of sudden since sometime last month this codes doesn't work. I couldn't figure out what's wrong. Is there any change in Cheerio library? Can anyone help me? Thank you so much in advance for any help!

function test() {
  var url = "https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0#table-results";
  var res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  var $ = Cheerio.load(res); //version 13
  var data = $("table").find('td').toArray().map(el => $(el).text().replace(/,/g, ''));
  console.log(data);
}
ggorlen
  • 44,755
  • 7
  • 76
  • 106
Newbie
  • 247
  • 3
  • 11
  • It's not the Cheerio library's fault, and even if it did change (unlikely), you still presumably have a specific version in your package.json. Websites change over time, all the time and are under no obligation to stay the same for your convenience. This site no logner has a `` in the static HTML, as you can see if you view its source. It's requested by jQuery running on the page from the endpoint POST https://www.census.gov/econ_report/. You could use Puppeteer to extract the data.
    – ggorlen Dec 02 '22 at 07:17
  • Canonical: [How can I scrape pages with dynamic content using node.js?](https://stackoverflow.com/questions/28739098/how-can-i-scrape-pages-with-dynamic-content-using-node-js) – ggorlen Dec 02 '22 at 07:50
  • @ ggorlen, thanks a lot. I'm studying your proposal. BTW, in the HTML, I still see tag with Chrome Inspect. Is it because its class is empty, no
    tag? Sorry for my lack of understanding!
    – Newbie Dec 02 '22 at 08:03
  • No problem. The inspector dev tools shows the HTML _after_ JS has executed and injected the table. `view-source:` shows the static HTML that your HTTP request is returning which has no ``. Another way to see if the table is actually there or not is to `console.log(res)`. The class being empty doesn't matter because you're not selecting it that way. BTW, what is Cheerio 13? The closest I see to that is [0.13.0](https://www.npmjs.com/package/cheerio/v/0.13.0) but that's 9 years old. I use 1.0.0-rc.12.
    – ggorlen Dec 02 '22 at 08:10
  • Appreciate! I use Script ID: 1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0 that I learned this site. When I click Cheerio In Google Apps Script, it popped up a window titled "Cheerio library settings". It shows Version 13. – Newbie Dec 02 '22 at 08:41
  • I'm not familiar with GAS, but thanks for that info. – ggorlen Dec 02 '22 at 14:53
  • I realize that Puppeteer doesn't work in Google Apps Script. I wonder if anyone can help out some solution within Google Apps Script. – Newbie Dec 03 '22 at 08:13
  • Probably a good idea to [tag:google-apps-script] to attract attention from subject matter experts. Maybe ask another question if you don't see anything after research. – ggorlen Dec 03 '22 at 08:14
  • I have to apologize for my poor English skill. Unfortunately, from your question, I cannot understand your expected value. So, can I ask you about the detail of your expected value from your showing URL? – Tanaike Dec 03 '22 at 08:20
  • @Tanaike, my expected values are the 2022's monthly data in the table in the bottom of the website. 1,726,585, 1,753,123, 1,768,168, ..... Thank you for asking! – Newbie Dec 03 '22 at 08:57
  • Thank you for replying. From your reply, I proposed an answer. Could you please confirm it? If that was not useful, I apologize. – Tanaike Dec 03 '22 at 11:54

1 Answers1

0

I believe your goal is as follows.

  • You want to retrieve the bottom table in the site of URL https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0 using Google Apps Script.

Issue and workaround:

When I saw the HTML of your URL, the bottom table is not included. It seems that that is created by Javascript. But, unfortunately, I couldn't find the script. But, fortunately, when I saw the site, I can find the URL for downloading the table as CSV data. I thought that this URL might be able to be used. When this is reflected in a sample script, it becomes as follows.

Sample script:

function myFunction() {
  // This is from your URL.
  const url = "https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0";

  // Convert your URL.
  const query = url.split("?").pop().split("&").reduce((o, e) => {
    const [k, v] = e.split("=");
    o[k == "programCode" ? "program" : k] = v;
    return o;
  }, {});
  const obj = { format: "csv", adjusted: true, notAdjusted: false, errorData: false, mode: "report", submit: "GET+DATA" };
  const q = Object.entries(obj).reduce((o, [k, v]) => (o[k] = v, o), query);
  String.prototype.addQuery = function (obj) { // Ref: https://gist.github.com/tanaikech/70503e0ea6998083fcb05c6d2a857107
    return this + "?" + Object.entries(obj).flatMap(([k, v]) => Array.isArray(v) ? v.map(e => `${k}=${encodeURIComponent(e)}`) : `${k}=${encodeURIComponent(v)}`).join("&");
  }
  const convertedUrl = "https://www.census.gov/econ_export".addQuery(q);

  // Download table as CSV data.
  const res = UrlFetchApp.fetch(convertedUrl);
  const ar = Utilities.parseCsv(res.getContentText());
  const idx = ar.findIndex(([a, b]) => !a && !b);
  const temp = ar.splice(idx + 1, ar.length);
  const result = temp[0].map((_, c) => temp.map(r => r[c]));
  console.log(result);
}
  • When this script is run, the following result is obtained.

      [
        ["Period","Jan-2022","Feb-2022","Mar-2022","Apr-2022","May-2022","Jun-2022","Jul-2022","Aug-2022","Sep-2022","Oct-2022","Nov-2022","Dec-2022"],
        ["Value","1726585","1753123","1768168","1780890","1793778","1803791","1817862","1797771","1800105","1794949","NA","NA"]
      ]
    
  • The URL of convertedUrl can be manually retrieved from the site. When you can use the manually retrieved URL, the script is simpler as follows.

      const res = UrlFetchApp.fetch("###URL###");
      const ar = Utilities.parseCsv(res.getContentText());
      const idx = ar.findIndex(([a, b]) => !a && !b);
      const temp = ar.splice(idx + 1, ar.length);
      const result = temp[0].map((_, c) => temp.map(r => r[c]));
      console.log(result);
    

IMPORTANT:

  • This sample script is for the current HTML of your URL of https://www.census.gov/econ/currentdata/?programCode=VIP&startYear=2022&endYear=2022&categories[]=AXXXX&dataType=T&geoLevel=US&adjusted=1&notAdjusted=0&errorData=0. When you change your URL, this script might not be able to be used. And, when the specification of the site is changed, this script might not be able to be used. Please be careful about this.

Reference:

Tanaike
  • 181,128
  • 11
  • 97
  • 165
  • @ Tanaike, your codes are working greatly. I learned a great deal of solutions and coding techniques not only this time but also in the past. I highly appreciate your help so many times. BTW, I wonder if you can take a look at [How to get a numerical data from trend lines on an interactive chart in website](https://stackoverflow.com/questions/74527202/how-to-get-a-numerical-data-from-trend-lines-on-an-interactive-chart-in-website?noredirect=1#comment131558431_74527202). – Newbie Dec 03 '22 at 14:48
  • @Newbie Thank you for replying and testing it. I'm glad your issue was resolved. Thank you, too. About your other question, I will check it. When I got an answer, I will post it. – Tanaike Dec 04 '22 at 00:39