
I'm trying to scrape the header and footer of a table with Node and cheerio. The entire table HTML is below, and my code is:

async function getTableFooter(html) {

  const $ = cheerio.load(html, null, false);

  let myArr = [];
  const headerRow = $('thead > tr');
  const footerRow = $("tfoot > tr");
  const headThs = [...$(headerRow).find('th')];
  const footThs = [...$(footerRow).find('th')];
  for (let i = 0; i < headThs.length; i++) {
    try {
      const headTh = headThs[i];
      console.log($(headTh).text());
      const footTh = footThs[i];
      console.log($(footTh).text());
      console.log('---------');
      // myArr.push({headTh:footTh});

    } catch (error) {
      console.log(error);
    }

  }
}

As I step through the code I see that there are 10 ths for both header and footer. When I try to print it out, the header inner text prints out as expected, but not the footer fields. Why not?

<table class="table table-striped table-condensed mt-0 mb-0 p-0 dataTable dtr-inline" width="100%" style="color: black; width: 100%;" id="tblAcctBal" role="grid">
                                    <thead>
                                    <tr class="text-center" role="row"><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 26px;">PAY</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 63px;">TAX YEAR</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 68px;"><a href="#" role="button" class="btn popovers font-weight-bold pop p-0" data-toggle="popover" data-html="true" data-placement="top" data-trigger="focus" style="font-size:12px !important;" data-content="A Certificate Number is the number given when a Lien is purchased on the delinquent taxes of a property.
                                               <b>Certificates must be redeemed in full</b>, and until redeemed, interest will accrue monthly at the percent the Lien Holder was awarded.
                                               <br/>See Arizona Revised Statute <a href='https://www.azleg.gov/viewdocument/?docName=https://www.azleg.gov/ars/42/18104.htm' target='_blank'>42-18104</a> for further information." title="" data-original-title="<b>What is a Certificate?</b>"><u>CERT NO  <i class="fa fa-info-circle text-primary"></i></u></a></th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 98px;">INTEREST DATE</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 61px;">INTEREST<br>PERCENT</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 60px;">AMOUNT</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 61px;">INTEREST</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 34px;">FEES</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 70px;">PENALTIES</th><th class="sorting_disabled" rowspan="1" colspan="1" style="width: 73px;">TOTAL DUE</th></tr>
                                    </thead>
                                    <tbody>
                                    
                                    <tr class="odd"><td valign="top" colspan="10" class="dataTables_empty">No data available in table</td></tr></tbody>
                                    <tfoot>
                                        <tr><th rowspan="1" colspan="1"></th><th rowspan="1" colspan="1"></th><th rowspan="1" colspan="1"></th><th rowspan="1" colspan="1"></th><th rowspan="1" colspan="1"></th><th rowspan="1" colspan="1">$0.00</th><th rowspan="1" colspan="1">$0.00</th><th rowspan="1" colspan="1">$0.00</th><th rowspan="1" colspan="1">$0.00</th><th rowspan="1" colspan="1">$0.00</th></tr>
                                    </tfoot>
                                </table>

Edit: the output looks like:

[screenshot of the console output]

Edit2:

I've rewritten the code as a minimal example (fetch refers to node-fetch):

        const cheerio = require("cheerio");
        const fetch = require('node-fetch');

        // Wrapped in an async function because CommonJS has no top-level await.
        (async () => {
            const r = await fetch("https://www.to.pima.gov/propertyInquiry/?stateCodeB=129&stateCodeM=05&stateCodeP=0070");
            const body = await r.text();

            const outerHTML = cheerio.load(body);
            const innerHTML = outerHTML('html').html();
            const $ = cheerio.load('<html>' + innerHTML + '</html>', null, false);
            // `o` holds the static HTML that cheerio is actually working with.
            const o = $.html();
            const headTHS = $('#tblAcctBal > thead > tr > th');
            const footTHS = $('#tblAcctBal > tfoot > tr > th');
            for (let i = 0; i < headTHS.length; i++) {
                try {
                    const headTh = headTHS[i];
                    console.log($(headTh).text());
                    const footTh = footTHS[i];
                    console.log($(footTh).text());
                    console.log('---------');
                    // myArr.push({headTh:footTh});

                } catch (error) {
                    console.log(error);
                }
            }
        })();

The html contains the appropriate table including:

[screenshot of the table HTML]

but I still do not get the footer fields. The console output looks like the one in the first edit above. @ggorlen, you were right: I initially took the HTML from devtools to make things tidier!

user1592380
  • Does nothing get printed out or does `undefined` get printed? Could the code have errored instead of printing out? – code Feb 20 '23 at 22:18
  • Your code works fine for me--those first 5 footer fields have empty text in the markup and the screenshot, as with your code and output, so everything looks good. What output are you expecting? – ggorlen Feb 20 '23 at 23:34
  • @ggorlen,@code I was expecting the first 5 fields to be empty, but not the last 5 - please see edit output – user1592380 Feb 21 '23 at 00:21
  • Cannot replicate. Those 5 last header ths appear on the console output as expected. How are you loading the html? – Kostas Minaidis Feb 21 '23 at 00:44
  • Also, how do you get the output that you have shared ("print it out")? – Kostas Minaidis Feb 21 '23 at 00:45
  • The problem is probably your fetch call, which is returning static HTML. I suspect you may have taken the HTML from devtools after JS ran, rather than from `console.log($.html())`, which shows the actual static HTML cheerio is working with that doesn't include any JS-injected data. If you can share the actual site, it's easy to diagnose. – ggorlen Feb 21 '23 at 03:06
  • @ggorlen, please see latest edit. – user1592380 Feb 21 '23 at 18:08
  • @KostasMinaidis, The output is in debug console as I step through code. – user1592380 Feb 21 '23 at 18:09
  • As mentioned, if you `view-source:` or `console.log(body)` you'll see all the `tfoot` elements are empty. The data is injected with JS. Consider using Puppeteer. – ggorlen Feb 21 '23 at 18:19
  • @ggorlen- Thank you, I've been looking through the js files, wondering if there is an api. How is the data typically made available to js to populate the footer? – user1592380 Feb 22 '23 at 02:45
  • Also FYI fetch is included in newer versions of node – pguardiario Feb 22 '23 at 03:02
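
The diagnosis in the comments (empty `tfoot` cells in the static HTML, values filled in later by JS) can be checked on the raw response string before any cheerio code runs. The following is a minimal sketch, not a definitive implementation: `footerCellTexts` is a hypothetical helper, the two HTML strings are made-up stand-ins for "what the server sends" and "what devtools shows", and the regex approach is only meant as a quick check, not a replacement for a real parser.

```javascript
// Hypothetical helper: returns the text of every <th> inside the first <tfoot>.
// Regexes are only used here for a quick sanity check of the raw response.
function footerCellTexts(html) {
  const tfoot = /<tfoot\b[^>]*>([\s\S]*?)<\/tfoot>/i.exec(html);
  if (!tfoot) return [];
  const cells = [...tfoot[1].matchAll(/<th\b[^>]*>([\s\S]*?)<\/th>/gi)];
  // Strip nested tags and surrounding whitespace from each cell.
  return cells.map((m) => m[1].replace(/<[^>]*>/g, '').trim());
}

// Assumed example of static markup: footer cells exist but are empty.
const staticHtml = '<table><tfoot><tr><th></th><th></th></tr></tfoot></table>';
// Assumed example of markup copied from devtools after scripts have run.
const renderedHtml = '<table><tfoot><tr><th></th><th>$0.00</th></tr></tfoot></table>';

console.log(footerCellTexts(staticHtml));
console.log(footerCellTexts(renderedHtml));
```

If the array for the fetched `body` contains only empty strings, cheerio is behaving correctly and the values have to come from somewhere else (a rendered page, or recomputing them from the table body).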

1 Answer

It looks like that site is totaling them in JS. With cheerio, that would look something like this (untested):

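for (const table of $('table').get()) {
  // Walk the footer cells by index so nth-child can address the matching column.
  $(table).find('tfoot th').each((i, th) => {
    // Collect the numeric value of every body cell in column i + 1.
    const values = $(table)
      .find(`tbody td:nth-child(${i + 1})`)
      .map((_, td) => Number($(td).text().replace(/[$,]/g, '')))
      .get();
    // Non-numeric cells become NaN, so count them as 0.
    const total = values.reduce((a, b) => a + (Number.isNaN(b) ? 0 : b), 0);
    $(th).text(String(total));
  });
}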
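
The totaling itself does not depend on cheerio, so it can be tried on plain arrays of cell text first. This is a sketch under assumptions: `parseAmount` and `columnTotals` are hypothetical helper names, the sample rows are invented for illustration and are not data from the site, and blank or non-numeric cells are assumed to count as 0.

```javascript
// Hypothetical helper: turn cell text such as "$1,234.50" into a number.
// Blank or non-numeric cells are treated as 0 (an assumption of this sketch).
function parseAmount(text) {
  const n = Number(text.replace(/[$,\s]/g, ''));
  return Number.isFinite(n) ? n : 0;
}

// Sum each column of a table given as rows of cell text.
function columnTotals(rows) {
  const width = Math.max(0, ...rows.map((row) => row.length));
  const totals = new Array(width).fill(0);
  for (const row of rows) {
    row.forEach((cell, i) => {
      totals[i] += parseAmount(cell);
    });
  }
  return totals;
}

// Made-up rows for illustration only.
const rows = [
  ['$1,000.00', '$25.50', ''],
  ['$500.00', '$4.50', 'n/a'],
];

console.log(columnTotals(rows));
```

With cheerio, each row could be built from a `tr` with something like `$(tr).find('td').map((_, td) => $(td).text()).get()`, and the resulting totals written back into the matching `tfoot th` cells.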
pguardiario