
I’m trying to fetch a table from a site that builds its content with JavaScript, so the data I fetch is incomplete. The body is empty, I guess because the scripts haven't run yet.

Initially I wanted to fetch everything in the browser but I can’t do that since the CORS header isn't set and I don’t have access to the server.

Then I tried a server-side approach using Node.js together with node-fetch and jsdom. I read the documentation and found the option { pretendToBeVisual: true }, but that didn't change anything. My simplified code is below:

const fetch = require('node-fetch');
const jsdom = require('jsdom');
const { JSDOM } = jsdom;

let tableHTML = fetch('https://www.travsport.se/uppfodare/visa/200336/starter')
  .then(res => res.text())
  .then(body => {
    console.log(body);
    const dom = new JSDOM(body, { pretendToBeVisual: true });
    return dom.window.document.querySelector('.sportinfo_tab table').innerHTML;
  })
  .then(table => console.log(table));

I expect the output to be the HTML of the table, but right now the response contains only the metadata and scripts, so querySelector finds no table and the code crashes when reading innerHTML.

Heretic Monkey
Patrick Bender
  • What you are trying to do is called web page crawling. node-fetch only fetches the page; it does not render it the way a browser does. You can try the https://www.npmjs.com/package/crawler module, but I am not sure whether it works with SPAs. – Ninad Aug 05 '19 at 12:20
  • You'll need a tool like Puppeteer, PhantomJS or Selenium to render the page. You are only receiving the HTML, not what the browser shows you. – Doğancan Arabacı Aug 05 '19 at 12:21

1 Answer


Why not use headless Google Chrome?

I don't think --dump-dom works for the site you mention, but you can start Chrome with --remote-debugging-port=9222 and then do whatever you want with the rendered page, as described in https://developers.google.com/web/updates/2017/04/headless-chrome
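As a rough sketch of the remote-debugging route from Node.js, something like the following could work. It only uses Node's built-in modules. The helper names (headlessArgs, launchHeadless, listTargets) and the binary name 'google-chrome' are assumptions for illustration and will differ between installations; the flags and the /json/list endpoint are the ones headless Chrome exposes.

```javascript
const { spawn } = require('child_process');
const http = require('http');

// Build the flags for a headless Chrome with remote debugging enabled.
// (headlessArgs is a hypothetical helper name, not part of any library.)
function headlessArgs(url, port = 9222) {
  return ['--headless', '--disable-gpu', `--remote-debugging-port=${port}`, url];
}

// Start Chrome in the background.
// Assumption: the binary is called 'google-chrome' and is on the PATH.
function launchHeadless(url, port = 9222, chromeBinary = 'google-chrome') {
  return spawn(chromeBinary, headlessArgs(url, port), { stdio: 'ignore' });
}

// Ask the DevTools HTTP endpoint which pages (targets) are currently open.
// Resolves with the parsed JSON array returned by /json/list.
function listTargets(port = 9222) {
  return new Promise((resolve, reject) => {
    http
      .get({ host: '127.0.0.1', port, path: '/json/list' }, res => {
        let raw = '';
        res.setEncoding('utf8');
        res.on('data', chunk => { raw += chunk; });
        res.on('end', () => {
          try {
            resolve(JSON.parse(raw));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on('error', reject);
  });
}
```

After calling launchHeadless with the page URL and giving Chrome a moment to start, listTargets() should return one entry per open page. Each entry carries a webSocketDebuggerUrl that a DevTools protocol client can connect to in order to read the table once the page's scripts have run.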

Another useful reference: How can I dump the entire Web DOM in its current state in Chrome?

ton