
I am trying to use the npm request and cheerio packages to fetch web pages and parse their HTML. This works fine when the content is already in the HTML returned by the request. However, I am having trouble with a site that shows a loading screen first and then updates the page with new info / elements after a few moments.

Partial code:

var request = require('request');

// Placeholder URL for a site that shows a loading screen before its content.
var url = 'website with loading screen prior to content.com';

request(url, function (error, response, body) {
  if (!error && response.statusCode === 200) {
    // body is the HTML as first served, i.e. the loading screen.
    console.log(body);
  }
});

What I would like: either for request to wait for a specific element to show up on the page and then read the body, or to wait a fixed number of seconds and then read the body.

Other options: it might not be possible with npm request, which is fine. If that is the case, could you please point me in the right direction? The other options I am considering are webdriver.io and PhantomJS. Is there a recommended course of action for this?

alex_milhouse

1 Answer


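Unfortunately, there is no way to configure request to "wait" after the request has been initiated before obtaining a response. request only returns the HTML that the server sends and never runs the page's JavaScript, so content added after the loading screen never appears in body, no matter how long you wait. The best thing for you to do is to check out PhantomJS. It is a headless browser that you can use to load and render the page and then access the dynamically generated content via JavaScript.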

Check out this answer for a brief example.
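
To make the "wait for a specific element" idea concrete, here is a minimal sketch of a polling helper. It is not part of request or PhantomJS: waitFor, check, timeoutMs, intervalMs and fakePage are all hypothetical names made up for this illustration, and fakePage merely stands in for a page rendered by a headless browser. In a real script, check would presumably ask the browser whether the element exists yet.

```javascript
// Hypothetical helper (not a request or PhantomJS API): call `check`
// every `intervalMs` milliseconds until it returns a truthy value,
// or reject once `timeoutMs` milliseconds have passed.
function waitFor(check, timeoutMs, intervalMs) {
  return new Promise(function (resolve, reject) {
    var start = Date.now();

    function poll() {
      var result;
      try {
        result = check();
      } catch (err) {
        reject(err);
        return;
      }
      if (result) {
        resolve(result);
      } else if (Date.now() - start >= timeoutMs) {
        reject(new Error('Timed out after ' + timeoutMs + ' ms'));
      } else {
        setTimeout(poll, intervalMs);
      }
    }

    poll();
  });
}

// Stand-in for a page that shows a loading screen and fills in content later.
var fakePage = { body: '<div id="loading"></div>' };
setTimeout(function () {
  fakePage.body = '<div id="content">ready</div>';
}, 50);

// Read the body only once the content element has shown up.
waitFor(function () {
  return fakePage.body.indexOf('id="content"') !== -1 ? fakePage.body : null;
}, 2000, 10).then(function (body) {
  console.log(body);
}).catch(function (err) {
  console.error(err.message);
});
```

Polling with a timeout covers both wishes from the question: it returns as soon as the element appears, and it gives up after a fixed time instead of hanging if the element never shows up.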

jordanwillis
  • Thanks, I had listed phantomjs as a possible option in my question. I actually ended up using it with some great success. – alex_milhouse Mar 03 '17 at 06:52