2

I'm screen scraping a site that is very JS-heavy. It uses a client-side templating engine to render all of its content. I tried jQuery, which worked in the browser console but, unsurprisingly, not on the server (Node.js).

I looked at a few libraries for Python and Java, and they seem to be able to handle what I want, but I would prefer a JS solution that works with a Node server.

Is there any way to get the complete source of a page after it's rendered, using Node?

KGo

3 Answers

2

I personally like PhantomJS and Selenium. Both load the page in a browser engine, run its JavaScript, and let you read the source after rendering.

The docs/examples should work pretty much out of the box.
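As a rough sketch of the PhantomJS route driven from Node, assuming a `phantomjs` binary is installed and on the PATH: the embedded script uses PhantomJS's `webpage` module (`page.open`, `page.content`), and Node only launches it and collects its output. The names `PHANTOM_SCRIPT` and `renderedSource`, the injectable `run` parameter, and the fixed one-second wait for templates to render are all assumptions made for illustration, not something from the PhantomJS docs.

```javascript
const { execFile } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// This text runs inside PhantomJS, not Node. It loads the URL passed as
// the first argument, waits briefly, then prints the rendered DOM as HTML.
const PHANTOM_SCRIPT = `
var system = require('system');
var page = require('webpage').create();
page.open(system.args[1], function (status) {
  if (status !== 'success') {
    phantom.exit(1);
    return;
  }
  // Assumption: one second is enough for client-side templates to render.
  setTimeout(function () {
    console.log(page.content);
    phantom.exit(0);
  }, 1000);
});
`;

// Hypothetical helper: writes the script to a temp file, runs it with
// PhantomJS, and passes (error, html) to the callback.
// `run` defaults to execFile and exists only so it can be stubbed.
function renderedSource(url, callback, run = execFile) {
  const scriptPath = path.join(
    os.tmpdir(),
    'render-' + process.pid + '-' + Date.now() + '.js'
  );
  fs.writeFileSync(scriptPath, PHANTOM_SCRIPT);
  const options = { maxBuffer: 32 * 1024 * 1024 }; // allow large pages
  run('phantomjs', [scriptPath, url], options, function (err, stdout) {
    fs.unlinkSync(scriptPath); // clean up the temp script
    callback(err, stdout);
  });
}
```

Calling `renderedSource('http://example.com/', function (err, html) { ... })` would then hand you the post-render markup, which you can feed to whatever parser you like. A fixed delay is the crudest way to wait; polling for a known element is usually more reliable.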

AstroCB
JZL003
1

If you want to use a Node.js module, you might be interested in this:

https://github.com/sgentle/phantomjs-node

or this:

https://github.com/alexscheelmeyer/node-phantom

azero0
1

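I used jsdom for screen scraping. The code looks like this:

// Uses the callback-style jsdom.env API from older jsdom releases.
var jsdom = require('jsdom');

jsdom.env({
  // Placeholder: replace with the URL of the page you want to scrape.
  url: 'http://example.com/',
  // Inject jQuery into the loaded page.
  scripts: ['http://code.jquery.com/jquery.js'],
  done: function (error, window) {
    if (error) {
      console.error(error);
      return;
    }
    var $ = window.$;

    // The requested page is now loaded and queryable through $.
    // Write any JavaScript or jQuery code here to extract what you need.
  }
});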
Srinath Mandava