37

I have this webpage that uses client-side JavaScript to format data on the page before it's displayed to the user.

Is it possible to somehow use wget to download the page and use some sort of client-side JavaScript engine to format the data as it would be displayed in a browser?

Jake Wilson

3 Answers

28

You could probably make that happen with something like PhantomJS.

You can write a PhantomJS script that loads the page the way a browser would, and then either takes screenshots or uses JavaScript to inspect the page and pull out data.
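As a rough sketch of the "inspect the page and pull out data" route, the script below opens a URL and uses `page.evaluate` to collect the text of the elements matching a CSS selector after the page's own scripts have run. The file name `extract.js`, the helper `extractText`, and the default selector `td.formatted` are assumptions made for illustration; substitute whatever matches your page.

```javascript
// extract.js (hypothetical name): print the text of elements matching a selector.

// Runs inside the loaded page. PhantomJS serializes this function,
// so it must not rely on variables from the outer script.
function extractText(selector) {
  var nodes = document.querySelectorAll(selector);
  var out = [];
  for (var i = 0; i < nodes.length; i++) {
    out.push(nodes[i].textContent.trim());
  }
  return out;
}

// Only drive the browser when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var url = system.args[1];
  // 'td.formatted' is a placeholder selector, not something from the question.
  var selector = system.args[2] || 'td.formatted';

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('** Error loading url.');
      phantom.exit(1);
      return;
    }
    // Extra arguments to page.evaluate are passed through to the function.
    var values = page.evaluate(extractText, selector);
    console.log(values.join('\n'));
    phantom.exit();
  });
}
```

It would be run along the lines of `phantomjs extract.js "http://example.com/page" "td.formatted"`, with both arguments being placeholders.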

areim
Alex Wayne
  • See [command-line-browser-with-js-support](http://superuser.com/questions/448514/command-line-browser-with-js-support) for phantomjs script to use. – lemonsqueeze Nov 08 '14 at 10:22
  • Beware of the dependencies when installing PhantomJS. You may as well be running headless Firefox. – mckenzm Apr 29 '20 at 05:52
9

Here is a simple PhantomJS script that loads a webpage, lets its JavaScript run, and prints the resulting HTML so you can save it locally:

file: get.js

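// Load the URL given on the command line and print the rendered HTML to stdout.
var page = require('webpage').create(),
  system = require('system'),
  address;

address = system.args[1];
// Set the scroll position 4000px from the top before loading.
page.scrollPosition = { top: 4000, left: 0 };
page.open(address, function (status) {
  if (status !== 'success') {
    console.log('** Error loading url.');
    phantom.exit(1); // non-zero exit code signals the failure to the shell
  } else {
    // page.content is the page's HTML as it stands after scripts have run
    console.log(page.content);
    phantom.exit();
  }
});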

Use it as follows:
$> phantomjs /path/to/get.js "http://www.google.com" > "google.html"

Change /path/to, the URL, and the output filename to whatever you need.
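One caveat: the script above prints `page.content` as soon as the load callback fires, so data that the page formats asynchronously (for example after an AJAX call) may not be in the output yet. A possible variation, sketched below, waits a fixed delay before dumping the HTML. The file name `get-delayed.js`, the helper `parseArgs`, and the 2000 ms default are assumptions for illustration, and a fixed delay is only a crude heuristic.

```javascript
// get-delayed.js (hypothetical name): like get.js, but waits before dumping.

var DEFAULT_DELAY_MS = 2000; // assumed default; tune for the page in question

// args[0] is the script name, args[1] the URL, args[2] an optional delay in ms.
function parseArgs(args) {
  var delay = parseInt(args[2], 10);
  return {
    address: args[1],
    delay: isNaN(delay) ? DEFAULT_DELAY_MS : delay
  };
}

// Only drive the browser when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var opts = parseArgs(require('system').args);
  var page = require('webpage').create();

  page.open(opts.address, function (status) {
    if (status !== 'success') {
      console.log('** Error loading url.');
      phantom.exit(1);
      return;
    }
    // Give the page's own scripts time to finish before reading the HTML.
    setTimeout(function () {
      console.log(page.content);
      phantom.exit();
    }, opts.delay);
  });
}
```

Usage would mirror the original, with the delay as an optional third argument:
$> phantomjs /path/to/get-delayed.js "http://www.google.com" 3000 > "google.html"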

  • Would you add code to deal with `document.cookie` and `location.href` and then fetch the new `href` ? – Galaxy Oct 13 '21 at 09:07
2

Not with wget alone: it does not include a JavaScript engine, so it only retrieves the raw HTML as served. However, you could use WebKit to render the page and then capture the resulting output.

Something like this could serve as a starting point for getting the content: http://situated.wordpress.com/2008/06/04/take-screenshots-of-a-website-from-the-command-line/

drowe