19

First of all, I am not looking for help with a development or testing environment. I am new to PhantomJS, and all I want is to run it from the command line in a Linux terminal.

I have an HTML page whose body is rendered by JavaScript. I want to download that rendered HTML using PhantomJS.

I have no experience with PhantomJS, but I have some experience with shell scripting, so I first tried curl. Because curl does not execute JavaScript, I only got the original page source; the rendered content was not downloaded. I have heard that Ruby's Mechanize might do this job, but I don't know Ruby. On further investigation I found the command-line tool PhantomJS. How can I do this with PhantomJS?

Please ask if I need to provide any additional information.

Anonymous Platypus
  • Sharing your research helps everyone. Tell us what you've tried and why it didn't meet your needs. This demonstrates that you've taken the time to try to help yourself, it saves us from reiterating obvious answers, and most of all it helps you get a more specific and relevant answer! Also see [how to ask](http://stackoverflow.com/questions/how-to-ask) – Cerbrus Jan 29 '15 at 07:54
  • I have updated my question with the research I have done. – Anonymous Platypus Jan 29 '15 at 08:17
  • Are you using phantomjs only for downloading html content or trying to download it as an image? For generating image check http://phantomjs.org/screen-capture.html – jsjunkie Jan 29 '15 at 08:22
  • I just wanted to get the html content of a page. – Anonymous Platypus Jan 29 '15 at 09:02
  • possible duplicate of [How to print html source to console with phantomjs](http://stackoverflow.com/questions/12450868/how-to-print-html-source-to-console-with-phantomjs) – Artjom B. Jan 29 '15 at 09:43

2 Answers

23

Unfortunately, that is not possible using just the PhantomJS command line. You have to use a JavaScript file to actually accomplish anything with PhantomJS.

Here is a very simple version of a script you can use:

Code mostly copied from https://stackoverflow.com/a/12469284/4499924

printSource.js

var system = require('system');
var page   = require('webpage').create();
// system.args[0] is the filename, so system.args[1] is the first real argument
var url    = system.args[1];
// render the page, and run the callback function
page.open(url, function () {
  // page.content is the source
  console.log(page.content);
  // need to call phantom.exit() to prevent from hanging
  phantom.exit();
});

To print the page source to standard output:

phantomjs printSource.js http://todomvc.com/examples/emberjs/

To save the page source in a file:

phantomjs printSource.js http://todomvc.com/examples/emberjs/ > ember.html

Daniel Ma
  • I hope this answers my question. I think I might need to use the script that my target application loads instead of this one. – Anonymous Platypus Jan 29 '15 at 09:05
  • I guess that would work for this specific instance, but the solution I gave you will work for any page. – Daniel Ma Jan 29 '15 at 09:06
  • In my experience you also need to add a time delay to give the page a chance to fully render, by wrapping the main body inside `page.open` with `setTimeout(function() { }, delay);` where `delay = 5000` milliseconds, for example. You could make the delay an optional parameter with `system.args[2]`, and you can check the number of arguments with `system.args.length`. – Leonid Apr 30 '20 at 19:30
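The delay suggested in the last comment could be sketched roughly as follows. This is an illustration only, not the answerer's script: the file name `printSourceDelayed.js`, the helper `parseArgs`, and the 5000 ms default are assumptions made for the example, and writing errors to `system.stderr` assumes PhantomJS 1.9 or later. A fixed delay is a heuristic; a page that renders slowly may need a larger value.

```javascript
// printSourceDelayed.js (hypothetical file name)
// Usage: phantomjs printSourceDelayed.js <url> [delayMs]

// Hypothetical helper: turn the argument list into { url, delay }.
// args[0] is the script name, args[1] the URL, args[2] an optional delay in ms.
function parseArgs(args) {
  var delay = args.length > 2 ? parseInt(args[2], 10) : 5000;
  if (isNaN(delay) || delay < 0) {
    delay = 5000; // fall back to the assumed default on bad input
  }
  return { url: args[1], delay: delay };
}

// Only touch PhantomJS modules when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();
  var opts = parseArgs(system.args);

  if (!opts.url) {
    system.stderr.writeLine('Usage: phantomjs printSourceDelayed.js <url> [delayMs]');
    phantom.exit(1);
  } else {
    page.open(opts.url, function (status) {
      if (status !== 'success') {
        system.stderr.writeLine('Failed to load ' + opts.url);
        phantom.exit(1);
        return;
      }
      // Wait before reading page.content so client-side scripts can render.
      setTimeout(function () {
        console.log(page.content);
        phantom.exit();
      }, opts.delay);
    });
  }
}
```

It would be invoked like the original script, with the delay as an optional second argument, for example `phantomjs printSourceDelayed.js http://todomvc.com/examples/emberjs/ 3000 > ember.html`.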
0
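var fs = require('fs'); // needed for fs.write below
// Run this inside the page.open callback, after the page has loaded.
var pagehtml = page.evaluate(function () {
  return '<html><head>' + document.head.innerHTML + '</head>' +
         '<body>' + document.body.innerHTML + '</body></html>';
});
fs.write('output.html', pagehtml, 'w');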
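The fragment above only works inside a `page.open` callback. A complete script built around it might look like the sketch below. The file name `saveRendered.js`, the helper name `snapshotDocument`, and the optional output-file argument are assumptions added for illustration. Rebuilding the markup from `head` and `body` this way drops the doctype and any attributes on the `<html>` element, so `page.content` from the other answer may be preferable when those matter.

```javascript
// saveRendered.js (hypothetical file name)
// Usage: phantomjs saveRendered.js <url> [outputFile]

// Passed to page.evaluate, so it runs inside the page and may only use
// page globals such as `document`, not variables from this script.
function snapshotDocument() {
  return '<html><head>' + document.head.innerHTML + '</head>' +
         '<body>' + document.body.innerHTML + '</body></html>';
}

// Only touch PhantomJS modules when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var fs = require('fs');
  var page = require('webpage').create();
  var url = system.args[1];
  var outputFile = system.args[2] || 'output.html'; // assumed default name

  if (!url) {
    console.log('Usage: phantomjs saveRendered.js <url> [outputFile]');
    phantom.exit(1);
  } else {
    page.open(url, function (status) {
      if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
        return;
      }
      // Serialize the rendered DOM and write it to disk.
      fs.write(outputFile, page.evaluate(snapshotDocument), 'w');
      phantom.exit();
    });
  }
}
```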