1

I'm trying to get PhantomJS to take an HTML string and render the full page as a browser would (including executing any JavaScript in the page source). I need the resulting HTML as a string. The examples I have seen use page.open, which is of no use here since I already have the page source in my database.

Do I need to use page.open to trigger the JavaScript rendering engine in PhantomJS? Is there any way to do this entirely in memory (i.e., without page.open making a request or reading/writing the HTML source from/to disk)?

I have seen a similar question and answer here, but it doesn't quite solve my issue. After running the code below, nothing I do seems to execute the JavaScript in the HTML source string.

var page = require('webpage').create();
page.setContent('raw html and javascript in this string', 'http://whatever.com');
//everything i've tried from here on doesn't execute the javascript in the string

--------------Update---------------

I tried the following based on the suggestion below, but it still does not work. It just returns the raw source that I supplied, with no JavaScript executed.

var page = require('webpage').create();
page.settings.localToRemoteUrlAccessEnabled = true;
page.settings.webSecurityEnabled = false;
page.onLoadFinished = function(){
    var resultingHtml = page.evaluate(function() {
        return document.documentElement.innerHTML;
    });
    console.log(resultingHtml);
    //console.log(page.content); // this didn't work either
    phantom.exit();
};
page.url = input.Url;
page.content = input.RawHtml;
//page.setContent(input.RawHtml, input.Url); //this didn't work either
sjdirect
  • Which PhantomJS version do you use? Please register to the `onConsoleMessage`, `onError`, `onResourceError`, `onResourceTimeout` events ([Example](https://gist.github.com/artjomb/4cf43d16ce50d8674fdf#file-1_phantomerrors-js)). Maybe there are errors. – Artjom B. Nov 09 '15 at 17:06
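
As a rough sketch of what registering those handlers could look like: the helper name `attachDiagnostics` and the log prefixes are made up for illustration, while the four `on...` callbacks are the PhantomJS page events named in the comment above. It takes the page object as a parameter, so it only assigns handlers and does not itself load anything.

```javascript
// Attach logging handlers to a PhantomJS page object so that errors inside
// the page become visible in the script's output.
// `attachDiagnostics` is an illustrative name, not part of PhantomJS.
function attachDiagnostics(page, log) {
    log = log || function (line) { console.log(line); };

    // Fired for console.log() calls made inside the page.
    page.onConsoleMessage = function (msg) {
        log('CONSOLE: ' + msg);
    };

    // Fired for uncaught JavaScript errors inside the page.
    page.onError = function (msg, trace) {
        log('PAGE ERROR: ' + msg);
        (trace || []).forEach(function (t) {
            log('  at ' + t.file + ':' + t.line);
        });
    };

    // Fired when a resource (script, stylesheet, image) fails to load.
    page.onResourceError = function (resourceError) {
        log('RESOURCE ERROR: ' + resourceError.url + ' (' + resourceError.errorString + ')');
    };

    // Fired when a resource request times out.
    page.onResourceTimeout = function (request) {
        log('RESOURCE TIMEOUT: ' + request.url);
    };

    return page;
}
```

In a PhantomJS script this would presumably be called as `attachDiagnostics(require('webpage').create())` before setting the page content, so that any blocked or failing script shows up in the output.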

3 Answers

3

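The following works:

var page = require('webpage').create();
page.onLoadFinished = function(){
    console.log(page.content); // rendered content
    phantom.exit(); // end the script once the content has been printed
};
page.content = "your source html string";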

But keep in mind that if you set the page from a string, the domain will be about:blank. So if the HTML loads resources from other domains, you should run PhantomJS with the --web-security=false and --local-to-remote-url-access=true command-line options:

phantomjs --web-security=false --local-to-remote-url-access=true script.js

Additionally, you may need to wait for JavaScript execution to complete, since it might not be finished when PhantomJS reports that the page has loaded. Use either setTimeout() to wait a fixed amount of time or waitFor() to wait for a specific condition on the page. More robust ways to wait for a full page load are given in this question: phantomjs not waiting for “full” page load
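
A minimal sketch of the polling idea behind waitFor(), written as a generic helper rather than a copy of the PhantomJS example script: the parameter names, the default timeout and the default interval are assumptions. The condition is checked once immediately and then re-checked on a timer until it holds or the deadline passes.

```javascript
// Poll `testFx` until it returns a truthy value or `timeoutMs` elapses.
// Calls `onReady` on success and `onTimeout` otherwise.
// Defaults (3000 ms timeout, 100 ms interval) are arbitrary choices.
function waitFor(testFx, onReady, onTimeout, timeoutMs, intervalMs) {
    var limit = (typeof timeoutMs === 'number') ? timeoutMs : 3000;
    var step = (typeof intervalMs === 'number') ? intervalMs : 100;
    var deadline = Date.now() + limit;

    function check() {
        if (testFx()) {
            onReady();
        } else if (Date.now() >= deadline) {
            onTimeout();
        } else {
            setTimeout(check, step);
        }
    }

    check(); // first check happens immediately, without waiting a full interval
}

// Hypothetical usage inside a PhantomJS script, where `page` is a webpage object:
// waitFor(function () {
//     return page.evaluate(function () {
//         return document.readyState === 'complete';
//     });
// }, function () {
//     console.log(page.content);
//     phantom.exit();
// }, function () {
//     phantom.exit(1);
// }, 5000);
```

The condition passed as `testFx` is the part that has to be chosen per page; `document.readyState` is only a placeholder and will not catch work started by timers or ajax callbacks.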

Artjom B.
  • Thanks for the reply. I updated my question above with code trying what you are suggesting. This still does not appear to solve my problem. Just returns the raw source that I supplied it with nothing rendered. – sjdirect Nov 09 '15 at 16:59
  • 1
    You haven't said anything about what your page is doing, so I provided a general answer. I also extended it a little now. – Artjom B. Nov 09 '15 at 17:11
0

The setTimeout made it work, even though I'm not excited about waiting a fixed amount of time for each page. The waitFor approach discussed here doesn't work for me, since I have no idea what elements each page might have.

var system = require('system');
var page = require('webpage').create();
page.setContent(input.RawHtml, input.Url);
window.setTimeout(function () {
    console.log(page.content);
    phantom.exit();
}, input.WaitToRenderTimeInMilliseconds);
sjdirect
  • I'll likely try something like checking $.active in the future to see if the page has any pending ajax requests. Then I could avoid doing a setTimeout. – sjdirect Nov 09 '15 at 21:54
  • You can also use some of the suggestions from [here](http://stackoverflow.com/q/11340038/1816580) to wait for a full page load. – Artjom B. Nov 09 '15 at 22:57
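
The $.active idea from the comments might be sketched as follows. This assumes the page uses jQuery and relies on `jQuery.active`, an internal counter of in-flight ajax requests that is not part of jQuery's documented API; the helper name `pendingAjaxCount` is made up for illustration.

```javascript
// Return the number of in-flight jQuery ajax requests in the page,
// or 0 when jQuery is not present. `page` is a PhantomJS webpage object.
// Assumption: jQuery.active is an undocumented internal counter.
function pendingAjaxCount(page) {
    return page.evaluate(function () {
        // This inner function runs inside the page, not in the PhantomJS script.
        var jq = window.jQuery;
        return (jq && typeof jq.active === 'number') ? jq.active : 0;
    });
}
```

A script could poll this until it returns 0 before reading page.content. Requests made without jQuery (plain XMLHttpRequest, for instance) would not be counted, so this is only a partial substitute for a fixed delay.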
0

Maybe not the answer you want, but using PhantomJsCloud.com you can do it easily. Here's an example: http://api.phantomjscloud.com/api/browser/v2/a-demo-key-with-low-quota-per-ip-address/?request={url:%22http://example.com%22,content:%22%3Ch1%3ENew%20Content!%3C/h1%3E%22,renderType:%22png%22,scripts:{domReady:[%22var%20hiDiv=document.createElement%28%27div%27%29;hiDiv.innerHTML=%27Hello%20World!%27;document.body.appendChild%28hiDiv%29;window._pjscMeta.scriptOutput={Goodbye:%27World%27};%22]},outputAsJson:false} The "New Content!" is the content that replaces the original content, and the "Hello World!" is placed in the page by a script.

If you want to do this via plain PhantomJS, you'll need to use the injectJs or includeJs functions after the page content is loaded.

JasonS