
I have an HTML page that contains JavaScript. The page has to be rendered (with its scripts executed) before it can be converted into an image.

I am aware of projects like wkhtmltoimage, PhantomJS, khtmltopng, webkit2png, PrinceXML and html2image. I have implemented a few of those, but I am trying to find a pure Java solution that does not have to use Process to execute an external command. Any help would be great, thanks!

Edit: I looked into Cobra, but its JavaScript support still seems to be in development, and it does not parse my HTML file properly.

If there are any other ways of doing this, please let me know. I am just trying to find the best possible solution.

KrispyDonuts
  • A pure **Java** solution or a pure **JavaScript** solution? They're (way) not the same thing. – Pointy Jun 19 '12 at 18:51
  • A pure Java solution, but it needs to be able to take in HTML + JS. – KrispyDonuts Jun 19 '12 at 18:53
  • Ah, OK. Well, are you talking about something that can handle arbitrary HTML pages with JavaScript code in them? If so, I wouldn't get your hopes up. The FlyingSaucer project does an amazingly good job with XHTML and CSS, but it doesn't handle JavaScript. – Pointy Jun 19 '12 at 18:55
  • Yeah, that's what I was looking for. I remember coming across that project, but unfortunately it doesn't handle JavaScript. – KrispyDonuts Jun 19 '12 at 18:59
  • The problem is that to do that properly requires all the functionality of a full-blown web browser. That's why so many tools for the purpose use Java to "drive" Firefox or WebKit as a component. – Pointy Jun 19 '12 at 19:00
  • Hmm, I see. I'm trying to render and export the image all on the server. Having said that, I am guessing my best bet is to use the Process class to run wkhtmltoimage? – KrispyDonuts Jun 19 '12 at 19:03
  • Well, I don't personally have much experience doing that, but yes, that sounds like the generally correct approach. – Pointy Jun 19 '12 at 19:05
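A minimal sketch of the Process-based route discussed in these comments, assuming wkhtmltoimage is installed and on the PATH. It uses `--javascript-delay` to give the page's scripts (for example, chart drawing) time to finish before the snapshot; the 2000 ms value and the file names are placeholders, not recommendations. Output is inherited rather than left unread so the child process cannot block on a full pipe.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class WkHtmlToImageRunner {

    /** Builds "wkhtmltoimage [options] <input> <output>"; options go before the paths. */
    static List<String> buildCommand(String executable, String input, String output, int jsDelayMs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        // Wait this many milliseconds so scripts (e.g. a chart) can finish drawing.
        cmd.add("--javascript-delay");
        cmd.add(Integer.toString(jsDelayMs));
        cmd.add(input);  // URL or local HTML file
        cmd.add(output); // image file to write, e.g. "chart.png"
        return cmd;
    }

    /** Runs the command and returns its exit code; kills it if it exceeds the timeout. */
    static int render(List<String> cmd, long timeoutSeconds) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);                       // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.INHERIT); // never leave output unread
        Process p = pb.start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("wkhtmltoimage timed out after " + timeoutSeconds + " s");
        }
        return p.exitValue();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = buildCommand("wkhtmltoimage", "page.html", "page.png", 2000);
        System.out.println(String.join(" ", cmd));
        // On a host with wkhtmltoimage on the PATH:
        // int exit = render(cmd, 60);
    }
}
```

A non-zero exit code from `render` is worth logging together with the inherited output, since that is where wkhtmltoimage reports load or script errors.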

2 Answers


There is no pure Java solution: no one has written a browser in Java that supports HTML5.

I'd try either of these approaches:

  1. Use env.js + Rhino to simulate a browser in which you can run the JavaScript. That should give you a DOM that you can then render using, for example, FlyingSaucer.

  2. Add SWT to your classpath (plus the binary for your platform). It contains a Browser component that uses your system's browser to render URLs or an HTML string.

You probably need SWTBot to run the browser in headless mode.

If that doesn't work and you're on Linux, you can start Xvfb, an in-memory X server, and open your browser on it. Alternatively, you can use vncserver to start a desktop on your server.
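A rough sketch of the Xvfb approach, assuming Linux with Xvfb installed and that display `:99` is free. The child process (a browser, or a JVM that embeds one) finds the virtual server through the `DISPLAY` environment variable. `renderer.jar` in the comment is a hypothetical placeholder, and the one-second sleep is a crude stand-in for properly waiting until Xvfb accepts connections.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class XvfbLauncher {

    /** Command that starts Xvfb on the given display with one 24-bit screen. */
    static List<String> xvfbCommand(int display, int width, int height) {
        return Arrays.asList("Xvfb", ":" + display, "-screen", "0", width + "x" + height + "x24");
    }

    /** Prepares a child process whose DISPLAY points at the virtual X server. */
    static ProcessBuilder underDisplay(int display, List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("DISPLAY", ":" + display);
        pb.inheritIO();
        return pb;
    }

    /** Starts Xvfb, runs the command against it, then shuts Xvfb down. */
    static int runUnderXvfb(int display, List<String> command) throws IOException, InterruptedException {
        Process xvfb = new ProcessBuilder(xvfbCommand(display, 1280, 1024)).inheritIO().start();
        try {
            Thread.sleep(1000); // crude: give Xvfb a moment to accept connections
            return underDisplay(display, command).start().waitFor();
        } finally {
            xvfb.destroy(); // stop the virtual X server when done
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", xvfbCommand(99, 1280, 1024)));
        // Hypothetical usage: runUnderXvfb(99, Arrays.asList("java", "-jar", "renderer.jar"));
    }
}
```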

[EDIT] The PhantomJS project might do what you want:

> PhantomJS (www.phantomjs.org) is a headless WebKit scriptable with JavaScript or CoffeeScript.
> [...]
>   • Use cases: Headless web testing, Site scraping, Page rendering
>   • Multiplatform, available on major operating systems: Windows, Mac OS X, Linux, other Unices
>   • Fast and native implementation of web standards: DOM, CSS, JavaScript, Canvas, SVG. No emulation!
>   • Pure headless (X11) on Linux, ideal for continuous integration systems. Also runs on Amazon EC2.

The quickstart page explains how to load a web page and render it to an image.
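A minimal sketch of driving PhantomJS from Java, assuming the `phantomjs` binary is on the PATH. The generated script uses PhantomJS's `webpage` module (`require('webpage').create()`, `page.open`, `page.render`) and `require('system').args` for the URL and output path. The fixed delay before `page.render` is an assumption meant to give chart scripts time to draw; the viewport size and file names are placeholders.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class PhantomRasterizer {

    /** PhantomJS script: open args[1], wait delayMs for scripts, render to args[2]. */
    static String script(int width, int height, int delayMs) {
        return String.join("\n",
            "var system = require('system');",
            "var page = require('webpage').create();",
            "page.viewportSize = { width: " + width + ", height: " + height + " };",
            "page.open(system.args[1], function (status) {",
            "  if (status !== 'success') { phantom.exit(1); return; }",
            "  // let chart-drawing scripts finish before taking the snapshot",
            "  window.setTimeout(function () {",
            "    page.render(system.args[2]);",
            "    phantom.exit(0);",
            "  }, " + delayMs + ");",
            "});");
    }

    /** Writes the script to a temporary .js file. */
    static Path writeScript(String js) throws IOException {
        Path file = Files.createTempFile("rasterize", ".js");
        Files.write(file, js.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    /** Command line: phantomjs <script> <url> <output image>. */
    static List<String> command(Path scriptFile, String url, String output) {
        return Arrays.asList("phantomjs", scriptFile.toString(), url, output);
    }

    public static void main(String[] args) throws IOException {
        Path js = writeScript(script(1024, 768, 1000));
        System.out.println(String.join(" ", command(js, "file:///tmp/page.html", "page.png")));
        // Run it with: new ProcessBuilder(command(...)).inheritIO().start().waitFor();
    }
}
```

This still starts an external process, so it does not meet the "no Process" requirement from the question; it only replaces wkhtmltoimage with a scriptable engine.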

Aaron Digulla
  • Thanks for the suggestions Aaron! I'll try and implement these approaches. – KrispyDonuts Jun 20 '12 at 12:16
  • I have set up env.js + Rhino to run the JavaScript + HTML, but I am having trouble actually connecting FlyingSaucer to the DOM from env.js. I understand that I can pass a DOM Document to FlyingSaucer, but I am a little lost on how to actually get the DOM out of env.js + Rhino. If you have experience with env.js and Rhino, any suggestions would be great, thanks! – KrispyDonuts Jun 20 '12 at 18:04
  • env.js just gives you the same environment that you'd have in a browser. See this answer for how to get the HTML of the final page: http://stackoverflow.com/questions/817218/get-entire-document-html-as-string Just run the JavaScript from one of the solutions in Rhino and pass the result to FlyingSaucer to render. – Aaron Digulla Jun 20 '12 at 18:40
  • Thanks again for the help. When getting the HTML of the rendered page, I would only get the HTML and not the JS. What if the JavaScript doesn't actually alter the HTML but is needed to render a chart or graph? I have 3-4 JavaScript functions that are needed to render a chart, so this might not work in my case: if env.js just returns the HTML, I would get the inline script that renders the chart rather than the chart itself. I might be understanding this incorrectly, though. – KrispyDonuts Jun 20 '12 at 18:48
  • After env.js returns, all the scripts on the page should have been executed and the DOM should be the same as in your browser. If that doesn't happen, the scripts in your page probably use animation or try to load data (which env.js prohibits by default to protect you). I suggest writing small examples to see how it works and posting new questions as you hit problems. – Aaron Digulla Jun 21 '12 at 13:30
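For the step of handing the env.js result to FlyingSaucer, here is a sketch that assumes the final markup has already been pulled out of Rhino as a string (for example with one of the approaches in the linked answer) and that it is well-formed XHTML. It uses only the JDK's DOM parser to produce the `org.w3c.dom.Document` that the asker wants to pass to FlyingSaucer; the FlyingSaucer call itself is left out because the renderer class and constructor depend on the version in use. Real pages are often not well-formed XML, so a cleanup pass may be needed first.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class HtmlToDom {

    /**
     * Parses serialized, well-formed XHTML (e.g. the final page markup taken
     * from env.js) into a DOM Document. Parser failures are rethrown as
     * IllegalArgumentException so malformed markup fails loudly.
     */
    static Document parseXhtml(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Don't download the W3C DTD if a DOCTYPE is present. Side effect:
            // named entities such as &nbsp; are then unknown and must be
            // replaced with numeric references (&#160;) beforehand.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Typical cause: tag-soup HTML (e.g. an unclosed <br>) that isn't valid XML.
            throw new IllegalArgumentException("Markup is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        String markup = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><div id=\"chart\"></div></body></html>";
        Document doc = parseXhtml(markup);
        System.out.println(doc.getDocumentElement().getLocalName());
        // Next step (not shown): pass `doc` to FlyingSaucer's renderer.
    }
}
```

Note this only helps if the chart ends up in the DOM (for example as SVG or positioned elements). A chart drawn into a `<canvas>` leaves no markup behind, which is one way the concern raised in the comments can bite.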

I have found a solution using WebRenderer. WebRenderer is a commercial product with Swing, Server, and Desktop editions. As of 7/9/2012, the Swing edition is the only one that supports HTML5. However, the Swing edition can be used on a server to convert a page to an image by instantiating the browser without creating a JFrame. See this question.

KrispyDonuts