Google seems to be failing me today: I'm looking for a way to load a remote HTML page into my Java application. This HTML page contains some JavaScript that generates most of the content. I thought it would be fairly straightforward to open the page in Java and have a look at the HTML.

When I use URL.openStream() to read the page, I get the HTML source with the JavaScript but without the generated HTML (which is what I would expect). So how do I get from this to the HTML source including the generated content? After a few hours on Google I got completely entangled in Rhino, EnvJs, and Jsoup, and none of it is really getting me anywhere.
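
To make the behaviour described above concrete, here is a minimal sketch rather than the actual code from my application. It assumes Java 11 or later, and the class name `RawFetchDemo`, the helper names, and the sample page are all made up for illustration. The page is written to a temporary file so the example runs without a network connection; a remote `http://` URL would be read the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RawFetchDemo {

    /** Reads whatever the URL serves, byte for byte, and decodes it as UTF-8. */
    public static String fetchRaw(URL url) {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a small hypothetical page to a temp file (stand-in for the remote page). */
    public static URL samplePage() {
        String html = "<html><body><div id=\"out\"></div>"
                + "<script>document.getElementById('out').innerHTML = '<p>generated</p>';</script>"
                + "</body></html>";
        try {
            Path file = Files.createTempFile("page", ".html");
            file.toFile().deleteOnExit();
            Files.writeString(file, html);
            return file.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String source = fetchRaw(samplePage());
        // The script comes back as plain text; nothing has executed it,
        // so the target div is still empty in what Java sees.
        System.out.println("script present:  " + source.contains("<script>"));
        System.out.println("div still empty: " + source.contains("<div id=\"out\"></div>"));
    }
}
```

Both checks hold because `openStream()` only transfers bytes: it never parses the HTML or runs the script, so the markup the script would generate never appears in the DOM that Java reads.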

Does anyone have any suggestions?

Geert
  • This might not be the best solution, but when you put the HTML in a webview the JavaScript code will be executed, so you can pull the result from the webview again. – Klaasvaak Oct 23 '12 at 12:20
  • You need to execute the JS first with some JS-engine to gather its output. – feeela Oct 23 '12 at 12:51

1 Answer

Yes, basically there is no easy solution: to get the generated content you need to actually render the page, and for that you need a JavaScript engine (as feeela says).

One solution is to use WebKit. I have only used it from Python, not from Java, but you may want to look at the question "WebKit browser in Java app on multiple platforms".

Pixou