Given an HTML file with <script> tags that update the DOM on the fly, is it possible to run the file through some Java code to get the dynamic DOM generated by the JavaScript?
That is, if I opened the file in a browser, the browser would load the script tags, run the JavaScript, update the DOM accordingly, and render it. I want this final DOM, but without using a browser.

My application is a Java app running in Tomcat that fetches the HTML content from a DB. I need to expand this HTML content to get the "dynamic" content produced by the JavaScript. Is this feasible through Java APIs?

EDIT: I tried HtmlUnit to load and parse the page content. However, the MathJax-expanded content is not available when I fetch the XML content from the HtmlPage.

URL url = new URL("http://www.example.com");
StringWebResponse response = new StringWebResponse(getPageHTMLContentFromDB(), url);
WebClient client = new WebClient();
HtmlPage page = HTMLParser.parseHtml(response, client.getCurrentWindow());
System.out.println(page.asXml()); // this line does not print the MathJax expanded DOM. 

When I open the HTML content obtained from the DB in a browser, I see the correct DOM (updated by MathJax).

suman j
  • @evanwong Please see my edit. I am not trying to run JavaScript code alone on the server side. This is not a duplicate of the suggested link. – suman j Oct 02 '14 at 20:26
  • Why do you use JavaScript for this? Unless I misunderstood your goal, you could use a template engine like Freemarker or Java Server Pages (I'd opt for Freemarker). I'm not sure I see why you would pre-run JS on the server. You could either get away completely without JS or employ AJAX for retrieving data from the server and then manipulate the DOM on the client side as normal. What am I missing? - EDIT: By the way you might want to use an HTML parser and Rhino in conjunction... – Powerslave Oct 02 '14 at 20:35
  • I have content in an HTML page that is processed by the MathJax JS library to yield the actual DOM content. This DOM is dynamic and is not available until the content has been processed by MathJax. That's why JavaScript is involved. – suman j Oct 02 '14 at 20:46
  • What are you trying to achieve? I mean, I don't see what's wrong with running MathJax on the client side. What do you need the resulting DOM for? – Powerslave Oct 02 '14 at 20:56
  • @Powerslave I need the resulting DOM for reporting purposes: generating PDFs, images, etc. For now, I am trying to generate a PDF from the XHTML content, but the PDF shows the "not-expanded" DOM. So I want to process the HTML on the server side, simulating browser-like JavaScript execution, to yield the expanded DOM. `flyingsaucer` can create a PDF from XHTML as long as it has all the CSS, but it cannot run JavaScript, so I am pre-processing the content before sending it to flyingsaucer. – suman j Oct 02 '14 at 21:04
  • I see. I'm not sure why your source data is HTML + JavaScript, but if you'd like to go that way, I'd suggest using **[Rhino](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/Rhino)**, or something like `WebView` from JavaFX. **[Here](http://stackoverflow.com/questions/19420753/how-to-call-a-javascript-function-from-a-javafx-webview-on-button-click)** are some hints for the latter. Please note that since you definitely want browser behavior, you won't be able to get away without using a browser API, unless you implement an embedded browser engine yourself. – Powerslave Oct 02 '14 at 21:15

1 Answer

I think you forgot to call WebClientOptions#setJavaScriptEnabled, so MathJax does not run at all. At least, that is how it looks from your code above.

Try adding this as line 4, right after the WebClient is created:

client.getOptions().setJavaScriptEnabled(true);

I hope this helps.

WebClientOptions#setThrowExceptionOnScriptError may also matter here: without this option enabled, you might not be notified of problems during script execution. (The default could actually be the other way around; I'm not sure.)

Powerslave
  • JavaScript is already enabled. I tried the setThrowExceptionOnScriptError method as well, with no luck. Dynamic content created by simple JS is available from HtmlUnit; only the DOM created by MathJax is missing. It looks like MathJax is asynchronous, so I need to figure out a way to wait until MathJax finishes with the page. I tried the waitForBackgroundJavaScript method as well, but it didn't help. – suman j Oct 02 '14 at 21:51
  • Oh! Take a look at this: [http://htmlunit.sourceforge.net/faq.html#AJAXDoesNotWork](http://htmlunit.sourceforge.net/faq.html#AJAXDoesNotWork). Since - according to my other guess - it is asynchronous, you need to tell HtmlUnit to wait for the background JS operation. Please let me know how it works out. – Powerslave Oct 02 '14 at 22:05
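
The last two comments come down to the same idea: if MathJax finishes asynchronously, the DOM should only be read once it is done. One generic way to wait is to poll a condition until it holds or a timeout expires. Below is a minimal sketch of that pattern using only the JDK. `PollUntil` and `waitUntil` are hypothetical names for illustration and are not part of HtmlUnit or MathJax. With HtmlUnit, the condition might be a check that the serialized page contains some marker you expect in MathJax's output; what that marker is depends on your MathJax setup and is an assumption here.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held, false on timeout or interruption.
     * (Hypothetical helper, not an HtmlUnit API.)
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition ever holding
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the script has finished": becomes true after roughly 200 ms.
        boolean done = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2000, 20);
        System.out.println("condition met: " + done);
    }
}
```

The timeout matters on a server: if the script never completes (for example, because it fails to load a resource), the request thread gets released instead of blocking forever.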