2

I can read HTML content over HTTP (for example, from http://www.foo.com) using Java (with the URL and BufferedReader classes). However, a couple of the pages contain JavaScript, which my current app cannot process.

What's the best way to read HTML content that contains JavaScript using Java?

I am open to using other languages if it is easier.

Thanks in advance for your help.

UPDATE - Clarification:

A couple of the HTML pages are generated dynamically using JavaScript. I can see the result (pure HTML after the JavaScript has run) when viewing them in a browser.

On the other hand, when my Java app retrieves the HTML contents, it says that there is no JavaScript on my app.

Ideally, I want my Java app to get the same result as the browser.
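
To make the gap concrete, here is a minimal sketch of the kind of fetch described above, assuming a plain URL plus BufferedReader loop and UTF-8 content. The class name `RawHtmlFetch`, its helper methods, and the sample markup are hypothetical and only for illustration; this is not the original app's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class RawHtmlFetch {

    // Reads everything from the reader line by line, the way a
    // URL + BufferedReader loop would. Nothing is executed.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Fetches the raw response body for a URL (assumes UTF-8 content).
    static String fetch(String url) {
        try {
            return readAll(new InputStreamReader(
                    URI.create(url).toURL().openStream(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a server response; a real call would be
        // fetch("http://www.foo.com").
        String served = "<div id=\"out\"></div>\n"
                + "<script>document.getElementById('out').textContent = 'Hello';</script>";
        String raw = readAll(new StringReader(served));
        // The script arrives as plain text and the div is still empty.
        System.out.println(raw);
    }
}
```

A browser would run the script and show "Hello" inside the div. The reader above only ever sees the markup as the server sent it, which is why the two results differ.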

Thanks for everyone's response.

pion
  • What are you doing with the contents returned by a URL after reading from it? Are you evaluating the JavaScript? – Binil Thomas May 23 '11 at 19:51
  • I'm not sure I understand your question correctly. You can use the [SWT Browser widget](http://www.eclipse.org/articles/Article-SWT-browser-widget/browser.html). It can render HTML and supports JavaScript. – George Suaridze May 23 '11 at 19:53
  • @pion What do you mean by "it says that there is no JavaScript on my app"? Who says that? Is it the HTML you got that contains this exact text? If so, you should first consider modifying your User-Agent string to get the correct content. HtmlUnit can help you with that and is definitely the best way to go. – Grooveek Jan 04 '12 at 09:42
  • I have the same problem. Can you help me? http://stackoverflow.com/questions/20781322/java-program-to-read-a-html-page-and-save-its-content-use-javascript?noredirect=1#comment31149974_20781322 – ducngm.hn Dec 26 '13 at 14:25
  • @pion did my answer help? – quarks Feb 13 '16 at 14:36
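
Regarding the User-Agent suggestion in the comments above, here is a minimal sketch of sending a browser-like User-Agent header with the standard `HttpURLConnection`. The class name `UserAgentRequest` and the header value are hypothetical placeholders. This only helps if the server varies its response by User-Agent; it does not execute any JavaScript.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class UserAgentRequest {

    // Hypothetical browser-like value; substitute the string your real browser sends.
    static final String BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0";

    // Prepares (but does not send) an HTTP request with a custom User-Agent.
    static HttpURLConnection prepare(String url, String userAgent) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("User-Agent", userAgent);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://www.foo.com", BROWSER_UA);
        // The request goes out only when the response is read,
        // for example via conn.getInputStream().
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```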

4 Answers

2

HtmlUnit has good JavaScript support, and it should process the HTML (almost) the way a web browser would.

Aravindan R
0

Cobra (http://lobobrowser.org/cobra/getting-started.jsp) should fit your needs.

Shan
0

For just HTML parsing you can use HTMLParser (org.htmlparser). However, from the way you describe your problem, it seems you need a browser, because executing JavaScript is entirely different from just parsing HTML. Cheers.
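
To illustrate the difference between parsing and executing, here is a minimal sketch that pulls the inline script source out of a page. It uses a regular expression rather than a real parser purely to stay dependency-free; the class name `ScriptSourceExtractor` and the sample markup are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ScriptSourceExtractor {

    // Matches inline <script> elements; good enough for a sketch,
    // not a substitute for a real HTML parser.
    private static final Pattern INLINE_SCRIPT = Pattern.compile(
            "<script\\b[^>]*>(.*?)</script>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed source text of every inline script in the HTML.
    static List<String> inlineScripts(String html) {
        List<String> scripts = new ArrayList<>();
        Matcher m = INLINE_SCRIPT.matcher(html);
        while (m.find()) {
            scripts.add(m.group(1).trim());
        }
        return scripts;
    }

    public static void main(String[] args) {
        String html = "<p id=\"msg\"></p>"
                + "<script>document.getElementById('msg').textContent = 'Hello';</script>";
        // Prints the JavaScript source; the paragraph itself stays empty
        // because nothing here runs the script.
        inlineScripts(html).forEach(System.out::println);
    }
}
```

Parsing can tell you that a script exists and what its source is, but the content the script would generate only appears once something with a JavaScript engine runs it.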

quarks
-3

Without a doubt, you need to use a Java HTML parser:

Community
r.piesnikowski
  • 99 of 100 HTML parsers in Java don't understand/execute JavaScript. Please be more specific. – BalusC May 23 '11 at 20:42