8

This may be a basic question, but I need to use HtmlUnit, and it only seems to return a page either as XML or as text.

I don't know how to get the plain HTML (the same source that a browser shows for the page).

I need this because I have to pass the HTML to some existing modules. Any ideas?

Chris Forrence
Afshin Moazami
  • mr. Vai asks if you can "provide fullcode which extracts webpage using HTMLUNIT" – John Dvorak Feb 17 '13 at 18:33
  • I have the same problem. Can you help me? http://stackoverflow.com/questions/20781322/java-program-to-read-a-html-page-and-save-its-content-use-javascript – ducngm.hn Dec 26 '13 at 10:52

1 Answer

25

You can use the following piece of code to achieve your goal:

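import com.gargoylesoftware.htmlunit.Page;        // HtmlUnit 2.x packages; 3.x uses org.htmlunit
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebResponse;

WebClient webClient = new WebClient();
Page page = webClient.getPage("http://example.com");  // throws IOException
WebResponse response = page.getWebResponse();          // the underlying HTTP response
String content = response.getContentAsString();        // response body as sent by the server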

See the Javadoc for WebResponse#getContentAsString(), which returns the response body as a string rather than HtmlUnit's parsed XML or text rendering of the page.

Ahmed Ashour
Dmytro Chyzhykov