How do I convert a document made in Jsoup (the Java HTML parser) into a string?

Question

I have a document created with jsoup, like this:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();

How do I convert that doc into a string?

das_weezul · Accepted Answer · 2011-07-28T20:31:36.203

Have you tried:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
String htmlString = doc.toString();

As Document extends Element, it also has the method html(), which "Retrieves the element's inner HTML" according to the API. So this should work as well:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
String htmlString = doc.html();

Additional Info:

Each Document object holds a reference to an instance of the nested class Document.OutputSettings, which can be accessed via the Document method outputSettings(). There you can enable or disable pretty-printing with the setter prettyPrint(true/false). See the API documentation for Document and Document.OutputSettings for further information.
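
A minimal sketch of the pretty-printing toggle described above. It parses an in-memory string with `Jsoup.parse` instead of fetching a page, so it needs no network access. The class name `OutputSettingsDemo`, the helper `render`, and the sample markup are illustrative assumptions, and jsoup has to be on the classpath:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

final class OutputSettingsDemo {
    // Hypothetical helper: parses the given HTML and serializes it back,
    // with pretty-printing switched on or off.
    static String render(String html, boolean pretty) {
        Document doc = Jsoup.parse(html);
        // outputSettings() returns this document's settings object;
        // prettyPrint(boolean) changes it in place.
        doc.outputSettings().prettyPrint(pretty);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String html = "<p>Hello</p>"; // sample input, not from the original question
        System.out.println(render(html, true));  // indented, with line breaks
        System.out.println(render(html, false)); // compact, no added line breaks
    }
}
```

Turning pretty-printing off can be useful when the string is stored or compared, since the serializer then adds no indentation of its own.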

The first code block gave me `[Ljava.lang.String;@383534aa` instead of the html / content. BTW what if it's a `Document[]`? — Hack-R, Sep 12 '16 at 00:25
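
A note on the output quoted in this comment: `[Ljava.lang.String;@383534aa` is what Java produces when `toString()` is called on a `String[]` array, so the value being printed was probably an array and not a jsoup `Document`. A minimal sketch in plain Java, assuming an array of HTML strings (the class and method names are made up for illustration). For a `Document[]`, the same idea would apply by calling `outerHtml()` on each element:

```java
import java.util.Arrays;

final class ArrayToStringDemo {
    // Hypothetical helper: joins the array's contents with newlines,
    // unlike array.toString(), which ignores the contents.
    static String joinHtml(String[] parts) {
        return String.join("\n", parts);
    }

    public static void main(String[] args) {
        String[] parts = {"<p>one</p>", "<p>two</p>"};
        // Default Object.toString() on an array: a type tag plus a hash code,
        // of the form [Ljava.lang.String;@<hex>
        System.out.println(parts.toString());
        // Element-wise rendering of the contents
        System.out.println(Arrays.toString(parts));
        System.out.println(joinHtml(parts));
    }
}
```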

score 9 · Answer 2 · answered Jul 28 '11 at 20:20

doc.toString() works, as does doc.outerHtml().

Jeremy Roman

`Document.toString()` internally calls `outerHtml()`. – Zaki Jun 16 '19 at 10:39

NomanJaved · Answer 3 · 2019-01-31T03:52:18.690

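 Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
 Elements post = doc.select("div.post-content"); // elements matching the CSS selector
 String dd = post.toString();                     // HTML of the selected elements as a string
 Document ddd = Jsoup.parse(dd);                  // parse that string into a new Document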

After parsing the string into a Document, you can call Document methods on it:

 Elements scriptTag = ddd.getElementsByTag("script");
 System.out.println(scriptTag);

edited Jan 31 '19 at 03:52

answered Sep 03 '14 at 03:10
