8

I'm coding in Java.

Does anyone know how I can get the content of a javax.swing.text.html.HTMLDocument as a String? This is what I've got so far:

URL url = new URL("http://www.test.com");

HTMLEditorKit kit = new HTMLEditorKit();
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
// Ignore any charset declared in a <meta> tag so parsing does not abort
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
Reader htmlReader = new InputStreamReader(url.openConnection().getInputStream());
kit.read(htmlReader, doc, 0);

I need the content of the HTMLDocument as a String.

Example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">    <html><head><meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">

....... etc.

Any help would be appreciated. I need to use the HTMLDocument class so that the HTML is processed correctly :)

Thanks, Daniel

Zelleriation

2 Answers

18
StringWriter writer = new StringWriter();
kit.write(writer, doc, 0, doc.getLength());
String s = writer.toString();
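
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. It parses from an in-memory string instead of a URL so it runs offline; the class name `HtmlDocumentToString` and the helpers `parse` and `toHtml` are made up for illustration. Note that `kit.write` re-serializes the parsed document, so the result may differ from the original source (whitespace, a missing DOCTYPE, normalized tags).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of any library.
final class HtmlDocumentToString {

    /** Parses markup into an HTMLDocument, the same way the question does. */
    static HTMLDocument parse(HTMLEditorKit kit, String markup) {
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoids a charset-change exception when the page has a <meta> charset tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try (Reader reader = new StringReader(markup)) {
            kit.read(reader, doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return doc;
    }

    /** Serializes the whole document back to HTML markup via the editor kit. */
    static String toHtml(HTMLEditorKit kit, HTMLDocument doc) {
        StringWriter writer = new StringWriter();
        try {
            kit.write(writer, doc, 0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not write HTML", e);
        }
        return writer.toString();
    }

    public static void main(String[] args) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = parse(kit, "<html><body><p>Hello</p></body></html>");
        System.out.println(toHtml(kit, doc));
    }
}
```

The checked exceptions are wrapped in `IllegalStateException` only to keep the sketch short; in real code you may prefer to declare them.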
Joop Eggen
1

You don't need the editor kit and reader at all; just read the input stream directly. For example, with commons-io: IOUtils.toString(inputStream)

or you can use:

Content content = document.getContent();
String str = content.getString(0, content.length() - 1);
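
If you go with the first suggestion (reading the stream directly) and would rather not add commons-io, a rough JDK-only equivalent might look like the sketch below. It assumes Java 9 or later for `InputStream.readAllBytes()` and assumes the page is UTF-8 encoded; the class and method names are invented for the example. Keep in mind this returns the raw bytes as text and does no HTML processing at all.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
final class StreamToString {

    /** Reads the whole stream as UTF-8 text (assumed encoding) and closes it. */
    static String readAll(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In the question this would be url.openConnection().getInputStream().
        InputStream in = new ByteArrayInputStream(
                "<html><body>Hi</body></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```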
Bozho
  • This won't work because the inherited [getContent](http://docs.oracle.com/javase/7/docs/api/javax/swing/text/AbstractDocument.html#getContent%28%29) method is protected. – Parker Aug 11 '15 at 17:28
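
Following up on the comment above: since `getContent()` is protected, the closest public call is probably `Document.getText(offset, length)`. A small sketch, with made-up class and method names, is below. Be aware that this returns only the document's text content, without any tags, so it is not a substitute if you need the markup itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class, not part of any library.
final class HtmlDocumentText {

    /** Parses the markup and returns the document's text content (no tags). */
    static String plainText(String markup) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(markup), doc, 0);
            // Public API on Document; no access to the protected Content needed.
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not read HTML text", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(plainText("<html><body><p>Hello</p></body></html>"));
    }
}
```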