119

What's the simplest way to get the String representation of an XML Document (org.w3c.dom.Document)? That is, all nodes should be on a single line.

As an example, from

<root>
  <a>trge</a>
  <b>156</b>
</root>

(this is only a tree representation; in my code it's an org.w3c.dom.Document object, so I can't treat it as a String)

to

"<root> <a>trge</a> <b>156</b> </root>"

Thanks!

bluish

3 Answers

230

Assuming doc is your instance of org.w3c.dom.Document:

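import java.io.StringWriter;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(doc), new StreamResult(writer));
// Strip line breaks so the whole document ends up on a single line
String output = writer.getBuffer().toString().replaceAll("\n|\r", "");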
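
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The class name `DocumentToString` and the helper `toSingleLine` are made up for illustration, and the document is parsed from a string only so the example can run on its own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocumentToString {

    // Hypothetical helper (not part of any library): serializes the document
    // with an identity transformer and removes all line breaks from the result.
    static String toSingleLine(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString().replaceAll("\n|\r", "");
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the sample XML from the question, parsed here only for the demo.
        String xml = "<root>\n  <a>trge</a>\n  <b>156</b>\n</root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(toSingleLine(doc));
    }
}
```

Note that only `\n` and `\r` are removed, so any indentation spaces that exist as text nodes in the document stay in the output.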
WhiteFang34
  • the `replaceAll` is probably not necessary if you add another output property: `transformer.setOutputProperty(OutputKeys.INDENT, "no");` – bvdb Jun 01 '17 at 10:20
  • and the `writer.getBuffer().toString()` can just be `writer.toString()` – bvdb Jun 01 '17 at 10:23
  • @bvdb both excellent points. There is an extra advantage to explicitly adding the `OutputKeys.INDENT` line, because then you can also set it to `"yes"` to keep the whitespace--if that's what you want (in my situation I've found that just removing `replaceAll` did not work to include the whitespace in the string). – Jonathan Benn Oct 23 '18 at 12:46
  • See also https://stackoverflow.com/questions/1384802/java-how-to-indent-xml-generated-by-transformer for an explanation of how to get the indent to work properly – Jonathan Benn Apr 07 '20 at 14:46
2

Use the Apache XMLSerializer.

Here's an example: http://www.informit.com/articles/article.asp?p=31349&seqNum=3&rl=1

You can check this as well:

http://www.netomatix.com/XmlFileToString.aspx

GuruKulki
  • Xerces is still, ridiculously, not officially distributing via Maven (thus groovy too), including no reliable source or JavaDocs, WTF! No official maven support makes deprecation resolution harder, makes consistent updates more hassle, and poses security risks, so it is stupid to have any dependencies on it now! – Infernoz Aug 15 '20 at 13:16
1

First you need to get rid of all newline characters in all your text nodes. Then you can use an identity transform to output your DOM tree. Look at the javadoc for TransformerFactory#newTransformer().
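
A rough sketch of those two steps, assuming a simple recursive walk is acceptable. The class `NewlineStripper` and its methods are hypothetical names, and the sample document is parsed from a string just to make the example runnable. CDATA sections are a separate node type and are not touched here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NewlineStripper {

    // Hypothetical helper: removes \r and \n from every text node below 'node'.
    static void stripNewlines(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replaceAll("[\\r\\n]", ""));
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            stripNewlines(child);
        }
    }

    // newTransformer() without a stylesheet copies the source to the result unchanged.
    static String serialize(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: sample XML from the question, parsed only for this demo.
        String xml = "<root>\n  <a>trge</a>\n  <b>156</b>\n</root>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        stripNewlines(doc);
        System.out.println(serialize(doc));
    }
}
```

Unlike a `replaceAll` on the serialized string, this changes the DOM tree itself, so copy the document first if the original text content still matters.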

bluish
forty-two