15

Reading the documentation for Java's org.w3c.dom.ls package, it seems that an Element can only be serialized to a String with Java's native string encoding, UTF-16. However, I need to create a UTF-8 string (escaped or otherwise); I understand that it will still be a UTF-16 String internally. Does anyone have an idea how to get around this? I need to pass the string to a generated WS client that will consume it, so its content should be UTF-8.

The code I use to create the string:

    DOMImplementationRegistry domImplementationRegistry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS domImplementationLS = (DOMImplementationLS) domImplementationRegistry.getDOMImplementation("LS");
    LSSerializer writer = domImplementationLS.createLSSerializer();
    // writeToString always serializes as UTF-16
    String result = writer.writeToString(element);
Tomas
  • 2
    @Tomas - there is no such thing as a UTF-8 Java String. I would expect any attempt to coerce UTF-8 encoded bytes into a char type to end in tears. – McDowell Oct 28 '09 at 12:56

2 Answers

18

You can still use DOMImplementationLS:

    DOMImplementationRegistry domImplementationRegistry = DOMImplementationRegistry.newInstance();
    DOMImplementationLS domImplementationLS = (DOMImplementationLS) domImplementationRegistry.getDOMImplementation("LS");
    LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
    LSOutput lsOutput = domImplementationLS.createLSOutput();
    lsOutput.setEncoding("UTF-8"); // encoding named in the XML declaration
    Writer stringWriter = new StringWriter();
    lsOutput.setCharacterStream(stringWriter);
    // doc is the Document (or Element) to serialize
    lsSerializer.write(doc, lsOutput);
    String result = stringWriter.toString();
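
If what the WS client ultimately needs is UTF-8 encoded bytes rather than a String whose declaration says UTF-8, a possible variation on the snippet above is to give the LSOutput a byte stream instead of a character stream. This is only a sketch: the class name `LsUtf8Example`, the helper `toUtf8Bytes`, and the sample `greeting` element are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class LsUtf8Example {

    // Hypothetical helper: serializes a node to UTF-8 encoded bytes.
    static byte[] toUtf8Bytes(Node node) throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS ls = (DOMImplementationLS) registry.getDOMImplementation("LS");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");   // encoding used when writing to the byte stream
        output.setByteStream(bytes);   // bytes, not chars, so the encoding is really applied

        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(node, output);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a small sample element containing a non-ASCII character.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element element = doc.createElement("greeting");
        element.setTextContent("caf\u00e9");
        doc.appendChild(element);

        byte[] utf8 = toUtf8Bytes(element);
        // Decode only to display the result; the byte[] is the UTF-8 form.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

Decoding the bytes back into a String, as `main` does, yields an ordinary UTF-16 Java String again, so the byte array is the thing to hand to anything that expects UTF-8.
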
nhahtdh
Jeryl Cook
9

I find that the most flexible way of serializing a DOM to String is to use the javax.xml.transform API:

    Node node = ...
    StringWriter output = new StringWriter();

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.transform(new DOMSource(node), new StreamResult(output));

    String xml = output.toString();

It's not especially elegant, but it should give you better control over output encoding.
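
To make the encoding explicit with this API, one option is to set `OutputKeys.ENCODING` on the Transformer and write to a byte stream instead of a StringWriter. The following is a sketch under those assumptions; the class name `TransformerUtf8Example`, the helper `toUtf8Bytes`, and the sample element are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class TransformerUtf8Example {

    // Hypothetical helper: serializes a node to UTF-8 encoded bytes.
    static byte[] toUtf8Bytes(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Request UTF-8 explicitly rather than relying on the default.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // A byte-based StreamResult lets the transformer perform the encoding itself.
        transformer.transform(new DOMSource(node), new StreamResult(bytes));
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a small sample element containing a non-ASCII character.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element element = doc.createElement("greeting");
        element.setTextContent("caf\u00e9");
        doc.appendChild(element);

        byte[] utf8 = toUtf8Bytes(element);
        // Decode only to display the result; the byte[] is the UTF-8 form.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

With a StringWriter the same property would only change what the XML declaration claims, since a Writer receives characters and no byte encoding takes place.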

skaffman
  • Works like a charm, but how do I set the encoding explicitly? Does this generate UTF-8 with no configuration? – Tomas Oct 28 '09 at 12:25
  • That's up to the `Writer` implementation that you use. `StringWriter` just happens to default to UTF-8, I think. – skaffman Oct 28 '09 at 12:28
  • 1
    @skaffman - "StringWriter just happens to default to UTF-8". You are mistaken. The String is UTF-16; the transformer might add an XML header that says `<?xml version="1.0" encoding="UTF-8"?>`, but that has nothing to do with any actual encoding operations. – McDowell Oct 28 '09 at 12:58
  • 1
    Worked for me as well - the other one had that UTF-16 stuff which caused "content not allowed in prolog" error while trying to parse with a document builder. – Nicholas DiPiazza Apr 10 '13 at 16:07