Producing UTF-8 encoded XML in Java

Question

This is the code I'm using:

try {
    String str = "\uC3BC and \uC3B6 and <&> für";

    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.newDocument();
    Element root = doc.createElement("test");
    root.setAttribute("attribute", str);
    doc.appendChild(root);

    DOMSource domSource = new DOMSource(doc);
    // FileOutputStream out = new FileOutputStream("test.xml");
    Writer out = new OutputStreamWriter(new FileOutputStream("test.xml"), "UTF8");

    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.transform(domSource, new StreamResult(out));

    out.close();
} catch (Exception e) {
    e.printStackTrace();
}

The output is:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<test attribute="쎼 and 쎶 and &lt;&amp;&gt; für"/>

I want it to output:

attribute="&#xc3bc; and &#xc3b6; ..."

How do I achieve this?

I'm using Java 1.6-20

This is similar to Producing valid XML with Java and UTF-8 encoding

**Why** do you want character references instead of the characters themselves? Since you use UTF-8, you don't need to (and it carries the exact same information anyway). — Joachim Sauer, Sep 30 '11 at 07:08
My apology, I didn't state my question clearly. I wanted escaping. — bouncyrabbit, Sep 30 '11 at 07:41
@bouncyrabbit: I got that, but **why** do you want escaping? Both forms are exactly equivalent, it should not make a difference. — Joachim Sauer, Sep 30 '11 at 09:42
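To illustrate the point made in the comments, here is a minimal sketch showing that a parser hands back the same attribute value whether the character is written literally or as a character reference. The class name CharRefEquivalenceSketch and the helper attributeOf are made up for this example; only standard JAXP calls are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class CharRefEquivalenceSketch {

    // Parses a small XML string and returns the value of the root
    // element's "attribute" attribute (hypothetical helper).
    public static String attributeOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("attribute");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same character, once written literally and once as a reference.
        String literal = attributeOf("<test attribute=\"\uC3BC\"/>");
        String reference = attributeOf("<test attribute=\"&#xC3BC;\"/>");
        System.out.println(literal.equals(reference)); // prints "true"
    }
}
```

Any conforming XML parser resolves the reference while parsing, so a consumer of the file should not be able to tell the two forms apart.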

score 6 · Accepted Answer · answered Sep 30 '11 at 07:28

If you don't want the XML to be encoded as UTF-8, you shouldn't tell the transformer to do so.

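If I understand your question correctly,

transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");

should produce the output that you want: characters that cannot be represented in US-ASCII are written as numeric character references instead.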
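For reference, a self-contained sketch of this suggestion follows. It assumes the JDK's built-in transformer and writes to a byte stream rather than a Writer, so that the serializer itself applies the requested encoding. The class name AsciiXmlSketch and the helper serialize are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class name, used only for this illustration.
public class AsciiXmlSketch {

    // Builds <test attribute="..."/> and serializes it with the given
    // output encoding (hypothetical helper).
    public static String serialize(String attributeValue, String encoding) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("test");
            root.setAttribute("attribute", attributeValue);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

            // A byte stream lets the serializer pick the character encoding itself.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString(encoding);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same test string as in the question, with the u-umlaut escaped in source.
        System.out.println(serialize("\uC3BC and \uC3B6 and <&> f\u00FCr", "US-ASCII"));
    }
}
```

The exact form of the references is up to the serializer; it may well emit decimal references (such as &#50108;) rather than the hexadecimal form shown in the question. Both denote the same character.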

mth
