Consider:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("list");
doc.appendChild(root);
for (CorrectionEntry correction : dictionary) {
    Element elem = doc.createElement("elem");
    elem.setAttribute("from", correction.getEscapedFrom());
    elem.setAttribute("to", correction.getEscapedTo());
    root.appendChild(elem);
}
(the document is then written to an XML file)
where getEscapedFrom and getEscapedTo return (in my code) something like fink&#xE9; if the originating word is finké, so as to perform a Unicode escape for the characters whose code is greater than 127.
The problem is that the final XML contains the line <elem from="finke" to="fink&amp;#xE9;" /> (from is finke, to is finké), whereas I would like it to be <elem from="finke" to="fink&#xE9;" />
Following another answer on Stack Overflow, I tried to disable the escaping of ampersands by adding the line doc.appendChild(doc.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "&")); right after creating doc, but without success.
How could I "tell XML" not to escape ampersands? Or, conversely, how could I get "XML" to convert from &#xE9;, or \\u00E9, to é?
Update
I managed to narrow the problem down: up until the file is written, the node (inspected in the debugger) seems to contain the right string. Once I call transformer.transform(domSource, streamResult); everything goes wild.
DOMSource domSource = new DOMSource(doc);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(baos);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(domSource, streamResult);
System.out.println(baos.toString());
So the problem seems to lie in the transformer, which escapes the ampersand when it serializes the attribute value.
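To make this reproducible, here is a minimal self-contained sketch. The class name EscapeDemo and the helper serialize are hypothetical and not part of my real code; the dictionary loop is replaced by a single hard-coded element. The first call mirrors what I do now (a pre-escaped string stored in the attribute). The second call is only a possible direction, assuming the serializer falls back to numeric character references for characters that the chosen output encoding cannot represent: it stores the plain é in the DOM and asks the transformer for US-ASCII output.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EscapeDemo {

    // Hypothetical helper: builds <list><elem from="finke" to="..."/></list>
    // and serializes it with the requested output encoding.
    public static String serialize(String toValue, String encoding) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("list");
        doc.appendChild(root);

        Element elem = doc.createElement("elem");
        elem.setAttribute("from", "finke");
        elem.setAttribute("to", toValue);
        root.appendChild(elem);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // The output encoding decides which characters the serializer can write as-is.
        transformer.setOutputProperty(OutputKeys.ENCODING, encoding);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(baos));
        return baos.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        // Pre-escaped value: the '&' is treated as literal text and escaped again.
        System.out.println(serialize("fink&#xE9;", "UTF-8"));

        // Plain value with an ASCII-only output encoding: the serializer has to
        // write the non-ASCII character as a character reference itself.
        System.out.println(serialize("fink\u00E9", "US-ASCII"));
    }
}
```

With the JDK's built-in transformer, the first line shows the doubled escape (to="fink&amp;#xE9;"), while the second should contain a single numeric reference for é with no &amp; in it. The reference may be written in decimal rather than hexadecimal, which an XML parser treats as the same character; other transformer implementations may behave differently.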