I have a String containing a binary 0 in UTF-8 ("A\u0000B"). JAXB happily marshals an XML document containing this character, but then fails to unmarshal it:
final JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
final Marshaller marshaller = jaxbContext.createMarshaller();
final Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
Root root = new Root();
root.value = "A\u0000B";
final ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(root, os);
unmarshaller.unmarshal(new ByteArrayInputStream(os.toByteArray()));
The Root class is trivial:
@XmlRootElement
class Root { @XmlValue String value; }
The output XML also contains a binary 0 between A and B (in hex: 41 00 42), which causes the following error during unmarshalling:
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 63;
An invalid XML character (Unicode: 0x0) was found in the element content of the document.
Interestingly, using the raw DOM API (example) produces an escaped 0: A&#0;B, but trying to read it back yields a similar error. Also, 0 (whether binary or escaped) is not accepted by any XML parser or by xmllint (see also: Python + Expat: Error on &#0; entities).
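For reference, here is a minimal sketch of the character check that parsers appear to apply, based on my reading of the Char production in the XML 1.0 specification. The class and method names (XmlChars, isValidXml10, firstInvalidIndex) are my own and are not part of JAXB or the JDK.

```java
public final class XmlChars {
    private XmlChars() {}

    /**
     * Returns true if the code point matches the XML 1.0 "Char" production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
     */
    public static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first character XML 1.0 cannot carry, or -1 if there is none. */
    public static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // an unpaired surrogate is returned as-is and rejected
            if (!isValidXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

With this check, XmlChars.firstInvalidIndex("A\u0000B") points at the binary 0, which is the same character the parser complains about.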
My questions:
Why do JAXB and the DOM API allow creating invalid XML documents that they cannot read back? Shouldn't they fail fast during marshalling?
Is there an elegant, global solution? I saw people tackling this problem by:
But shouldn't a mature XML stack in Java (I'm using 1.7.0_05) handle this, either by default or through some simple setting? I'm looking for escaping, ignoring, or failing fast, but the default behavior of generating invalid XML is not acceptable. I believe such fundamental functionality should not require any extra coding on the client side.
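To make the desired behavior concrete, this is roughly what I would expect the stack to do for me. It is only a sketch of the "ignoring" and "failing fast" options as hand-written code; XmlText, Policy and sanitize are hypothetical names of mine, not an existing JAXB or JDK setting. Escaping is left out because a numeric character reference to 0 is rejected by parsers as well.

```java
public final class XmlText {
    /** Hypothetical policy for characters that XML 1.0 cannot represent. */
    public enum Policy { STRIP, FAIL_FAST }

    private XmlText() {}

    /** Drops invalid characters (STRIP) or throws on the first one (FAIL_FAST). */
    public static String sanitize(String s, Policy policy) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            // XML 1.0 "Char" production
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) {
                out.appendCodePoint(cp);
            } else if (policy == Policy.FAIL_FAST) {
                throw new IllegalArgumentException(String.format(
                        "Invalid XML 1.0 character U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Something like this could presumably be called from an XmlAdapter<String, String> on each field, but that is exactly the per-field client-side coding I would like to avoid.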