I'm having the same issue, and as far as I could debug, everything points to a bug in the JDK (at least in build 1.8.0_162-b12), more specifically in the class com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX.
The NPE is actually only a consequence of another bug, which is related to how the reader is handled in this class's bridge() method. There, if the reader is not in the START_DOCUMENT state, the next event is only peeked and not consumed with nextEvent() the very first time. As a result, the first START_ELEMENT event is processed twice. This is easy to observe if you use a StreamResult instead of a DOMResult: the NPE does not occur, but the XML produced in the result stream contains the start tag of the first element twice.
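To check whether a given JDK shows the behaviour described above, a minimal reproduction could look like the sketch below. The class name StaxFragmentRepro, the method name and the sample XML are made up for illustration. The sketch only prints the serialized fragment so it can be inspected, since the exact output depends on the JDK build.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical helper class, not part of the original answer.
class StaxFragmentRepro {

    // Positions the reader in front of the first child element and lets the
    // identity Transformer copy from there into a StreamResult.
    static String transformFromFirstChild(String xml) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            reader.nextEvent(); // consume START_DOCUMENT
            reader.nextEvent(); // consume START_ELEMENT of the root
            // reader.peek() is now the START_ELEMENT of the first child,
            // i.e. the reader is NOT in the START_DOCUMENT state.

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StAXSource(reader), new StreamResult(out));
            return out.toString();
        } catch (XMLStreamException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No whitespace between the tags, so the event after <root> is <item>.
        String output = transformFromFirstChild("<root><item>a</item></root>");
        // On an affected build the start tag of <item> should appear twice here.
        System.out.println(output);
    }
}
```

Swapping the StreamResult for a DOMResult in this sketch should be the variant that triggers the NPE on an affected build.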
I'm now trying to work around this with an XMLEventWriter that writes to the DOMResult, basically simulating what the Transformer would do by pushing each event read directly to that writer. If I succeed, I'll post my solution here as well.
PS: I would like to report this issue to the JDK project, or possibly even contribute a fix for it. If anybody could tell me how this is supposed to be done, I would very much appreciate it.
UPDATE:
So, I managed to work around this issue with the approach mentioned above. Based on the code suggested in Reading a big XML file using stax and dom, instead of using the Transformer, you could use the following method:
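    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLEventWriter;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.events.XMLEvent;
    import javax.xml.transform.dom.DOMResult;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;

    private Node readToNode(final XMLEventReader reader) throws XMLStreamException, ParserConfigurationException {
        // The reader must be positioned directly in front of the element to copy.
        XMLEvent event = reader.peek();
        if (!event.isStartElement()) {
            throw new IllegalArgumentException("reader must be on START_ELEMENT event");
        }
        // The writer needs a Document to attach the copied element to.
        final Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        final XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(new DOMResult(document));
        int depth = 0;
        // Copy events until the END_ELEMENT matching the first START_ELEMENT has been written.
        do {
            event = reader.nextEvent();
            writer.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
        } while (reader.hasNext() && !(event.isEndElement() && depth <= 0));
        return document.getDocumentElement();
    }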
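As a rough usage sketch, the method can be called whenever the next event of the reader is the start of an element you want as a DOM node. The class name ReadToNodeDemo, the helper elementTexts and the element name "item" are assumptions for illustration only. readToNode is repeated here (as a static method) so that the example is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.dom.DOMResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical demo class, not part of the original answer.
class ReadToNodeDemo {

    // Same logic as the method above, made static for the demo.
    static Node readToNode(XMLEventReader reader) throws XMLStreamException, ParserConfigurationException {
        XMLEvent event = reader.peek();
        if (!event.isStartElement()) {
            throw new IllegalArgumentException("reader must be on START_ELEMENT event");
        }
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(new DOMResult(document));
        int depth = 0;
        do {
            event = reader.nextEvent();
            writer.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
        } while (reader.hasNext() && !(event.isEndElement() && depth <= 0));
        return document.getDocumentElement();
    }

    // Streams through the XML and turns every element called elementName
    // into its own small DOM, collecting the text content of each one.
    static List<String> elementTexts(String xml, String elementName) {
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            List<String> texts = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent next = reader.peek();
                if (next.isStartElement()
                        && next.asStartElement().getName().getLocalPart().equals(elementName)) {
                    // readToNode consumes the whole element, including its END_ELEMENT.
                    texts.add(readToNode(reader).getTextContent());
                } else {
                    reader.nextEvent(); // skip everything else
                }
            }
            return texts;
        } catch (XMLStreamException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(elementTexts("<root><item>a</item><item>b</item></root>", "item"));
    }
}
```

Because each call builds a fresh Document, only one fragment at a time is held as a DOM, which is the point of mixing StAX and DOM for big files.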
However, this approach has some limitations. As the code shows, we need to create a Document object to hold the node; otherwise the XML writer runs into issues. If you intend to manipulate this DOM and afterwards send it to another, already active XMLEventWriter (as I was trying to do) using the Transformer again, it will fail, because the Transformer sends a START_DOCUMENT event to a writer that has already started. I tried the same approach the other way round, i.e. wrapping the node in a DOMSource, passing it to another XMLEventReader and piping the events to my existing XMLEventWriter, but that doesn't work either, as XMLEventReader apparently supports only StreamSources (see here).
Summarizing, if you only need the DOM objects, this could work well, but if you're trying to transform XML fragments by piping the events to a writer (as I do), you could run into issues.
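One option I have not verified beyond a trivial case, and which avoids both the Transformer and a DOM-backed reader, is to walk the DOM node yourself and emit the events through an XMLEventFactory, so that no START_DOCUMENT ever reaches the active writer. The sketch below is only that, a sketch: the class DomToEventsSketch and its methods are hypothetical, and it assumes the fragment uses no namespaces and contains only elements, attributes and text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper class, not part of the original answer.
class DomToEventsSketch {

    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    // Recursively writes an element subtree to an already started writer.
    // Assumption: no namespaces; other node types (comments, PIs) are skipped.
    static void writeNode(Node node, XMLEventWriter writer) throws XMLStreamException {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                String name = node.getNodeName();
                writer.add(EVENTS.createStartElement("", "", name));
                NamedNodeMap attributes = node.getAttributes();
                for (int i = 0; i < attributes.getLength(); i++) {
                    Node attribute = attributes.item(i);
                    writer.add(EVENTS.createAttribute(attribute.getNodeName(), attribute.getNodeValue()));
                }
                for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
                    writeNode(child, writer);
                }
                writer.add(EVENTS.createEndElement("", "", name));
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                writer.add(EVENTS.createCharacters(node.getNodeValue()));
                break;
            default:
                break; // not handled in this sketch
        }
    }

    // Builds a tiny DOM fragment and writes it into a writer that is
    // already in the middle of another element.
    static String demo() {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element item = document.createElement("item");
            item.setAttribute("id", "1");
            item.appendChild(document.createTextNode("a"));

            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            writer.add(EVENTS.createStartElement("", "", "wrapper")); // writer is now "active"
            writeNode(item, writer);
            writer.add(EVENTS.createEndElement("", "", "wrapper"));
            writer.flush();
            return out.toString();
        } catch (XMLStreamException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

For namespaced documents the element and attribute events would have to carry the prefix and namespace URI, and the xmlns declarations would need separate handling, which this sketch deliberately leaves out.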