
I am trying to read an XML document and output it into a new XML document using the W3C DOM API in Java. To handle DOCTYPEs, I am using the following code (from an input Document doc to a target File target):

    TransformerFactory transfac = TransformerFactory.newInstance();
    Transformer trans = transfac.newTransformer();
    trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no"); // keep '<?xml version="1.0"?>'
    trans.setOutputProperty(OutputKeys.INDENT, "yes");

    // if a doctype was set, it needs to persist
    if (doc.getDoctype() != null) {
        DocumentType doctype = doc.getDoctype();
        trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
        trans.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
    }

    // try-with-resources flushes and closes the writer once the transform is done
    try (FileWriter sw = new FileWriter(target)) {
        StreamResult result = new StreamResult(sw);
        DOMSource source = new DOMSource(doc);
        trans.transform(source, result);
    }

This works for XML documents both with and without DOCTYPEs. However, I now get a NullPointerException when transforming the following input document:

    <?xml version='1.0' encoding='UTF-8'?>
    <!DOCTYPE permissions >
    <permissions>
      <!-- ... -->
    </permissions>

HTML5 uses the same kind of DOCTYPE (`<!DOCTYPE html>`), and it is well-formed XML. But I have no idea how to handle this with the W3C DOM API: setting DOCTYPE_SYSTEM to null throws an exception. Can I still use the W3C DOM API to output an empty DOCTYPE, i.e. one with no public or system identifier?
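For the input above, the parser does create a DocumentType node, but (as far as I can tell with the JDK's built-in parser) both of its identifiers are null, which is why the two setOutputProperty calls in the code above receive null values. A minimal sketch to check this; the class and method names (DoctypeInspect, hasIdLessDoctype) are hypothetical helpers, not part of any API:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DoctypeInspect {

    // Parses an XML string with the default (non-validating) JAXP parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // True when the document has a DOCTYPE but neither a public nor a system ID,
    // i.e. the case where the question's setOutputProperty calls receive null.
    static boolean hasIdLessDoctype(Document doc) {
        DocumentType dt = doc.getDoctype();
        return dt != null && dt.getPublicId() == null && dt.getSystemId() == null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml version='1.0' encoding='UTF-8'?>\n"
                + "<!DOCTYPE permissions >\n<permissions/>");
        DocumentType dt = doc.getDoctype();
        System.out.println("name=" + dt.getName()
                + " publicId=" + dt.getPublicId()
                + " systemId=" + dt.getSystemId());
        System.out.println("ID-less DOCTYPE: " + hasIdLessDoctype(doc));
    }
}
```

Simply skipping the setOutputProperty calls when the IDs are null avoids the exception, but JAXP's output properties follow the semantics of XSLT's doctype-system attribute, so as far as I know the serializer then writes no DOCTYPE at all.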

jevon

2 Answers


Although this question is two years old, it ranks highly in some web search engines, so here is a shortcut that may help. See the question "Set HTML5 doctype with XSLT", which refers to http://www.w3.org/html/wg/drafts/html/master/syntax.html#doctype-legacy-string. That section says:

For the purposes of HTML generators that cannot output HTML markup with the short DOCTYPE "<!DOCTYPE html>", a DOCTYPE legacy string may be inserted into the DOCTYPE [...]

In other words, <!DOCTYPE html SYSTEM "about:legacy-compat"> or <!DOCTYPE html SYSTEM 'about:legacy-compat'>, case-insensitively except for the part in single or double quotes.

This leads to a line of Java code like this:

    trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "about:legacy-compat");
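A self-contained sketch of how that line could slot into the question's code: it keeps any real identifiers and only falls back to the legacy string when the system ID is missing, so setOutputProperty never receives null. The class name and the fallback policy are assumptions for illustration. As far as I know, the JDK's built-in Transformer takes the DOCTYPE name from the root element, so a `<permissions>` root would give `<!DOCTYPE permissions SYSTEM "about:legacy-compat">` rather than a truly empty DOCTYPE.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class LegacyCompatDoctype {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String serialize(Document doc) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.METHOD, "xml"); // force XML output, even for an <html> root
        trans.setOutputProperty(OutputKeys.INDENT, "yes");

        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            // Only pass non-null values: a null property value is what
            // triggers the exception described in the question.
            if (doctype.getPublicId() != null) {
                trans.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            String systemId = doctype.getSystemId();
            // Assumed fallback policy: use HTML's DOCTYPE legacy string.
            trans.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                    systemId != null ? systemId : "about:legacy-compat");
        }

        StringWriter out = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<!DOCTYPE html>\n<html><body/></html>");
        System.out.println(serialize(doc));
    }
}
```

Note that about:legacy-compat is defined for HTML's `html` DOCTYPE. A default JAXP parser may also try to resolve it as an external DTD when the output is read back, so this fits HTML-oriented consumers better than arbitrary XML vocabularies.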
Johannes

Try the suggestions at https://stackoverflow.com/a/6637886/116509. In short, it looks like this can't be done with the standard Java DOM support.

You can also try StAX:

    XMLStreamWriter xmlStreamWriter =
        XMLOutputFactory.newFactory().createXMLStreamWriter( System.out, doc.getXmlEncoding() );
    Result result = new StAXResult( xmlStreamWriter );
    // ... build the dtd String; writeDTD expects the entire declaration, e.g. "<!DOCTYPE permissions>"
    xmlStreamWriter.writeDTD( dtd );
    DOMSource source = new DOMSource( doc );
    trans.transform( source, result );

but it's ugly, because writeDTD takes the whole DOCTYPE declaration as a String, while you only have a DocumentType object.
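One way around the String-vs-DocumentType mismatch is to build the declaration from DocumentType.getName(), getPublicId() and getSystemId(), and to walk the DOM with XMLStreamWriter directly instead of handing a StAXResult to the Transformer. This is a swapped-in variant, not what the answer above does. It is a minimal sketch under stated assumptions: the class and method names are hypothetical, namespaces get no special handling, internal subsets and processing instructions are dropped, and the output is not indented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class StaxEmptyDoctype {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Builds the entire doctypedecl, which is what XMLStreamWriter.writeDTD expects.
    // Assumption: any internal subset is dropped in this sketch.
    static String doctypeDecl(DocumentType dt) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE ").append(dt.getName());
        if (dt.getPublicId() != null) {
            sb.append(" PUBLIC \"").append(dt.getPublicId())
              .append("\" \"").append(dt.getSystemId()).append('"');
        } else if (dt.getSystemId() != null) {
            sb.append(" SYSTEM \"").append(dt.getSystemId()).append('"');
        }
        return sb.append('>').toString();
    }

    static String serialize(Document doc) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        w.writeStartDocument("1.0");
        if (doc.getDoctype() != null) {
            // An ID-less DOCTYPE comes out as "<!DOCTYPE name>".
            w.writeDTD(doctypeDecl(doc.getDoctype()));
        }
        writeNode(w, doc.getDocumentElement());
        w.writeEndDocument();
        w.flush();
        w.close(); // does not close the underlying StringWriter
        return out.toString();
    }

    // Minimal recursive DOM walk: elements, attributes, text, CDATA and comments.
    static void writeNode(XMLStreamWriter w, Node node) throws XMLStreamException {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                w.writeStartElement(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr a = (Attr) attrs.item(i);
                    w.writeAttribute(a.getName(), a.getValue());
                }
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    writeNode(w, c);
                }
                w.writeEndElement();
                break;
            case Node.TEXT_NODE:
                w.writeCharacters(node.getNodeValue());
                break;
            case Node.CDATA_SECTION_NODE:
                w.writeCData(node.getNodeValue());
                break;
            case Node.COMMENT_NODE:
                w.writeComment(node.getNodeValue());
                break;
            default:
                break; // other node types are skipped in this sketch
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml version='1.0' encoding='UTF-8'?>\n"
                + "<!DOCTYPE permissions >\n"
                + "<permissions><permission name=\"read\"/><!-- more --></permissions>");
        System.out.println(serialize(doc));
    }
}
```

The trade-off is that you own the serialization: anything the walk doesn't handle is silently lost, which is why the Transformer-based approaches are usually preferable when they can express the DOCTYPE you need.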

artbristol
  • I was also thinking of hacking around the FileWriter: write the XML declaration and the empty DOCTYPE myself, then append the output of `transform` with the XML declaration omitted. – jevon May 27 '12 at 22:01
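The workaround from the comment can be sketched as follows. The class name is hypothetical, and the sketch assumes the DOCTYPE (if any) has no public/system IDs and no internal subset; otherwise the question's property-based code applies. Because the declaration is written by hand and claims UTF-8, the Writer must really encode as UTF-8; a plain FileWriter uses the platform default charset on JDKs before 18.

```java
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class ManualDoctypePrefix {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Writes the XML declaration and an ID-less DOCTYPE by hand, then lets the
    // Transformer append the document with its own declaration omitted.
    // Assumption: the DOCTYPE has no public/system IDs and no internal subset.
    static void write(Document doc, Writer out) throws Exception {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            out.write("<!DOCTYPE " + doctype.getName() + ">\n");
        }
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.METHOD, "xml");
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        trans.transform(new DOMSource(doc), new StreamResult(out));
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<?xml version='1.0' encoding='UTF-8'?>\n"
                + "<!DOCTYPE permissions >\n<permissions/>");
        // Wrap stdout as UTF-8 so the bytes match the declaration written above.
        Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
        write(doc, out);
        out.flush();
    }
}
```

To write to a file instead of standard output, open it with `Files.newBufferedWriter(path, StandardCharsets.UTF_8)` in a try-with-resources block and pass that Writer to `write`.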