
I'm not sure whether it's me or the API, but I can't create an XML file without either getting an exception thrown at me or ending up with the thing I'm trying to set (the DocType) not being set.

This is what I'm currently doing:

StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
stringBuilder.append("<!DOCTYPE document>");

String xmlString = AnnotatedDocumentTree.toString(annotatedDocumentTree, new SimpleAnnotatedDocumentTreeXmlConverter(), stringBuilder);

DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder icBuilder;          
Document finalDocument = null;                 

StringWriter writer = new StringWriter();

try {

    icBuilder = icFactory.newDocumentBuilder(); 

    finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(xmlString.getBytes("UTF-8"))));                

    Transformer transformer = TransformerFactory.newInstance().newTransformer();

    DocumentType doctype = xmlDocument.getDoctype();                    

    transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
    transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
    transformer.transform(new DOMSource(finalDocument), new StreamResult(writer));

    finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(writer.toString().getBytes("UTF-8"))));


} catch (Exception e) {
    e.printStackTrace();
}

However, this way I'm getting an exception. I can use the DocumentBuilderFactory and configure it like this:

icFactory.setValidating(false);
icFactory.setNamespaceAware(true);
icFactory.setFeature("http://xml.org/sax/features/namespaces", false);
icFactory.setFeature("http://xml.org/sax/features/validation", false);
icFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
icFactory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

but then DocType of my finalDocument will be null.

Setting my own EntityResolver won't do the trick either:

builder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId.contains(".dtd")) {
            return new InputSource(new StringReader(""));
        } else {
            return null;
        }
    }
});

because if I want to set doctype.getSystemId() I really want to set doctype.getSystemId().

Is there a way to set it without this headache?


Essentially I want to parse this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE document>
<ds>
    ABGB <cue>: §§ 786 , 810 , 812 </cue>Die Kosten der ... 
    <cue>von</cue>
    <Relation bewertung="1">7 Ob 56/10a </Relation>= 
    <Relation bewertung="1">Zak 2010/773 , 440 </Relation>. 
</ds>

and transform it into this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ds PUBLIC "-//MBO//DTD artikel-at 1.0//DE" "http://dtd.company.de/dtd-at/artikel.dtd">
<ds>
    ABGB <cue>: §§ 786 , 810 , 812 
    </cue>Die Kosten der ... <cue>
    von 
    </cue><Relation bewertung="1">7 Ob 56/10a </Relation>= 
    <Relation bewertung="1">Zak 2010/773 , 440 </Relation>. 
</ds>
Stefan Falk

2 Answers


For me, your code works if the DTD exists at the specified location (the systemId); otherwise, adding the entity resolver as in the code below does the trick.

I don't have xmlDocument, so I hardcoded the values.

    StringBuilder stringBuilder = new StringBuilder();
    stringBuilder.append("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
    stringBuilder.append("<!DOCTYPE document><document/>");

    String xmlString = stringBuilder.toString(); // AnnotatedDocumentTree.toString(annotatedDocumentTree, new SimpleAnnotatedDocumentTreeXmlConverter(), stringBuilder);

    DocumentBuilderFactory icFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder icBuilder;          
    Document finalDocument = null;                 

    StringWriter writer = new StringWriter();

    try {

        icBuilder = icFactory.newDocumentBuilder(); 

        finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(xmlString.getBytes("UTF-8"))));                

        Transformer transformer = TransformerFactory.newInstance().newTransformer();

        //DocumentType doctype = xmlDocument.getDoctype();                    

        transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "xdtd.dtd"); // doctype.getSystemId());
        transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "xxxx"); //doctype.getPublicId());
        transformer.transform(new DOMSource(finalDocument), new StreamResult(writer));

        icBuilder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                    throws SAXException, IOException {
                if (systemId.contains(".dtd")) {
                    return new InputSource(new StringReader(""));
                } else {
                    return null;
                }
            }
        });
        finalDocument = icBuilder.parse(new InputSource(new ByteArrayInputStream(writer.toString().getBytes("UTF-8"))));

        System.out.println(finalDocument.getDoctype().getPublicId());
        System.out.println("-----------");
        System.out.println(writer.toString());

    } catch (Exception e) {
        e.printStackTrace();
    }

Output:

      xxxx
     -----------


     <?xml version="1.0" encoding="UTF-8"?>
     <!DOCTYPE document PUBLIC "xxxx" "xdtd.dtd">
     <document/>

Setting the factory features also works, without an entity resolver, but it must be done before creating the builder. Of those features, only http://apache.org/xml/features/nonvalidating/load-external-dtd is needed.
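As a minimal sketch of the feature-based variant: the class and method names below are made up for illustration, and it assumes the JDK's built-in Xerces-based parser, which recognizes this feature URI. The feature is set on the factory before the builder is created, and the doctype is read back afterwards.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;

public class SkipExternalDtd {

    // Hypothetical helper: parses XML without fetching the external DTD.
    static Document parseWithoutLoadingDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Must be set before newDocumentBuilder(); assumes a Xerces-based parser.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // xdtd.dtd does not need to exist for this to parse.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE document PUBLIC \"xxxx\" \"xdtd.dtd\"><document/>";
        DocumentType doctype = parseWithoutLoadingDtd(xml).getDoctype();
        System.out.println(doctype.getPublicId());
        System.out.println(doctype.getSystemId());
    }
}
```

Other parser implementations may reject the feature with a ParserConfigurationException, in which case the entity resolver approach is the fallback.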


Here is the fun thing, though: it appears the doctype only gets set when it is read.

Before accessing docType:

[screenshot not preserved]

After accessing docType:

[screenshot not preserved]


In Xerces, this is controlled by the feature http://apache.org/xml/features/dom/defer-node-expansion, which is true by default.
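A rough sketch of turning deferred expansion off, again assuming the JDK's built-in Xerces-based parser accepts both feature URIs (the class and method names are hypothetical). With deferral disabled, the DOM nodes should be built eagerly during parsing, so the doctype should already be populated when inspected in a debugger.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EagerDomParse {

    // Hypothetical helper: parses with deferred node expansion switched off.
    static Document parseEagerly(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not try to fetch the (possibly nonexistent) external DTD.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        // Xerces-specific: build the full DOM tree up front instead of lazily.
        factory.setFeature(
                "http://apache.org/xml/features/dom/defer-node-expansion", false);
        return factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseEagerly(
                "<!DOCTYPE document PUBLIC \"xxxx\" \"xdtd.dtd\"><document/>");
        System.out.println(doc.getDoctype().getName());
    }
}
```

Calling getDoctype() returns the same value in either mode; the feature only changes when the node is materialized.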

Testo Testini
  • Sorry, but the point is to make it work even if this `dtd` does not exist. I just want to set it without checking. – Stefan Falk Dec 19 '16 at 15:11
  • It works without the DTD `xdtd.dtd` existing; try running the code. – Testo Testini Dec 19 '16 at 15:17
  • Yes it would work without that. But I *need* to set it. ^^ – Stefan Falk Dec 19 '16 at 15:20
  • It is set. The output has a doctype, the doctype systemId is set to the nonexistent file `xdtd.dtd` and the publicId to `xxxx`. What's wrong with that? The code sets `xdtd.dtd` with `transformer.setOutputProperty(..` – Testo Testini Dec 19 '16 at 15:26
  • If you set a breakpoint after `finalDocument = icBuilder.parse(...);` you'll see that the member `docType` of `finalDocument` is `null`. This is a problem because I am serializing this document in a later stage. Since it is `null`, the `docType` is not defined after serialization. – Stefan Falk Dec 19 '16 at 15:33
  • For me it is there. I have updated the example to print the doctype publicId. Do you get a null pointer? – Testo Testini Dec 19 '16 at 15:55
  • Oh - my - god ... It's getting set on access. I'll add this to your answer if I may. Your example works but I always stopped debugging as I saw it's `null`.. – Stefan Falk Dec 19 '16 at 16:01
  • ;-) I think this happens for xerces by default deferring node expansion, see updated answer – Testo Testini Dec 19 '16 at 16:16

Try this:

Transformer t = TransformerFactory.newInstance().newTransformer();
Source s = new StreamSource(new StringReader(inputXML));
StringWriter sw = new StringWriter();
t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "my.system.id");
t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "my/public/id");
t.transform(s, new StreamResult(sw));

No need for this to go via DOM at all.
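For reference, a self-contained sketch of the same stream-to-stream idea. The class name, method name, and sample input are assumptions made for illustration, and the sample input deliberately carries no DOCTYPE of its own, so the only declaration in the result comes from the output properties.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DoctypeViaStream {

    // Hypothetical helper: identity transform that emits a DOCTYPE declaration.
    static String withDoctype(String inputXml, String publicId, String systemId)
            throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, publicId);
        t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);
        StringWriter sw = new StringWriter();
        // Stream in, stream out: no DOM is built and the DTD is never fetched.
        t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<ds>ABGB <cue>von</cue><Relation bewertung=\"1\">7 Ob 56/10a </Relation></ds>";
        System.out.println(withDoctype(input,
                "-//MBO//DTD artikel-at 1.0//DE",
                "http://dtd.company.de/dtd-at/artikel.dtd"));
    }
}
```

The serializer uses the root element name for the DOCTYPE name, so the declaration starts with `<!DOCTYPE ds` here.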

Michael Kay