Why does the Java DOM erase the DOCTYPE when editing XML?

I have this XML file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE map[ <!ELEMENT map (station*) >
                <!ATTLIST station  id   ID    #REQUIRED> ]>
<favoris>
<station id="5">test1</station>
<station id="6">test1</station>
<station id="8">test1</station>
</favoris> 

My function is very basic:

public static void EditStationName(int id, InputStream is, String path, String name) throws ParserConfigurationException, SAXException, IOException, TransformerFactoryConfigurationError, TransformerException{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    DocumentBuilder builder = factory.newDocumentBuilder();
    Document dom = builder.parse(is);

    Element e = dom.getElementById(String.valueOf(id));
    e.setTextContent(name);
    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    FileOutputStream fos = new FileOutputStream(path);
    Result result = new StreamResult(fos);  
    Source source = new DOMSource(dom);


    xformer.setOutputProperty(OutputKeys.STANDALONE, "yes");

    xformer.transform(source, result);
}

It works, but the DOCTYPE gets erased! I get the whole document back without the DOCTYPE part, which matters to me because it is what lets me retrieve elements by ID. How can I keep the DOCTYPE, and why is it erased? I tried several solutions, for example with OutputKeys or omImpl.createDocumentType, but none of them worked...

Thank you!

user2864740
KitAndKat
  • I'm surprised you get anything; your XML is invalid. – Daniel Haley Jul 09 '11 at 19:50
  • Two things: 1) Your doctype (map) doesn't match your root element (favoris). 2) The element "station" isn't declared. You should add an element declaration for station and then change "favoris" to "map" (or change the doctype and element declaration). – Daniel Haley Jul 09 '11 at 20:20
  • I'm sorry, could you just write it out here? I'm a complete stranger to doctype things... :=) – KitAndKat Jul 09 '11 at 21:46
  • Maybe something like this? <!ELEMENT station (#PCDATA)> <!ATTLIST station id ID #REQUIRED> ]> – KitAndKat Jul 09 '11 at 21:56

4 Answers

Your input XML is not valid. It should be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [
    <!ELEMENT favoris (station)+>
    <!ELEMENT station (#PCDATA)>
    <!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">test1</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

As @DevNull wrote, a fully valid document cannot contain <station id="5">test1</station> (Java processes it anyway, even with that issue).


The DOCTYPE is erased in the output XML document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

I haven't found a solution for the missing DTD yet, but as a workaround you can reference an external DTD:

xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");

Resulting (example) document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris SYSTEM "favoris.dtd">
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
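
For reference, the external-DTD workaround can be sketched end to end as below. This is a minimal sketch rather than the original function: the class name ExternalDtdSketch and the method editAndSerialize are hypothetical, the document is read from and written to strings instead of files, and "favoris.dtd" is assumed to be a file you provide next to the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
final class ExternalDtdSketch {
    // The corrected input document, with the DTD inline so getElementById works.
    static final String INPUT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
        + "<!DOCTYPE favoris [\n"
        + "    <!ELEMENT favoris (station)+>\n"
        + "    <!ELEMENT station (#PCDATA)>\n"
        + "    <!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris>\n"
        + "    <station id=\"i5\">test1</station>\n"
        + "    <station id=\"i6\">test1</station>\n"
        + "</favoris>";

    static String editAndSerialize(String xml, String id, String name) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Lookup by ID relies on the ATTLIST declaration in the DTD.
        Element e = dom.getElementById(id);
        if (e == null) {
            throw new IllegalArgumentException("No element with ID " + id);
        }
        e.setTextContent(name);

        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        // The Transformer can only emit a DOCTYPE that references a DTD.
        xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");

        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }
}
```

Calling ExternalDtdSketch.editAndSerialize(ExternalDtdSketch.INPUT, "i5", "new value") should return a document whose DOCTYPE points at favoris.dtd. Keep in mind that getElementById will only work on the saved file if that DTD can actually be resolved when the file is parsed again.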

EDIT:

I don't think it's possible to save an inline DTD using the Transformer class. If you can't use an external DTD reference, you can use the DOM Level 3 LSSerializer class instead:

DOMImplementationLS domImplementationLS =
    (DOMImplementationLS) dom.getImplementation().getFeature("LS","3.0");
LSOutput lsOutput = domImplementationLS.createLSOutput();
FileOutputStream outputStream = new FileOutputStream("output.xml");
lsOutput.setByteStream((OutputStream) outputStream);
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
lsSerializer.write(dom, lsOutput);
outputStream.close();

Output with wanted DTD (I can't see any option to add standalone="yes" using LSSerializer...):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris> 
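
To check that the inline DTD survives a save, the LSSerializer approach can be run as an in-memory round trip. The sketch below is an assumption-laden illustration: the class LsSerializerSketch and its methods are hypothetical names, and it serializes to a byte array instead of output.xml. It edits a station, serializes, parses the result again and looks the element up by ID.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
final class LsSerializerSketch {
    static final String INPUT =
        "<!DOCTYPE favoris [\n"
        + "<!ELEMENT favoris (station)+>\n"
        + "<!ELEMENT station (#PCDATA)>\n"
        + "<!ATTLIST station id ID #REQUIRED>\n"
        + "]>\n"
        + "<favoris><station id=\"i5\">test1</station></favoris>";

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    static String serialize(Document dom) throws Exception {
        DOMImplementationLS ls =
            (DOMImplementationLS) dom.getImplementation().getFeature("LS", "3.0");
        LSOutput output = ls.createLSOutput();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);
        output.setEncoding("UTF-8");
        LSSerializer serializer = ls.createLSSerializer();
        // Unlike the Transformer, this writes the DocumentType node as well.
        serializer.write(dom, output);
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Edits one station, saves, re-parses, and reads the value back by ID.
    static String roundTrip(String xml, String id, String name) throws Exception {
        Document dom = parse(xml);
        dom.getElementById(id).setTextContent(name); // assumes the ID exists
        Document reloaded = parse(serialize(dom));
        return reloaded.getElementById(id).getTextContent();
    }
}
```

If the DOCTYPE were dropped during serialization, getElementById on the reloaded document would return null and roundTrip would fail, so a successful round trip is a quick way to confirm the DTD was kept.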

Another approach is to use the Apache Xerces2-J XMLSerializer class:

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
...

XMLSerializer serializer = new XMLSerializer();
serializer.setOutputCharStream(new java.io.FileWriter("output.xml"));
OutputFormat format = new OutputFormat();
format.setStandalone(true);
serializer.setOutputFormat(format);
serializer.serialize(dom);

Result:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
Grzegorz Szpetkowski

(This response is really just a supplement to @Grzegorz Szpetkowski's answer, explaining why it works.)

You lose the doctype definition because you use the Transformer class, which performs an XSL transformation. The XSLT tree model has no object or node for the DOCTYPE declaration or the document type definition. When a parser hands the document over to an XSLT processor, the doctype information is lost and therefore cannot be retained or duplicated. XSLT offers some control over the serialization of the output tree, including adding a <!DOCTYPE ... > declaration with a public or system identifier. The values for these identifiers must be known beforehand and cannot be read from the input tree. Creating or retaining an embedded DTD or entity declarations is not supported either (although one workaround for this obstacle is to output it as text with disable-output-escaping="yes").

To preserve the DTD, you need to output your document with an XML serializer instead of an XSL transformation, as Grzegorz already suggested.

jasso
  • Thank you for this precise explanation! It's very clear now why it's not possible... As this was an Android application, I couldn't really use all those workarounds. So... I manually turned my DOM into a string, appending the doctype to my StringBuilder first! :/ Thank you! – KitAndKat Jul 10 '11 at 00:57
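
The manual approach described in this comment might look roughly like the sketch below. It is a guess at the idea, not the asker's actual code: the class ManualDoctypeSketch and its methods are hypothetical, and it assumes the DOM implementation exposes the inline DTD through DocumentType.getInternalSubset() (an implementation may return null there, in which case no DOCTYPE is written).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
final class ManualDoctypeSketch {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    static String serializeWithInlineDtd(Document dom) throws Exception {
        // Serialize the element tree only; the prolog is written by hand below.
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        xformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter body = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(body));

        // The declared encoding must match how the string is later written to disk.
        StringBuilder out = new StringBuilder(
            "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n");

        // Prepend the DOCTYPE with its internal subset, if the DOM kept one.
        DocumentType doctype = dom.getDoctype();
        if (doctype != null && doctype.getInternalSubset() != null) {
            out.append("<!DOCTYPE ").append(doctype.getName()).append(" [\n")
               .append(doctype.getInternalSubset())
               .append("]>\n");
        }
        return out.append(body.toString()).toString();
    }
}
```

This only handles a DOCTYPE that consists of an internal subset, as in the question. A DOCTYPE with public or system identifiers would need those written out as well.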
@Grzegorz Szpetkowski has a good idea with using an external DTD. However, the XML is still invalid if you keep those station/@id values.

Any attribute with the type "ID" can't have a value that starts with a digit. You'll have to add something to it, like "s" for station:

<!DOCTYPE favoris [
<!ELEMENT favoris (station*)      > 
<!ELEMENT station (#PCDATA)       > 
<!ATTLIST station 
          id       ID   #REQUIRED > 
]>
<favoris>
  <station id="s5">test1</station>
  <station id="s6">test1</station>
  <station id="s8">test1</station>
</favoris>
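
The ID rule can be checked with a validating parser. The following is a minimal sketch under stated assumptions: the class IdValiditySketch and its methods are hypothetical, and the test document is a shortened version of the one above. It counts the validation errors reported for a document whose IDs start with a digit and for one whose IDs are prefixed with "s".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, named for illustration only.
final class IdValiditySketch {
    // Builds a small document whose station IDs are prefix + number.
    static String document(String prefix) {
        return "<!DOCTYPE favoris ["
            + "<!ELEMENT favoris (station*)>"
            + "<!ELEMENT station (#PCDATA)>"
            + "<!ATTLIST station id ID #REQUIRED>"
            + "]>"
            + "<favoris>"
            + "<station id=\"" + prefix + "5\">test1</station>"
            + "<station id=\"" + prefix + "6\">test1</station>"
            + "</favoris>";
    }

    static int countValidationErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the inline DTD
        DocumentBuilder builder = factory.newDocumentBuilder();

        final int[] errors = {0};
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            // Validity violations are recoverable errors, so count and continue.
            @Override public void error(SAXParseException e) { errors[0]++; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors[0];
    }
}
```

With this sketch, countValidationErrors(document("")) should report at least one error for the IDs "5" and "6", while countValidationErrors(document("s")) should report none. Without setValidating(true) the parser does not complain, which is why the original code runs even with the invalid IDs.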
Daniel Haley
  • You're quite right, and I forgot about that rule :) However, even with that issue, the output XML document keeps the inline DTD when using the `LSSerializer` class instead of the `Transformer` approach. – Grzegorz Szpetkowski Jul 09 '11 at 23:10

I had almost the same problem and found the following, which works with the Transformer. It is limited, since it only lets you reference the DTD, and it will require some work if the document's doctype can vary. It was enough in my case, though: I only needed to hardcode the XHTML doctype after a transformation.

xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, "publicId");
xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "systemId");
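
When the doctype can vary, one possible way to do that extra work is to copy the identifiers from the parsed document's DocumentType node instead of hardcoding them. This is a sketch, not part of the original answer: the class CopyDoctypeIdsSketch and its methods are hypothetical, and the entity resolver that returns an empty stream is only there so that parsing does not try to fetch the external DTD.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for illustration only.
final class CopyDoctypeIdsSketch {
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption for this sketch: resolve every external entity to an
        // empty stream so that the referenced DTD is never downloaded.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    static String serialize(Document dom) throws Exception {
        Transformer xformer = TransformerFactory.newInstance().newTransformer();

        // Carry over whatever identifiers the input document declared.
        DocumentType doctype = dom.getDoctype();
        if (doctype != null) {
            if (doctype.getPublicId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }

        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }
}
```

This still cannot reproduce an inline DTD. It only carries over the public and system identifiers, so it suits documents that reference an external DTD.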
mmarinero