I have an XML file that starts with the following:

<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE       ispXCF  SYSTEM  "IspXCF.dtd" >
<ispXCF version="3.7.0">
    <Comment></Comment>
    <Chain>
        <Comm>JTAG</Comm>
        <Device>
        ....

I am using the xml.etree.ElementTree parser, but it drops the second line, the one starting with <!DOCTYPE.

I am using the following arguments in the write method:

tree.write("data.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=None)

but the output keeps only <?xml version='1.0' encoding='utf-8'?>; the DOCTYPE line is gone.

Is there some way to keep the <!DOCTYPE line, or will I have to use a different XML parser?
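A minimal, self-contained reproduction of the behaviour described above. It assumes a trimmed copy of the file (the elided `<Device>` content is left out) and uses in-memory buffers instead of `data.xml`, with the same `write()` arguments as the question.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed stand-in for the question's data.xml (assumption: the elided
# <Device> content does not affect the behaviour being shown).
SOURCE = b"""<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE       ispXCF  SYSTEM  "IspXCF.dtd" >
<ispXCF version="3.7.0">
    <Comment></Comment>
    <Chain>
        <Comm>JTAG</Comm>
    </Chain>
</ispXCF>
"""

# Parse, then write back with the question's arguments into a bytes buffer.
tree = ET.parse(io.BytesIO(SOURCE))
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True, short_empty_elements=None)
result = out.getvalue().decode("utf-8")

print(result.splitlines()[0])  # <?xml version='1.0' encoding='utf-8'?>
print("<!DOCTYPE" in result)   # False: the DOCTYPE was not kept
```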

Erik Šťastný

1 Answer

It seems xml.etree.ElementTree has very poor support for doctype declarations.

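A custom parser target (for example a TreeBuilder subclass) can receive the doctype while parsing, through an optional doctype() method, but Element and ElementTree objects neither store it nor write it out, so you cannot read it back from a tree built from existing XML.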
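A minimal sketch of that parse-time hook, assuming Python 3: `XMLParser` calls `target.doctype(name, pubid, system)` when the target defines it, and the stock `TreeBuilder` does not. The class name `DoctypeTreeBuilder`, its `doctype_decl` attribute and the helper `parse_keeping_doctype` are hypothetical. The declaration is rebuilt from its parts, so the original whitespace is normalised and any internal `[...]` subset is not captured.

```python
import io
import xml.etree.ElementTree as ET


class DoctypeTreeBuilder(ET.TreeBuilder):
    """Hypothetical TreeBuilder subclass that also records the DOCTYPE.

    XMLParser calls target.doctype(name, pubid, system) when the target has
    such a method; the stock TreeBuilder has none, so normally it is dropped.
    """

    def __init__(self):
        super().__init__()
        self.doctype_decl = None  # hypothetical attribute, set by doctype()

    def doctype(self, name, pubid, system):
        # Rebuild the declaration from its parts. Only the external identifier
        # is reported: an internal subset ([...]) is not passed to this callback,
        # and the original spacing is not preserved.
        if pubid:
            self.doctype_decl = f'<!DOCTYPE {name} PUBLIC "{pubid}" "{system}">'
        elif system:
            self.doctype_decl = f'<!DOCTYPE {name} SYSTEM "{system}">'
        else:
            self.doctype_decl = f"<!DOCTYPE {name}>"


def parse_keeping_doctype(source):
    """Parse a path or binary file object; return (tree, doctype_decl or None)."""
    builder = DoctypeTreeBuilder()
    tree = ET.parse(source, parser=ET.XMLParser(target=builder))
    return tree, builder.doctype_decl


SOURCE = b"""<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE       ispXCF  SYSTEM  "IspXCF.dtd" >
<ispXCF version="3.7.0"><Comment></Comment></ispXCF>
"""
tree, decl = parse_keeping_doctype(io.BytesIO(SOURCE))
print(decl)  # <!DOCTYPE ispXCF SYSTEM "IspXCF.dtd">
```

The tree itself is an ordinary `ElementTree`; the captured string still has to be written back by hand, because `ElementTree.write()` has no doctype option.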

As answered here, you'd have to copy and paste the doctype declaration back in by hand... pretty fugly IMHO.
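The manual workaround can be sketched roughly as follows, assuming the DOCTYPE string is hard-coded (the "copy-paste" step). The helper name `write_with_doctype` is hypothetical: it writes the XML declaration and the DOCTYPE itself, then lets `ElementTree.write()` serialize the tree with `encoding="unicode"`, which writes `str` to a text stream and emits no declaration of its own.

```python
import io
import xml.etree.ElementTree as ET


def write_with_doctype(tree, f, doctype_decl, encoding="utf-8"):
    """Hypothetical helper: write `tree` to text stream `f` with a DOCTYPE.

    `encoding` is only used in the declaration line; it must match the
    encoding that `f` was opened with.
    """
    f.write(f"<?xml version='1.0' encoding='{encoding}'?>\n")
    f.write(doctype_decl + "\n")
    # short_empty_elements=False keeps <Comment></Comment> as in the question
    tree.write(f, encoding="unicode", short_empty_elements=False)


# Hard-coded copy of the question's DOCTYPE (the manual "copy-paste" step).
DOCTYPE = '<!DOCTYPE ispXCF SYSTEM "IspXCF.dtd">'

tree = ET.ElementTree(
    ET.fromstring('<ispXCF version="3.7.0"><Comment></Comment></ispXCF>')
)

# For a real file:
# with open("data.xml", "w", encoding="utf-8") as f:
#     write_with_doctype(tree, f, DOCTYPE)
buf = io.StringIO()
write_with_doctype(tree, buf, DOCTYPE)
output = buf.getvalue()
```

Writing the header lines by hand keeps everything in the standard library, at the cost of keeping the declaration's `encoding` in sync with the file's.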

So, in all, it would seem best to switch to lxml.

Rody Oldenhuis
  • lxml seems fine, but I want to use only the standard Python libraries. – Erik Šťastný Feb 08 '17 at 11:28
  • I understand, but what you want is simply not possible with `xml.etree.ElementTree` without the ugly hack mentioned... That seems to be something you'll have to factor into your trade-off: (1) use the non-standard `lxml` and be faster, prettier, more versatile, etc., or (2) use `xml.etree.ElementTree` and be more portable. The choice is yours. – Rody Oldenhuis Feb 08 '17 at 11:33
  • I will have to use the mentioned "hack" because my app has to work on a standard, out-of-the-box Python installation :) – Erik Šťastný Feb 08 '17 at 11:44
  • I don't think it is that ugly a hack: after all, any human interaction with doctyped XML files involves "copy-pasting" the doctype anyway. That it is not rendered automatically from an internal data structure seems a lesser issue. – jsbueno Feb 08 '17 at 12:59
  • @jsbueno the fact that you have no way to copy the original doctype just makes it much less versatile. Sure, you can parameterize the new, desired doctype, that sure beats hard coding it (as is done in the answer I linked to). But to what should that doctype default? *The original*, obviously. There's no way to do **the obvious** without [resorting to regular expressions](http://stackoverflow.com/a/1732454/1085062), and that's what makes this so ugly. But beauty is in the eye of the beholder, of course. – Rody Oldenhuis Feb 08 '17 at 13:07
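The regular-expression route mentioned in the last comment might look roughly like this. It is a heuristic sketch, not an XML-aware parser: the pattern and the `read_doctype` name are assumptions, it handles only a DOCTYPE without an internal `[...]` subset (returning `None` otherwise), and it does not skip comments or CDATA. Unlike the parser callback, it returns the original declaration verbatim, spacing included.

```python
import re

# Heuristic only: matches a DOCTYPE with no internal subset ("[...]"),
# which is enough for the question's file. Not an XML-aware parser.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s[^>\[]*>")


def read_doctype(text):
    """Return the first DOCTYPE declaration in `text` verbatim, or None."""
    match = DOCTYPE_RE.search(text)
    return match.group(0) if match else None


sample = """<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE       ispXCF  SYSTEM  "IspXCF.dtd" >
<ispXCF version="3.7.0"/>
"""
# For a real file: read_doctype(open("data.xml", encoding="utf-8").read())
original_doctype = read_doctype(sample)
```

The returned string can then be written back in front of the serialized tree, so the default is the original doctype rather than a hard-coded one.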