
I have an in-memory Python XML ElementTree which looks like this:

<A>
  <B>..</B>
  <C>..</C>
  <D>..</D>
</A>

I serialize the ElementTree to XML with:

xmlstr = minidom.parseString(ET.tostring(root)).toprettyxml("  ")

The order of the inner nodes B, C, D changes every time I invoke the tostring() call above. How can I make sure my serialization follows a deterministic order?

user1159517

1 Answer


I realize many answers here suggest this, but

minidom.parseString(ET.tostring(root)).toprettyxml("  ")

is actually a really horrible way of pretty-printing an XML file.

It involves parsing, serializing with ET and then parsing again and serializing again with a completely different XML library. It's silly and wasteful and I would not be surprised if minidom messes it up.

If you have it installed, switch to lxml and use its built-in pretty-printing function.

If you are for some reason stuck with xml.etree.ElementTree, you can use a simple recursive function to prettify a tree in-place:

# xmlhelpers.py

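# adapted from http://effbot.org/zone/element-lib.htm#prettyprint
def indent(elem, level=0):
    # Whitespace that places a tag at this element's nesting depth.
    i = "\n" + level * "  "
    if len(elem):
        # Element has children: indent the first child one level deeper.
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1)
        # The last child's tail positions this element's closing tag.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        # Leaf element: only the tail needs adjusting (never for the root).
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i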

Usage is straightforward:

import xml.etree.ElementTree as ET
from xmlhelpers import indent

root = ET.fromstring("<A><B>..</B><C>..</C><D>..</D></A>")
indent(root)

print(ET.tostring(root))

This prints a nicely indented version:

b'<A>\n  <B>..</B>\n  <C>..</C>\n  <D>..</D>\n</A>\n'
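For what it's worth, on Python 3.9 and later the standard library ships its own `xml.etree.ElementTree.indent()`, so a hand-rolled helper may not be needed there. A minimal sketch, assuming Python 3.9+ and the same sample tree as above:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<A><B>..</B><C>..</C><D>..</D></A>")

# ET.indent() rewrites text/tail whitespace in place (Python 3.9+).
ET.indent(root, space="  ")

# encoding="unicode" returns a str instead of bytes, handy for printing.
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Child order is preserved, and no second XML library is involved.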

That being said, never use "tostring" to write an XML tree to a file.

Always write XML files with the functions provided by the XML library.

tree = ET.ElementTree(root) # only necessary if you don't already have a tree
tree.write(filename, encoding="UTF-8")
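As a rough end-to-end sketch of that: the snippet below writes a tree to a temporary file and reads it back. The file name `out.xml` and the sample text are made up for illustration. `xml_declaration=True` is passed explicitly because, as far as I know, ElementTree leaves the declaration out by default when the encoding is UTF-8.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sample tree with a non-ASCII character (illustrative data only).
root = ET.fromstring("<A><B>caf\u00e9</B></A>")
tree = ET.ElementTree(root)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.xml")  # hypothetical file name
    # The library encodes the bytes and writes the matching declaration.
    tree.write(path, encoding="UTF-8", xml_declaration=True)

    with open(path, "rb") as f:
        raw = f.read()
    reparsed = ET.parse(path).getroot()

first_line = raw.splitlines()[0]
round_trip_text = reparsed.find("B").text

print(first_line)
print(round_trip_text)
```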
Tomalak
  • "That being said, never use "tostring" to write an XML tree to a file." Can you explain? – Milo Dec 30 '19 at 16:08
  • Sure. Here goes: It helps to think of XML not as text. XML is the representation of a tree structure, complex data - alas, in textual form. But with the textual form comes the problem of text encodings. XML goes out of its way to make sure that text encodings are not an issue, via the XML declaration (e.g. `<?xml version="1.0" encoding="UTF-8"?>`) where the encoding of the file is declared by the serializer as it writes the file, so the parser knows what to do when it reads the file, so all data is retained. – Tomalak Dec 30 '19 at 18:10
  • Now if you convert the data structure to string, it would be *your* task to declare the file encoding properly when you write the string to file, and add the appropriate XML declaration yourself. Most of the time this will not happen, because people tend not to think about this at all. They write a string to file, boom, done. This means data will break, because the default file encoding of the current programming language does not necessarily match the default file encoding of an XML parser. – Tomalak Dec 30 '19 at 18:17
  • In other words, convert-to-string-then-save-to-file circumvents a really nice implementation of automatic text file encoding handling, which is a silly thing to do. And it usually is more lines of code than `tree.write(filename)` so it's more work on top of being silly. – Tomalak Dec 30 '19 at 18:18
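To make the failure mode in these comments concrete, here is a small sketch, not a definitive demonstration. It assumes ISO-8859-1 as a stand-in for a platform default that is not UTF-8, and uses an in-memory buffer in place of a real file. The string route produces bytes with no declaration, so the parser assumes UTF-8 and rejects them. Letting the serializer write the same encoding produces a declaration and the data survives.

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<A>caf\u00e9</A>")

# String route: we get text without a declaration, then choose a byte
# encoding ourselves. ISO-8859-1 stands in for a non-UTF-8 default.
text = ET.tostring(root, encoding="unicode")
undeclared = text.encode("iso-8859-1")

try:
    ET.fromstring(undeclared)
    string_route_parses = True
except ET.ParseError:
    # With no declaration the parser assumes UTF-8 and rejects the bytes.
    string_route_parses = False

# Serializer route: the library encodes and declares in one step.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="iso-8859-1")
declared = buf.getvalue()
recovered = ET.fromstring(declared).text

print(string_route_parses)
print(declared.splitlines()[0])
print(recovered)
```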