Python/XML: Pretty-printing ElementTree

Question

I construct XML using the ElementTree XML API, and I would like to be able to pretty-print:

individual nodes (for inspection) as well as
the whole document (to a file, for future examination).

I can use ET.write() to write my XML to a file and then pretty-print it using one of the many suggestions in Pretty printing XML in Python. However, this requires me to serialize and then deserialize the XML (to disk or to StringIO) just to serialize it again prettily, which is clearly suboptimal.

So, is there a way to pretty-print an xml.etree.ElementTree?

Pranav Gupta · Answer 1 · 2019-04-02T17:10:16.430

3

I was having issues using pretty print. Digging more into it I found the below solution which worked for me.

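import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9
from xml.dom import minidom

# Build a small tree to demonstrate.
root = etree.Element("root")
animal = etree.SubElement(root, "animal")
etree.SubElement(animal, "pet").text = "dog"

# Serialize the tree, re-parse it with minidom, then pretty-print the result.
xmlstr = minidom.parseString(etree.tostring(root)).toprettyxml(indent="   ")
print(xmlstr)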

This prints the tree as indented XML.

edited Apr 02 '19 at 17:10

answered Apr 02 '19 at 00:18


While this code may answer the question, providing additional context regarding _why_ and/or _how_ this code answers the question improves its long-term value. – lcnicolau Apr 02 '19 at 00:52

abarnert · Accepted Answer · 2018-04-23T21:58:32.220

2

As the docs say, in the write method:

file is a file name, or a file object opened for writing.

This includes a StringIO object. So:

import io
outfile = io.BytesIO()  # write() emits bytes by default; cStringIO is Python 2 only
tree.write(outfile)

Then you can pretty-print outfile using your favorite method: either call outfile.seek(0) and pass outfile itself to a function that takes a file, or pass outfile.getvalue() to a function that takes a string.
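As a rough illustration of the two options just described, here is a minimal sketch for Python 3. It assumes minidom as the "favorite method" (the answer does not name one) and uses io.BytesIO because ElementTree.write() emits bytes by default; the sample tree and the variable names are made up for the example.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample tree, only for demonstration.
root = ET.Element("root")
animal = ET.SubElement(root, "animal")
ET.SubElement(animal, "pet").text = "dog"
tree = ET.ElementTree(root)

# Serialize into an in-memory buffer instead of a file on disk.
outfile = io.BytesIO()
tree.write(outfile)

# Option 1: pass the serialized content to a function that takes a string.
pretty_from_string = minidom.parseString(outfile.getvalue()).toprettyxml(indent="  ")

# Option 2: rewind and pass the file object to a function that takes a file.
outfile.seek(0)
pretty_from_file = minidom.parse(outfile).toprettyxml(indent="  ")

print(pretty_from_string)
```

Both options still serialize, re-parse, and serialize again; they only avoid touching the disk.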

However, notice that many of the ways to pretty-print XML in the question you linked don't even need this. For example:

lxml.etree.tostring (answer #2): lxml.etree is a near-perfect superset of the stdlib etree, so if you're going to use it for pretty-printing, just use it to build the XML in the first place.
Effbot indent/prettyprint (answer #3): This expects an ElementTree tree, which is exactly what you already have, not a string or file.

edited Apr 23 '18 at 21:58

answered Apr 23 '18 at 21:52


`StringIO` is serializing & deserializing - nfg. – sds Apr 23 '18 at 22:02

@sds Changing your question to make existing answers irrelevant really isn’t helpful. – abarnert Apr 23 '18 at 22:03
apologies - however, you gotta agree that the serializing & deserializing for pp is nfg ;-) – sds Apr 23 '18 at 22:05
More to the point, your new question makes no sense. If you want to use a function that takes a serialized XML document, obviously you have to serialize it. Alternatively, you can use a function that takes an ET tree—there are examples of that on the very question you linked, and in this answer. – abarnert Apr 23 '18 at 22:06
@sds I don’t know what “nfg” means, but if the cost of pretty-printing XML is acceptable, I assume the cost of serializing XML is acceptable, because pretty-printing inherently means serializing. – abarnert Apr 23 '18 at 22:09
@sds Then why did you even ask? I just linked to an answer on the same question you linked to in your question… – abarnert Apr 23 '18 at 22:09
I did not realize that lxml was a drop-in replacement for xml - basically, you can elide your answer to contain just that. ;-) – sds Apr 23 '18 at 22:11
The stringio method means: (1) serialize the XML to stringio (2) deserialize the string into dom (3) pretty-serialize it. IOW, to get a pretty serialization, I have to serialize *twice* and deserialize once. This, IMO, is "no freaking good". ;-) – sds Apr 23 '18 at 22:13
@sds Sure, but 90%+ of the time, the performance cost of printing your XML is a tiny fraction of what matters, so tripling it is nothing. Because when that _isn’t_ true, there’s a very good chance the stdlib etree isn’t nearly fast enough in the first place. – abarnert Apr 23 '18 at 22:15
It's not really a matter of performance but rather of aesthetics. Doesn't doing the same work twice make you cringe? – sds Apr 23 '18 at 22:22
@sds Sure, but I usually take it as a sign that I’m using the wrong data structure. When you really do need to convert from one to another, you’ve already hit the code smell, so looking for a magic way to do it without iterating rather than backing up and asking why you have the wrong data structure is usually the wrong approach. – abarnert Apr 23 '18 at 22:25
