57

I have created an XML file using xml.etree.ElementTree in Python. I then use

tree.write(filename, "UTF-8")

to write out the document to a file.

But when I open the file in a text editor, there are no newlines between the tags. Everything is on one long line.

How can I write out the document in a "pretty printed" format so that there are newlines (and ideally indentation) between all the XML tags?

TooTone
MK.

6 Answers

90

UPDATE 2022 - Python 3.9 and later versions

For Python 3.9 and later, the standard library includes xml.etree.ElementTree.indent, which adds the indentation whitespace to the tree in place:

Example:

import xml.etree.ElementTree as ET

root = ET.fromstring("<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>")
tree = ET.ElementTree(root)

# Insert newlines and two-space indentation into the tree in place
ET.indent(tree, '  ')
# writing xml
tree.write("example.xml", encoding="utf-8", xml_declaration=True)
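As a small sketch of the same call on an in-memory tree (assuming Python 3.9 or later; the four-space `space` value is only an illustrative choice), the result can be inspected with `ET.tostring` instead of writing a file:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>")

# indent() accepts either an ElementTree or an Element and modifies it in place.
ET.indent(root, space="    ")

# encoding="unicode" makes tostring() return str instead of bytes.
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Each `fruit` element should now appear on its own line, indented by four spaces.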

Thanks Michał Krzywański for this update!

BEFORE Python 3.9

I found a way that avoids extra libraries and reparsing the XML. Just pass your root element to this function (explained below):

def indent(elem, level=0):
    """Add newlines and indentation to elem and its descendants, in place."""
    i = "\n" + level * "  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for child in elem:
            indent(child, level + 1)
        # Dedent after the last child so the parent's closing tag lines up
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i

xml.etree.ElementTree.Element instances have an attribute named "tail". It holds the string that follows the element's closing tag:

"<a>text</a>tail"

I found a page from 2004 describing a set of Element library functions that use this "tail" attribute to indent an element.
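To make the role of "tail" concrete, here is a minimal sketch (the wrapping `root` element and the variable names are only for illustration) that reads `text` and `tail` and then replaces the tail with a newline:

```python
import xml.etree.ElementTree as ET

# The string "tail" follows the closing tag of <a>, so it is stored as a.tail.
root = ET.fromstring("<root><a>text</a>tail</root>")
a = root[0]

original_text = a.text
original_tail = a.tail
print(repr(original_text), repr(original_tail))

# Replacing the tail with a newline puts the parent's closing tag on a new line.
a.tail = "\n"
serialized = ET.tostring(root, encoding="unicode")
print(repr(serialized))
```

Indentation functions like the one above work purely by assigning such whitespace strings to `text` and `tail`.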

Example:

import xml.etree.ElementTree as ET

root = ET.fromstring("<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>")
tree = ET.ElementTree(root)

indent(root)
# writing xml
tree.write("example.xml", encoding="utf-8", xml_declaration=True)

Result on "example.xml":

<?xml version='1.0' encoding='utf-8'?>
<fruits>
  <fruit>banana</fruit>
  <fruit>apple</fruit>
</fruits>
mzjn
Erick M. Sprengel
  • He passed you up on a great solution - if it's any consolation, I'm using your code and it works well! – Dagrooms Dec 13 '16 at 22:38
  • I like your solution as well. I just had to change the first line of your function to use `os.linesep` instead of "\n" to get proper newlines in Notepad (Windows). – G Trawo Aug 20 '18 at 14:31
  • Great solution indeed! In fact when you append new SubElements to a node in `lxml`, the new elements don't have tail whitespace set. So your solution is necessary in that situation even with the `lxml` package. – Anton Jan 21 '19 at 09:53
  • Amazing solution! Wish it was built into the library. – youngrrrr Jul 23 '20 at 00:20
  • This is great. Exactly what I needed. Thank you – floyd1510 Nov 27 '20 at 02:28
  • I agree with @youngrrrr that this should be built into the library. – Aaron John Sabu Aug 02 '21 at 15:49
  • This `indent` functionality is already part of [python standard](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent) since 3.9 – Michał Krzywański May 06 '22 at 11:22
  • @mzjn it was my mistake, I fixed the subtitles. thks – Erick M. Sprengel May 09 '22 at 06:05
  • After previously trying both lxml and minidom methods, I finally got what I needed by simply altering the tail. Thanks! – otocan May 27 '22 at 14:49
  • I have to use IronPython 2.7 and importing external modules is complicated. Your solution worked great! Thank you! – Pavel Urubcik Jul 22 '23 at 07:02
34

The easiest solution, I think, is to switch to the lxml library. In most cases you only need to change your import from `import xml.etree.ElementTree as etree` to `from lxml import etree` or similar.

You can then use the pretty_print option when serializing:

tree.write(filename, pretty_print=True)

(also available on etree.tostring)

Honest Abe
Steven
  • Thanks Steven. This is what I ended up doing. – MK. Jun 24 '10 at 15:40
  • But this is not working for newly created elements that were added to the tree. They still look clumsy – Ridhuvarshan Jun 07 '18 at 18:05
  • Found the answer here: https://stackoverflow.com/questions/7903759/pretty-print-in-lxml-is-failing-when-i-add-tags-to-a-parsed-tree/7904066#7904066 – Ridhuvarshan Jun 08 '18 at 03:00
  • What does the `pretty_print` option do? The [documentation](https://lxml.de/3.6/api/lxml.etree-module.html#tostring) says that it "enables formatted XML," but what does it mean that the XML is formatted? – HelloGoodbye Jun 25 '18 at 09:25
14

ElementTree had no pretty-printing support before Python 3.9 (which added xml.etree.ElementTree.indent), but you can use other XML modules.

For example, xml.dom.minidom.Node.toprettyxml():

Node.toprettyxml(indent="\t", newl="\n", encoding=None)

Return a pretty-printed version of the document. indent specifies the indentation string and defaults to a tabulator; newl specifies the string emitted at the end of each line and defaults to \n.

Use indent and newl to fit your requirements.
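As a rough sketch of passing custom values, the hypothetical helper below (the name `prettify` and its two-space default are assumptions for illustration, not part of either library) serializes an ElementTree element, reparses it with minidom, and returns the pretty-printed string:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def prettify(elem, indent="  ", newl="\n"):
    """Return a pretty-printed string for an ElementTree element."""
    # minidom cannot read an Element directly, so serialize and reparse it.
    raw = ET.tostring(elem)
    return minidom.parseString(raw).toprettyxml(indent=indent, newl=newl)


tips = ET.XML("<tips><tip>1</tip><tip>2</tip></tips>")
pretty = prettify(tips)
print(pretty)
```

Because the document is reparsed, this works best on trees that do not already contain indentation whitespace; existing whitespace-only text nodes are kept and tend to show up as extra blank lines.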

An example using the default formatting characters, shown here for Python 3 (older Python 2 releases, as used when this answer was written, also put each text node on its own indented line):

>>> from xml.dom import minidom
>>> from xml.etree import ElementTree
>>> tree1 = ElementTree.XML('<tips><tip>1</tip><tip>2</tip></tips>')
>>> ElementTree.tostring(tree1)
b'<tips><tip>1</tip><tip>2</tip></tips>'
>>> print(minidom.parseString(ElementTree.tostring(tree1)).toprettyxml())
<?xml version="1.0" ?>
<tips>
	<tip>1</tip>
	<tip>2</tip>
</tips>

>>> 
Community
gimel
  • Good answer, but the only question is: why does minidom insert extraneous whitespace (for `1` and `2`; significant in XML)? – ChristopheD Jun 22 '10 at 17:56
  • Good question ;-) Use with care. – gimel Jun 22 '10 at 18:10
  • Thanks for the answer! It almost worked for me. The only problem is that it deletes the `encoding="utf-8"` from the header when I do `ET.tostring(main, encoding='utf8', method='xml').decode()`. Solved with `toprettyxml(encoding='utf8')` – Charalamm Sep 17 '20 at 13:56
4

Without the use of external libraries, you can easily achieve a newline between each XML tag in the output by setting the tail attribute for each element to '\n'.

You can also specify the number of tabs after the newline here. However, in the OP's use-case tabs may be easier to achieve with an external library, or see Erick M. Sprengel's answer.
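A minimal sketch of the newline-only approach follows. It is an illustration under one added assumption: besides each element's tail, the `text` of every element that has children is also set to a newline, since otherwise a parent's opening tag stays on the same line as its first child.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<fruits><fruit>banana</fruit><fruit>apple</fruit></fruits>")

for el in root.iter():
    # A newline after every closing tag.
    el.tail = "\n"
    # A newline after the opening tag of elements that contain children,
    # but only when they hold no real text of their own.
    if len(el) and not (el.text and el.text.strip()):
        el.text = "\n"

result = ET.tostring(root, encoding="unicode")
print(result)
```

This yields one tag per line without any indentation.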

I ran into the same problem while trying to modify an xml document using xml.etree.ElementTree in python. In my case, I was parsing the xml file, clearing certain elements (using Element.clear()), and then writing the result back to a file.

For each element that I had cleared, there was no new line after its tag in the output file.

ElementTree's Element.clear() documentation states:

This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.

This made me realize that the text and tail attributes of an element were how the output format was being determined. In my case, I was able to just set these attributes of the cleared element to the same values as before clearing it. This tail value ended up being '\n\t' for first-level children of the root xml element, with the number of tabs indicating the number of tabs displayed in the output.
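The save-and-restore step described above could look roughly like this; the sample document and variable names are made up for illustration, and only the tail is restored because the cleared element is meant to end up empty:

```python
import xml.etree.ElementTree as ET

# A small, already indented document: one tab per nesting level.
root = ET.fromstring("<root>\n\t<a x='1'><b/></a>\n\t<c/>\n</root>")

a = root.find("a")
saved_tail = a.tail      # whitespace between </a> and <c/>
a.clear()                # drops children and attributes, resets text and tail
a.tail = saved_tail      # restore the whitespace that follows the element

result = ET.tostring(root, encoding="unicode")
print(result)
```

Without the last assignment, the cleared element and the following `c` element would be written on the same line.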

TooTone
hackintosh
2

I expanded the indent function of @Erick M. Sprengel:

  • the original "indent" function was renamed to "format_xml",
  • docstrings were added,
  • "lag_indent" and "lag_nl" parameters were added, together with the corresponding logic, to control for how many XML child levels indentation and newlines are skipped.

Thanks for your contribution!

# The basis for "format_xml" function was "indent" function in the answer of
# Erick M. Sprengel in the following link: https://stackoverflow.com/questions/3095434/inserting-newlines-in-xml-file-generated-via-xml-etree-elementtree-in-python
# The original license: https://creativecommons.org/licenses/by-sa/3.0/
def format_xml(self, elem, level=0, lag_indent=3, lag_nl=1):
    """Adds indents and new lines to XML for better readability.

    Args:
        elem (xml.etree.ElementTree.Element): An Element instance.
        level (int): The current level of XML. When calling this method from
            the other parts of the code (other than this method), level
            should be 0.
        lag_indent (int): Indicates for how many XML child levels
            indentation will not be applied.
        lag_nl (int): Indicates for how many XML child levels a new line
            will not be added.
    """
    def tail_adjustment(el, lag, indent):
        if lag > 0:
            el.tail = indent
        else:
            el.tail = "\n" + indent

    def text_adjustment(el, lag, indent):
        if lag_indent > 0:
            if lag > 0:
                el.text = indent
            else:
                el.text = "\n" + indent
        else:
            if lag > 0:
                el.text = indent + "  "
            else:
                el.text = "\n" + indent + "  "

    i = level*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            text_adjustment(elem, lag_nl, i)
        if not elem.tail or not elem.tail.strip():
            tail_adjustment(elem, lag_nl, i)
        for child in elem:
            if lag_indent > 0:
                self.format_xml(child, 0, lag_indent-1, lag_nl-1)
            else:
                self.format_xml(child, level+1, lag_indent-1, lag_nl-1)
        # Adjust the last child's tail so the parent's closing tag lines up
        if not child.tail or not child.tail.strip():
            tail_adjustment(child, lag_nl, i)
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            tail_adjustment(elem, lag_nl, i)
Adri
0

According to this thread, your best bet would be to install PyXML and use it to pretty-print the ElementTree XML content (ElementTree did not have a pretty-printer by default in Python at the time). Note that PyXML is unmaintained and, like the code below, works with Python 2 only:

import xml.etree.ElementTree as ET

from xml.dom.ext.reader import Sax2
from xml.dom.ext import PrettyPrint
from StringIO import StringIO

def prettyPrintET(etNode):
    reader = Sax2.Reader()
    docNode = reader.fromString(ET.tostring(etNode))
    tmpStream = StringIO()
    PrettyPrint(docNode, stream=tmpStream)
    return tmpStream.getvalue()
ChristopheD