526

What is the best way (or are the various ways) to pretty print XML in Python?

Ninjakannon
  • 3,751
  • 7
  • 53
  • 76
Hortitude
  • 13,638
  • 16
  • 58
  • 72

27 Answers27

462
import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
pj.dewitte
  • 472
  • 3
  • 10
Ben Noland
  • 34,230
  • 18
  • 50
  • 51
  • 38
    This will get you pretty xml, but note that what comes out in the text node is actually different than what came in - there are new whitespaces on text nodes. This may cause you trouble if you are expecting EXACTLY what fed in to feed out. – Todd Hopkinson Jan 12 '12 at 18:03
  • 56
    @icnivad: while it is important to point that fact, it seems strange to me that somebody would want to prettify its XML if spaces were of some importance for them ! – vaab Jan 30 '12 at 09:49
  • 19
    Nice! Can collapse this to a one liner: python -c 'import sys;import xml.dom.minidom;s=sys.stdin.read();print xml.dom.minidom.parseString(s).toprettyxml()' – Anton I. Sipos Apr 17 '12 at 22:17
  • 11
    minidom is widely panned as a pretty bad xml implementation. If you allow yourself to add external depenencies, lxml is far superior. – bukzor Apr 20 '12 at 16:34
  • 28
    Not a fan of redefining xml there from being a module to the output object, but the method otherwise works. I'd love to find a nicer way to go from the core etree to pretty printing. While lxml is cool, there are times when I'd prefer to keep to the core if I can. – Danny Staple May 01 '12 at 16:05
  • 8
    This is inserting additional whitespace every time I run it, which is broken, in my opinion. (Python 2.6 and 2.7) – Warren P Mar 27 '14 at 14:11
  • 2
    Link to the docs: https://docs.python.org/2/library/xml.dom.minidom.html#xml.dom.minidom.Node.toprettyxml – Paolo Jul 20 '14 at 18:12
  • 11
    Tons of crazy blank lines everywhere. This solution doesn't work. – void.pointer Jan 08 '15 at 03:54
  • 2
    Yeah, this is a pretty terrible solution actually. – Emil Sep 27 '15 at 18:03
  • 2
    with 2.6, there are newlines and spaces within the text nodes, but with 2.7, the text nodes appear unchanged. – Wyrmwood Apr 07 '16 at 00:21
  • Thanks @Wyrmwood, that was really helpful for me to know. – Brian C Aug 11 '16 at 23:44
  • 4
    It's worth noting that this is still some pretty butt-ugly XML: there doesn't seem to be a reliable way to prettify XML in a way that isn't butt-ugly, and by butt-ugly, I mean that all of a tags attributes are on the same line instead of being on separate lines. – Alexej Magura Apr 19 '17 at 14:40
  • Awesome work! Here's an example which takes it one step further and converts an ugly xml file to a pretty xml file: https://stackoverflow.com/a/52125645/4561887 – Gabriel Staples Sep 01 '18 at 06:49
  • 2
    `xml.toprettyxml(indent="", newl="")` this to have a 4 indent spaces and no newline everywhere. add one-liner: `python -c 'import sys, xml.dom.minidom as xmld; print(xmld.parse(sys.argv[1]).toprettyxml(indent="",newl=""))' my_xml_file.xml` – Boop Nov 06 '18 at 13:53
  • 2
    To remove the ugly empty lines it generate just add something like this to strip the blank lines: `pretty_xml_as_string = '\n'.join(list(filter(lambda x: len(x.strip()), pretty_xml_as_string.split('\n'))))` – Treviño Mar 26 '19 at 05:55
  • @Treviño I see no difference after adding the your suggestion. – posfan12 Jul 04 '19 at 05:47
  • @Wyrmwood I am running 2.7.2 and see the newline and spaces. – posfan12 Jul 04 '19 at 06:33
  • @posfan12 I'm sorry. I do not know the exact minor revision where this was fixed, but given your experience, likely it was later than 2.7.2. You can't easily install those older versions, unless you build from source, and even then it can be difficult to impossible to get setup tools and pip working previous to about 2.7.10 or so, but perhaps that's what shipped with your distro. Take the leap and go Python 3! https://pythonclock.org/ – Wyrmwood Jul 05 '19 at 16:04
  • If you can use Python 3.9, there's a new function [xml.etree.ElementTree.indent](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent) that can also help address this without the need for any lxml dependencies or minidom hacks. – link_boy Aug 11 '21 at 15:03
194

lxml is recent, updated, and includes a pretty print function

import lxml.etree as etree

x = etree.parse("filename")
print etree.tostring(x, pretty_print=True)

Check out the lxml tutorial: http://lxml.de/tutorial.html

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
1729
  • 4,961
  • 2
  • 28
  • 17
  • 11
    Only downside to lxml is a dependency on external libraries. This I think is not so bad under Windows the libraries are packaged with the module. Under linux they are an `aptitude install` away. Under OS/X I'm not sure. – intuited Oct 15 '10 at 05:08
  • 4
    On OS X you just need a functioning gcc and easy_install/pip. – pkoch Feb 08 '11 at 16:09
  • 15
    lxml pretty printer isn't reliable and won't pretty print your XML properly in lots of cases explained in [lxml FAQ](http://lxml.de/FAQ.html#why-doesn-t-the-pretty-print-option-reformat-my-xml-output). I quit using lxml for pretty printing after several corner cases that just don't work (ie this won't fix: [Bug #910018](https://bugs.launchpad.net/lxml/+bug/910018)). All these problem is related to uses of XML values containing spaces that should be preserved. – vaab Jan 30 '12 at 09:57
  • This is the industry standard. – bukzor Apr 20 '12 at 16:35
  • 1
    lxml is also part of MacPorts, works smoothly for me. – Jens Dec 20 '13 at 06:00
  • 17
    Since in Python 3 you usually want to work with str (=unicode string in Python 2), better use this: `print(etree.tostring(x, pretty_print=True, encoding="unicode"))`. Writing to an output file is possible in just one line, no intermediary variable needed: `etree.parse("filename").write("outputfile", encoding="utf-8")` – Thor Feb 09 '16 at 15:05
  • 2
    `etree.XMLParser(remove_blank_text=True)` sometime can help to do the right printing – oak Oct 15 '17 at 09:02
  • lxml fails to pretty print my XML, even if I load the original document with remove_blank_text=True. – Ingo Schalk-Schupp Dec 12 '17 at 09:01
  • Like Thor, with Python 3 I had to use `etree.tostring(x, pretty_print=True, encoding="unicode")`; but I then also had to use `pretty_str = pretty_str.replace('\\n', '\n')` – cowlinator Dec 06 '19 at 22:13
  • If you can use Python 3.9, there's a new function [xml.etree.ElementTree.indent](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent) that can also help without the need for any lxml dependencies. – link_boy Aug 11 '21 at 15:04
122

Another solution is to borrow this indent function, for use with the ElementTree library that's built in to Python since 2.5. Here's what that would look like:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)
ade
  • 4,056
  • 1
  • 20
  • 25
  • ...and then just use lxml tostring! – Stefano Nov 21 '11 at 19:53
  • 2
    Note that you can still do `tree.write([filename])` for writing to file (`tree` being the ElementTree instance). – Bouke Jan 03 '14 at 11:32
  • No you can't since elementtree.getroot() doesn't have that method, only an elementtree object has it. @bouke – shinzou Mar 25 '18 at 10:07
  • 1
    Here's how you can write to a file: `tree = ElementTree.parse('file) ; root = tree.getroot() ; indent(root); tree.write('Out.xml');` – e-malito Feb 15 '19 at 20:22
  • @AylwynLake can anyone provide a minimal example where this code goes wrong, and the eff one goes right? Then we can start testing and put the best code here. – Ciro Santilli OurBigBook.com Apr 09 '19 at 13:52
  • This keeps bombing out on me, I get `TypeError: object of type 'ElementTree' has no len()` when I run this. If I use `tree.write` before passing tree to `indent`, I get an xml file with everything on one line. – Korzak Apr 21 '20 at 21:17
  • Links to the correct code are dead, but you can still find the answer in this question's edit history – Hugo Burd Mar 03 '21 at 01:54
81

You have a few options.

xml.etree.ElementTree.indent()

Batteries included, simple to use, pretty output.

But requires Python 3.9+

import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))

BeautifulSoup.prettify()

BeautifulSoup may be the simplest solution for Python < 3.9.

from bs4 import BeautifulSoup

bs = BeautifulSoup(open(xml_file), 'xml')
pretty_xml = bs.prettify()
print(pretty_xml)

Output:

<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
  <id>
   1
  </id>
  <title>
   Add Visual Studio 2005 and 2008 solution files
  </title>
 </issue>
</issues>

This is my goto answer. The default arguments work as is. But text contents are spread out on separate lines as if they were nested elements.

lxml.etree.parse()

Prettier output but with arguments.

from lxml import etree

x = etree.parse(FILE_NAME)
pretty_xml = etree.tostring(x, pretty_print=True, encoding=str)

Produces:

  <issues>
    <issue>
      <id>1</id>
      <title>Add Visual Studio 2005 and 2008 solution files</title>
      <details>We need Visual Studio 2005/2008 project files for Windows.</details>
    </issue>
  </issues>

This works for me with no issues.


xml.dom.minidom.parse()

No external dependencies but post-processing.

import xml.dom.minidom as md

dom = md.parse(FILE_NAME)     
# To parse string instead use: dom = md.parseString(xml_string)
pretty_xml = dom.toprettyxml()
# remove the weird newline issue:
pretty_xml = os.linesep.join([s for s in pretty_xml.splitlines()
                              if s.strip()])

The output is the same as above, but it's more code.

ChaimG
  • 7,024
  • 4
  • 38
  • 46
49

Here's my (hacky?) solution to get around the ugly text node problem.

uglyXml = doc.toprettyxml(indent='  ')

text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)    
prettyXml = text_re.sub('>\g<1></', uglyXml)

print prettyXml

The above code will produce:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

Instead of this:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

Disclaimer: There are probably some limitations.

Nick Bolton
  • 38,276
  • 70
  • 174
  • 242
  • Thank you! This was my one gripe with all the pretty printing methods. Works well with the few files I tried. – iano Sep 17 '10 at 17:00
  • I found a pretty 'almost identical' solution, but yours is more direct, using `re.compile` prior to `sub` operation (I was using `re.findall()` twice, `zip` and a `for` loop with `str.replace()`...) – heltonbiker Sep 16 '11 at 20:49
  • 4
    This is no longer necessary in Python 2.7: xml.dom.minidom's toprettyxml() now produces output like '1' by default, for nodes that have exactly one text child node. – Marius Gedminas Jul 12 '13 at 14:00
  • I am compelled to use Python 2.6. So, this regex reformatting trick is very useful. Worked as-is with no problems. – Mike Finch Jan 18 '17 at 21:46
  • @Marius Gedminas I am running 2.7.2 and the "default" is definitely not as you say. – posfan12 Jul 04 '19 at 06:40
  • (FYI, `doc` is a `xml.dom.minidom.Document` object) – cowlinator Dec 06 '19 at 21:58
  • @Nick Bolton : Can we increase the spacing/indentation ? – StackGuru Apr 14 '20 at 10:23
27

As of Python 3.9, ElementTree has an indent() function for pretty-printing XML trees.

See https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.indent.

Sample usage:

import xml.etree.ElementTree as ET

element = ET.XML("<html><body>text</body></html>")
ET.indent(element)
print(ET.tostring(element, encoding='unicode'))

The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200

mzjn
  • 48,958
  • 13
  • 128
  • 248
o15a3d4l11s2
  • 3,969
  • 3
  • 29
  • 40
23

As others pointed out, lxml has a pretty printer built in.

Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.

Here's a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

Example usage:

prettyPrintXml('some_folder/some_file.xml')
roskakori
  • 3,139
  • 1
  • 30
  • 29
13

If you have xmllint you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.

Note that this method uses an program external to python, which makes it sort of a hack.

def pretty_print_xml(xml):
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml);
    return output

print(pretty_print_xml(data))
Russell Silva
  • 2,772
  • 3
  • 26
  • 36
11

I tried to edit "ade"s answer above, but Stack Overflow wouldn't let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.

def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
Rónán Daly
  • 173
  • 4
Joshua Richardson
  • 1,827
  • 22
  • 22
10

If you're using a DOM implementation, each has their own form of pretty-printing built-in:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

If you're using something else without its own pretty-printer — or those pretty-printers don't quite do it the way you want —  you'd probably have to write or subclass your own serialiser.

bobince
  • 528,062
  • 107
  • 651
  • 834
8

I had some problems with minidom's pretty print. I'd get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here's my workaround for it:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')
giltay
  • 1,995
  • 1
  • 11
  • 5
6
from yattag import indent

pretty_string = indent(ugly_string)

It won't add spaces or newlines inside text nodes, unless you ask for it with:

indent(mystring, indent_text = True)

You can specify what the indentation unit should be and what the newline should look like.

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

The doc is on http://www.yattag.org homepage.

John Smith Optional
  • 22,259
  • 12
  • 43
  • 61
6

I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.

def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings
nacitar sevaht
  • 510
  • 5
  • 15
4

Here's a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline issue here.

Josh Correia
  • 3,807
  • 3
  • 33
  • 50
3

You can use popular external library xmltodict, with unparse and pretty=True you will get best result:

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)

full_document=False against <?xml version="1.0" encoding="UTF-8"?> at the top.

Vitaly Zdanevich
  • 13,032
  • 8
  • 47
  • 81
3

XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

An alternative is to use pyXML, which has a PrettyPrint function.

Dan Lew
  • 85,990
  • 32
  • 182
  • 176
  • `HTTPError: 404 Client Error: Not Found for url: https://pypi.org/simple/xmlpp/` Think that project is in the attic nowadays, shame. – 8bitjunkie Jun 04 '20 at 17:01
  • Obsolete answer, dead links. PyXML was abandoned years ago. – mzjn Sep 01 '23 at 10:30
2

Take a look at the vkbeautify module.

It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn't have any dependency.

Examples:

import vkbeautify as vkb

vkb.xml(text)                       
vkb.xml(text, 'path/to/dest/file')  
vkb.xml('path/to/src/file')        
vkb.xml('path/to/src/file', 'path/to/dest/file') 
vadimk
  • 1,461
  • 15
  • 11
2

You can try this variation...

Install BeautifulSoup and the backend lxml (parser) libraries:

user$ pip3 install lxml bs4

Process your XML document:

from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc: 
    for line in doc: 
        print(BeautifulSoup(line, 'lxml-xml').prettify())  
NYCeyes
  • 5,215
  • 6
  • 57
  • 64
  • 1
    `'lxml'` uses lxml's *HTML* parser - see the BS4 [docs](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser). You need `'xml'` or `'lxml-xml'` for the XML parser. – user2357112 Nov 24 '19 at 08:43
  • 1
    This comment keeps getting deleted. Again, I've enter a formal complaint (in addition to) 4-flags) of post tampering with StackOverflow, and will not stop until this is forensically investigated by a security team (access logs and version histories). The above timestamp is wrong (by years) and likely the content, too. – NYCeyes Nov 26 '19 at 19:52
  • 1
    This worked fine for me, unsure of the down vote from the docs `lxml’s XML parser BeautifulSoup(markup, "lxml-xml") BeautifulSoup(markup, "xml")` – Umar.H Jan 20 '20 at 11:41
  • 1
    @Datanovice I'm glad it helped you. :) As for the suspect downvote, someone tampered with my original answer (which correctly originally specified `lxml-xml`), and then they proceeded to downvote it that same day. I submitted an official complaint to S/O but they refused to investigate. Anyway, I have since "de-tampered" my answer, which is now correct again (and specifies `lxml-xml` as it originally did). Thank you. – NYCeyes Jan 20 '20 at 20:19
2

I found this question while looking for "how to pretty print html"

Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML:

from xml.dom.minidom import parseString as string_to_dom

def prettify(string, html=True):
    dom = string_to_dom(string)
    ugly = dom.toprettyxml(indent="  ")
    split = list(filter(lambda x: len(x.strip()), ugly.split('\n')))
    if html:
        split = split[1:]
    pretty = '\n'.join(split)
    return pretty

def pretty_print(html):
    print(prettify(html))

When used this is what it looks like:

html = """\
<div class="foo" id="bar"><p>'IDK!'</p><br/><div class='baz'><div>
<span>Hi</span></div></div><p id='blarg'>Try for 2</p>
<div class='baz'>Oh No!</div></div>
"""

pretty_print(html)

Which returns:

<div class="foo" id="bar">
  <p>'IDK!'</p>
  <br/>
  <div class="baz">
    <div>
      <span>Hi</span>
    </div>
  </div>
  <p id="blarg">Try for 2</p>
  <div class="baz">Oh No!</div>
</div>
emehex
  • 9,874
  • 10
  • 54
  • 100
1

An alternative if you don't want to have to reparse, there is the xmlpp.py library with the get_pprint() function. It worked nice and smoothly for my use cases, without having to reparse to an lxml ElementTree object.

gaborous
  • 15,832
  • 10
  • 83
  • 102
  • 1
    Tried minidom and lxml and didn't get a properly formatted and indented xml. This worked as expected – david-hoze Aug 08 '17 at 10:23
  • 1
    Fails for tag names that are prefixed by a namespace and contain a hyphen (e.g. ; the part starting with the hyphen is simply dropped, giving e.g. . – Endre Both Nov 12 '18 at 15:06
  • @EndreBoth Nice catch, I did not test, but maybe it would be easy to fix this in the xmlpp.py code? – gaborous Nov 12 '18 at 20:45
1

I found a fast and easy way to nicely format and print an xml file:

import xml.etree.ElementTree as ET

xmlTree = ET.parse('your XML file')
xmlRoot = xmlTree.getroot()
xmlDoc =  ET.tostring(xmlRoot, encoding="unicode")

print(xmlDoc)

Outuput:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
  <child>
    <subchild>.....</subchild>
  </child>
  ...
  ...
  ...
  <child>
    <subchild>.....</subchild>
  </child>
</root>
Wuagliono
  • 162
  • 1
  • 11
0

I had this problem and solved it like this:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

In my code this method is called like this:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

This works only because etree by default uses two spaces to indent, which I don't find very much emphasizing the indentation and therefore not pretty. I couldn't ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.

Zelphir Kaltstahl
  • 5,722
  • 10
  • 57
  • 86
0

For converting an entire xml document to a pretty xml document
(ex: assuming you've extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly "content.xml" file to a pretty one for automated git version control and git difftooling of .odt/.ods files, such as I'm implementing here)

import xml.dom.minidom

file = open("./content.xml", 'r')
xml_string = file.read()
file.close()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()

References:
- Thanks to Ben Noland's answer on this page which got me most of the way there.

Gabriel Staples
  • 36,492
  • 15
  • 194
  • 265
0
from lxml import etree
import xml.dom.minidom as mmd

xml_root = etree.parse(xml_fiel_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    urgly_xml = ''.join(plain_xml .split())
    good_xml = mmd.parseString(urgly_xml)
    print(good_xml.toprettyxml(indent='    ',))

It's working well for the xml with Chinese!

Reed_Xia
  • 1,300
  • 3
  • 17
  • 29
0

If for some reason you can't get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:

import subprocess

def makePretty(filepath):
  cmd = "xmllint --format " + filepath
  prettyXML = subprocess.check_output(cmd, shell = True)
  with open(filepath, "w") as outfile:
    outfile.write(prettyXML)

As far as I know, this solution will work on Unix-based systems that have the xmllint package installed.

FriskySaga
  • 409
  • 1
  • 8
  • 19
  • xmllint has already been suggested in another answer: https://stackoverflow.com/a/10133365/407651 – mzjn May 14 '20 at 06:35
  • @mzjn I saw the answer, but I simplified mine down to `check_output` because you don't need to do error checking – FriskySaga May 14 '20 at 16:01
0

Use etree.indent and etree.tostring

import lxml.etree as etree

root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>')
etree.indent(root, space="  ")
xml_string = etree.tostring(root, pretty_print=True).decode()
print(xml_string)

output

<html>
  <head/>
  <body>
    <h1>Welcome</h1>
  </body>
</html>

Removing namespaces and prefixes

import lxml.etree as etree


def dump_xml(element):
    for item in element.getiterator():
        item.tag = etree.QName(item).localname

    etree.cleanup_namespaces(element)
    etree.indent(element, space="  ")
    result = etree.tostring(element, pretty_print=True).decode()
    return result


root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>')
xml_string = dump_xml(root)
print(xml_string)

output

<document>
  <name>hello world</name>
</document>
Jossef Harush Kadouri
  • 32,361
  • 10
  • 130
  • 129
-2

I solved this with some lines of code, opening the file, going trough it and adding indentation, then saving it again. I was working with small xml files, and did not want to add dependencies, or more libraries to install for the user. Anyway, here is what I ended up with:

    f = open(file_name,'r')
    xml = f.read()
    f.close()

    #Removing old indendations
    raw_xml = ''        
    for line in xml:
        raw_xml += line

    xml = raw_xml

    new_xml = ''
    indent = '    '
    deepness = 0

    for i in range((len(xml))):

        new_xml += xml[i]   
        if(i<len(xml)-3):

            simpleSplit = xml[i:(i+2)] == '><'
            advancSplit = xml[i:(i+3)] == '></'        
            end = xml[i:(i+2)] == '/>'    
            start = xml[i] == '<'

            if(advancSplit):
                deepness += -1
                new_xml += '\n' + indent*deepness
                simpleSplit = False
                deepness += -1
            if(simpleSplit):
                new_xml += '\n' + indent*deepness
            if(start):
                deepness += 1
            if(end):
                deepness += -1

    f = open(file_name,'w')
    f.write(new_xml)
    f.close()

It works for me, perhaps someone will have some use of it :)

Petter TB
  • 147
  • 5