30

I've been using minidom.toprettyxml to prettify my XML file. When I create the XML file and use this method, everything works great. But if I use it after modifying the XML file (for example, after adding nodes) and then write it back, I get empty lines. Each time I update the file, I get more and more empty lines.

My code:

file.write(prettify(xmlRoot))


def prettify(elem):
    rough_string = xml.tostring(elem, 'utf-8')  # xml is xml.etree.ElementTree
    reparsed = mini.parseString(rough_string)  # mini is xml.dom.minidom
    return reparsed.toprettyxml(indent=" ")

And the result:

<?xml version="1.0" ?>
<testsuite errors="0" failures="3" name="TestSet_2013-01-23 14_28_00.510935" skip="0"     tests="3" time="142.695" timestamp="2013-01-23 14:28:00.515460">




    <testcase classname="TC test" name="t1" status="Failed" time="27.013"/>




    <testcase classname="TC test" name="t2" status="Failed" time="78.325"/>


    <testcase classname="TC test" name="t3" status="Failed" time="37.357"/>
</testsuite>

Any suggestions?

Thanks.

Jean-Francois T.
Igal

7 Answers

33

I found a solution here: http://code.activestate.com/recipes/576750-pretty-print-xml/

Then I modified it to take a string instead of a file.

from xml.dom.minidom import parseString

pretty_print = lambda data: '\n'.join([line for line in parseString(data).toprettyxml(indent=' '*2).split('\n') if line.strip()])

Output:

<?xml version="1.0" ?>
<testsuite errors="0" failures="3" name="TestSet_2013-01-23 14_28_00.510935" skip="0" tests="3" time="142.695" timestamp="2013-01-23 14:28:00.515460">
  <testcase classname="TC test" name="t1" status="Failed" time="27.013"/>
  <testcase classname="TC test" name="t2" status="Failed" time="78.325"/>
  <testcase classname="TC test" name="t3" status="Failed" time="37.357"/>
</testsuite>

This may help you work it into your function a little more easily:

def new_prettify(content):
    # content is the XML document as a string, not an Element
    reparsed = parseString(content)
    print('\n'.join([line for line in reparsed.toprettyxml(indent=' '*2).split('\n') if line.strip()]))
Joe
  • Joe - unfortunately I'm getting an exception from the parser "must be string or read-only buffer, not Element" – Igal Jan 24 '13 at 08:51
  • Joe - just to make it clear: do I need to use this code while I'm creating the XML, or after it has been created, just to remove the empty lines? Thanks. – Igal Jan 24 '13 at 09:07
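
The exception mentioned in the comments appears because parseString() expects a string (or bytes), not an Element. A minimal sketch of one way to combine the two steps, assuming the tree comes from xml.etree.ElementTree; the function name pretty_print_element is made up for illustration:

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString


def pretty_print_element(elem, indent="  "):
    """Hypothetical helper: pretty-print an ElementTree Element without blank lines."""
    # parseString() needs text or bytes, so serialize the Element first.
    rough = ET.tostring(elem, encoding="utf-8")
    pretty = parseString(rough).toprettyxml(indent=indent)
    # Keep only lines that contain something other than whitespace.
    return "\n".join(line for line in pretty.split("\n") if line.strip())


if __name__ == "__main__":
    root = ET.fromstring('<a>\n\n  <b x="1"/>\n\n</a>')
    print(pretty_print_element(root))
```

This can be applied after the tree has been modified, right before writing it to the file. Note that the line filter would also drop blank lines inside multi-line text content, so it fits best for documents like the one in the question, where elements carry only attributes.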
7

I found an easy solution for this problem: just change the last line of your prettify() so that it becomes:

def prettify(elem):
    rough_string = xml.tostring(elem, 'utf-8')  # xml is xml.etree.ElementTree
    reparsed = mini.parseString(rough_string)  # mini is xml.dom.minidom
    return reparsed.toprettyxml(indent=" ", newl='')
Sidali Smaili
2

Use this to resolve the problem with the extra lines:

toprettyxml(indent=' ', newl='\r', encoding="utf-8")

  • 2
    Although this may help the problem, I would recommend you to describe in more detail how your answer helps. – Wtower Jul 08 '15 at 07:49
  • `newl='\r'` does resolve the issue on Windows, may have something to do with how newlines are usually written as '\r\n' on Windows – prusswan Sep 02 '19 at 06:19
1

I am having the same issue with Python 2.7 (32-bit) on a Windows 10 machine. The issue seems to be that when Python parses XML text into an ElementTree object, the line feeds between tags end up in the "text" or "tail" attributes of each element.

This script removes such line break characters:

import re

def removeAnnoyingLines(elem):
    hasWords = re.compile(r"\w")
    for element in elem.iter():
        # str(None) is "None", which matches \w, so missing text/tail is left alone
        if not hasWords.search(str(element.tail)):
            element.tail = ""
        if not hasWords.search(str(element.text)):
            element.text = ""
Use this function before "pretty-printing" your tree:

removeAnnoyingLines(element)
myXml = xml.dom.minidom.parseString(xml.etree.ElementTree.tostring(element))
print(myXml.toprettyxml())

It worked for me. I hope it works for you!
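
The whitespace described in this answer can be observed directly. The sketch below is only an illustration: the SAMPLE document and the helper name strip_blank_text are made up, and the helper uses str.strip() instead of a regular expression so that text made only of punctuation is not erased.

```python
import xml.etree.ElementTree as ET

# A small document that has already been pretty-printed once (made-up sample).
SAMPLE = '<testsuite>\n  <testcase name="t1"/>\n  <testcase name="t2"/>\n</testsuite>'

root = ET.fromstring(SAMPLE)
print(repr(root.text))     # '\n  '  (whitespace before the first child)
print(repr(root[0].tail))  # '\n  '  (whitespace after the first child)


def strip_blank_text(elem):
    """Hypothetical helper: clear text/tail values that hold only whitespace."""
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


strip_blank_text(root)
print(ET.tostring(root, encoding="unicode"))
```

After the cleanup the serialized tree contains no newlines between tags, so a later toprettyxml() call has no leftover whitespace to re-indent.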

Ricardo Alejos
1

Here's a Python 3 solution that gets rid of the ugly newline issue (tons of whitespace), and it uses only the standard library, unlike most other implementations.

import xml.etree.ElementTree as ET
import xml.dom.minidom

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    # Join with "\n": a file opened in text mode converts it to the platform line ending,
    # so joining with os.linesep would write "\r\r\n" on Windows.
    xml_string = "\n".join([s for s in xml_string.splitlines() if s.strip()])  # drop whitespace-only lines
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline issue here.

Josh Correia
0

The problem is that minidom doesn't handle newline characters well (on Windows). It doesn't need them anyway, so removing them from the string is the solution:

reparsed = mini.parseString(rough_string)  # mini is xml.dom.minidom

replace with

reparsed = mini.parseString(rough_string.replace('\n',''))  # mini is xml.dom.minidom

But be aware that this solution works only on Windows.
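
An alternative that does not depend on the platform's line endings is to remove the whitespace-only text nodes from the parsed DOM before calling toprettyxml(). This is a different technique from the string replacement above, shown only as a sketch; the names strip_whitespace_nodes and prettify_string are made up for illustration.

```python
from xml.dom import Node, minidom


def strip_whitespace_nodes(node):
    """Hypothetical helper: recursively remove text nodes that hold only whitespace."""
    for child in list(node.childNodes):  # copy, because we mutate while iterating
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


def prettify_string(xml_text, indent="  "):
    """Hypothetical helper: parse, drop leftover indentation, pretty-print again."""
    dom = minidom.parseString(xml_text)
    strip_whitespace_nodes(dom)
    return dom.toprettyxml(indent=indent)


if __name__ == "__main__":
    messy = '<a>\n\n\n  <b x="1"/>\n\n  <b x="2"/>\n</a>'
    print(prettify_string(messy))
```

Because the old indentation is discarded before the new one is added, running the function on its own output should give the same result again instead of accumulating empty lines.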

DexBG
0

Since minidom's toprettyxml inserts too many lines, my solution was to delete the lines that carry no useful data, by checking whether each line contains at least one '<' character (there may be a better idea). This worked perfectly for a similar issue I had (on Windows).

text = md.toprettyxml() # get the prettyxml string from minidom Document md
# text = text.replace('    ', '\t') # for those using tabs :)
spl = text.split('\n') # split lines into a list
spl = [i for i in spl if '<' in i] # keep only element with data inside
text = '\n'.join(spl) # join again all elements of the filtered list into a string

# write the result to file (I use codecs because I needed the utf-8 encoding)
import codecs # if not imported yet (just to show this import is needed)
with codecs.open('yourfile.xml', 'w', encoding='utf-8') as f:
    f.write(text)
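
For readers on Python 3.9 or later, the standard library also offers xml.etree.ElementTree.indent(), which avoids the minidom round trip altogether. This is a different technique from the answers above, sketched here under the assumption that the tree is an ElementTree Element; the function name prettify_in_place is made up for illustration.

```python
import xml.etree.ElementTree as ET


def prettify_in_place(root, space="  "):
    """Hypothetical helper: re-indent the tree in place and return it as a string."""
    # ET.indent() (Python 3.9+) overwrites whitespace-only text/tail values,
    # so indentation left over from an earlier run is replaced, not stacked.
    ET.indent(root, space=space)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    root = ET.fromstring('<testsuite>\n\n\n<testcase name="t1"/>\n\n</testsuite>')
    ET.SubElement(root, "testcase", name="t2")  # modify the tree, as in the question
    print(prettify_in_place(root))
```

Note that ET.indent() modifies the tree it is given, so the same root can then be written with ElementTree.write() if a file is needed.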