Python minidom unwanted whitespace

Question

I'm using Python to write data into .xml files. I have a file named statistics.xml, and every time I call my method writeIntoXml() it should add data to that file. Python does this fine; the only problem is that it adds unwanted whitespace between all of the elements that were already in the file before I wrote the new data. Like this:

<AantalTicketsPerUur>
    <Dag datum="2012-03-16">
        <Aantal_tickets Aantal="24" uurinterval="0u-1u"/>
        <Aantal_tickets Aantal="68" uurinterval="1u-2u"/>
        <Aantal_tickets Aantal="112" uurinterval="2u-3u"/>
        <Aantal_tickets Aantal="98" uurinterval="3u-4u"/>
    </Dag>
</AantalTicketsPerUur>

becomes this (the elements without the extra whitespace in between are the new data):

<AantalTicketsPerUur>


    <Dag datum="2012-03-16">


        <Aantal_tickets Aantal="24" uurinterval="0u-1u"/>


        <Aantal_tickets Aantal="68" uurinterval="1u-2u"/>


        <Aantal_tickets Aantal="112" uurinterval="2u-3u"/>


        <Aantal_tickets Aantal="98" uurinterval="3u-4u"/>


    </Dag>


    <Dag datum="2012-03-16">
        <Aantal_tickets Aantal="24" uurinterval="0u-1u"/>
        <Aantal_tickets Aantal="68" uurinterval="1u-2u"/>
        <Aantal_tickets Aantal="112" uurinterval="2u-3u"/>
        <Aantal_tickets Aantal="98" uurinterval="3u-4u"/>
    </Dag>
</AantalTicketsPerUur>

How can I solve this? Note: I do use the .toprettyxml() method.

Thanks in advance

Does this answer your question? [Empty lines while using minidom.toprettyxml](https://stackoverflow.com/questions/14479656/empty-lines-while-using-minidom-toprettyxml) — Josh Correia, Feb 12 '20 at 17:35

unknown user · Answer 1 · 2012-11-06T11:58:24.317

2

You might want to use toxml instead of toprettyxml. Unlike toprettyxml, toxml does not modify the existing formatting:

def write_xml(filename, dom):
    # toxml("utf-8") returns encoded bytes, so open the file in binary mode
    with open(filename, "wb") as f:
        f.write(dom.toxml("utf-8"))
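
To illustrate the difference, here is a minimal sketch (not part of the original answer; the tiny `source` string and the variable names are made up for the demo). It parses a small indented document and serializes the root element both ways. As far as I can tell, toxml writes the existing whitespace text nodes back unchanged, while toprettyxml adds its own indentation and newlines around them, which is where the extra blank lines come from.

```python
import xml.dom.minidom as minidom

# Hypothetical sample input: an already-indented document.
source = '<root>\n    <item/>\n</root>'
dom = minidom.parseString(source)

# toxml() writes the whitespace text nodes back exactly as they were parsed.
plain = dom.documentElement.toxml()

# toprettyxml() wraps every node, including the whitespace-only text nodes,
# in its own indent and newline, so each round trip adds more blank lines.
pretty = dom.documentElement.toprettyxml(indent='    ', newl='\n')

print(repr(plain))
print(repr(pretty))
```

Comparing the two repr() lines shows the additional newlines in the toprettyxml result.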

edited Nov 06 '12 at 11:58

answered Nov 06 '12 at 11:04


score 0 · Answer 2 · answered Feb 01 '14 at 21:54

I agree with the answer from qgi. But note that the two methods seem to have opposite quirks with regard to comments found OUTSIDE the root element. For example, if I parse this XML file with minidom...

<?xml version="1.0" encoding="utf-8"?>

<!-- testing 1 -->
<!-- testing 2 -->

<sources autodelete="false" syncmedia="true" multivalue_separator=";; ">

    <!-- testing 3 -->
    <source 
        id_field="Lex GUID"
        source_audio_folder="samples/audio"
        source_image_folder="samples/pictures" >
        <source_field anki_field="Lex GUID" />
  </source>

    <!-- Test blah blah
        blah blah 
        blah 
    -->
    <source 
        id_field="Example"  
        source_audio_folder="samples/audio"
        source_image_folder="samples/pictures" >
        <source_field anki_field="Example" />

    </source>

</sources>

<!-- test THE END -->

...and then I save it with each method, the two outputs differ in opposite ways. toxml keeps the formatting inside the root element but runs everything outside it (the declaration and the outer comments) together on one line, while toprettyxml puts those outer parts on their own lines but adds blank lines throughout the root. I'm using Python 2.7, by the way. Here is tmp1.xml (toxml):

<?xml version="1.0" encoding="utf-8"?><!-- testing 1 --><!-- testing 2 --><sources autodelete="false" multivalue_separator=";; " syncmedia="true">

    <!-- testing 3 -->
    <source id_field="Lex GUID" source_audio_folder="samples/audio" source_image_folder="samples/pictures">
        <source_field anki_field="Lex GUID"/>
  </source>

    <!-- Test blah blah
        blah blah 
        blah 
    -->
    <source id_field="Example" source_audio_folder="samples/audio" source_image_folder="samples/pictures">
        <source_field anki_field="Example"/>

    </source>

</sources><!-- test THE END -->

...and here is tmp2.xml (toprettyxml):

<?xml version="1.0" encoding="utf-8"?>
<!-- testing 1 -->
<!-- testing 2 -->
<sources autodelete="false" multivalue_separator=";; " syncmedia="true">



    <!-- testing 3 -->


    <source id_field="Lex GUID" source_audio_folder="samples/audio" source_image_folder="samples/pictures">


        <source_field anki_field="Lex GUID"/>


    </source>



    <!-- Test blah blah
        blah blah 
        blah 
    -->


    <source id_field="Example" source_audio_folder="samples/audio" source_image_folder="samples/pictures">


        <source_field anki_field="Example"/>



    </source>



</sources>
<!-- test THE END -->

Just in case, here is the Python code that produced those:

import xml.dom.minidom as minidom

# file_path is the path to the XML file shown above
tree = minidom.parse(file_path)
# With an encoding argument, both methods return already-encoded bytes
s1 = tree.toxml('utf-8')
s2 = tree.toprettyxml('    ', '\n', 'utf-8')
# Write in binary mode; the data must not be encoded a second time
with open('tmp1.xml', mode='wb') as outfile:
    outfile.write(s1)
with open('tmp2.xml', mode='wb') as outfile:
    outfile.write(s2)

And both methods seem to produce newlines inconsistently (sometimes as CR and sometimes as CR LF; a.k.a \r or \r\n). The good news is that output from toprettyxml() can be read back in and then saved back out with *identical* formatting, which would be great if I could live with the 3-4 blank lines it wants between each element — Jon Coombs, Feb 01 '14 at 22:08
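
If you want toprettyxml without the blank lines, one common workaround (a different technique from the toxml approach above, and not something either answer proposes) is to remove the whitespace-only text nodes before pretty-printing. A minimal sketch, assuming a hypothetical helper named `strip_whitespace_nodes` and a shortened version of the question's data:

```python
import xml.dom.minidom as minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Hypothetical helper for illustration; it is not part of minidom.
    """
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


# Shortened sample based on the question's file.
source = (
    '<AantalTicketsPerUur>\n'
    '    <Dag datum="2012-03-16">\n'
    '        <Aantal_tickets Aantal="24"/>\n'
    '    </Dag>\n'
    '</AantalTicketsPerUur>'
)

dom = minidom.parseString(source)

# Add new data, as the question's writeIntoXml() presumably does.
new_day = dom.createElement('Dag')
new_day.setAttribute('datum', '2012-03-17')
dom.documentElement.appendChild(new_day)

# Drop the old indentation, then let toprettyxml re-indent everything.
strip_whitespace_nodes(dom)
pretty = dom.documentElement.toprettyxml(indent='    ', newl='\n')
print(pretty)
```

Be aware that this throws away the original layout and any whitespace-only text content, so it only suits documents where such whitespace carries no meaning.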
