I'm trying to process XML using Python's minidom, and then output the result using toprettyxml(). I ran into two problems:

  1. There are added blank lines.
  2. There are added newlines and tabs for text nodes.

Here's the code and output:

$ cat test.py
from xml.dom import minidom

dom = minidom.parse("test.xml")
print(dom.toprettyxml())

$ cat test.xml
<?xml version="1.0" encoding="UTF-8"?>

<store>
    <product>
        <fruit>orange</fruit>
    </product>
</store>


$ python test.py
<?xml version="1.0" ?>
<store>


    <product>


        <fruit>
            orange
        </fruit>


    </product>


</store>

I can work around problem 1 by using strip() to remove the blank lines, and I can work around problem 2 with the hack (fixed_writexml) described at http://ronrothman.com/public/leftbraned/xml-dom-minidom-toprettyxml-and-silly-whitespace/. However, that hack is almost three years old now, so I was wondering whether there is a better solution. I'm open to using something other than minidom, but I'd like to avoid adding external packages such as lxml.

Ravi
  • You may want to check out my solution - http://stackoverflow.com/a/39984422/2687547 – dganesh2002 Oct 21 '16 at 22:28
  • Does this answer your question? [Empty lines while using minidom.toprettyxml](https://stackoverflow.com/questions/14479656/empty-lines-while-using-minidom-toprettyxml) – Josh Correia Feb 12 '20 at 17:38

1 Answer

One solution is to patch the minidom library with the patch proposed in the bug report you mention.

I haven't tested it myself, and it's a bit hacky too, so it may not suit you!
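
If patching is not an option, a different stdlib-only approach is to remove the whitespace-only text nodes from the DOM before pretty-printing. The following is a minimal sketch, not the proposed patch: the helper names strip_whitespace_nodes and pretty_xml are made up for illustration, and it assumes a Python version in which toprettyxml() already writes an element with a single text child on one line (true for current Python 3 releases; older versions would still need the patch for problem 2).

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Hypothetical helper, not part of minidom.
    """
    # Iterate over a copy because children are removed during the loop.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


def pretty_xml(xml_text, indent="    "):
    """Return xml_text re-indented without the extra blank lines."""
    dom = minidom.parseString(xml_text)
    strip_whitespace_nodes(dom)
    return dom.toprettyxml(indent=indent)


sample = """<store>
    <product>
        <fruit>orange</fruit>
    </product>
</store>"""

print(pretty_xml(sample))
```

Note that this also drops whitespace-only text in mixed content, so it is only suitable when that whitespace carries no meaning, as in the store/product/fruit example from the question.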

CharlesB
  • Thanks, I tested the patch, and it fixes both issues! Instead of directly patching minidom.py in /usr/lib/python, I did something similar to the ronrothman link above, where the function is replaced at runtime. That way, it can run anywhere. – Ravi Jun 06 '11 at 19:23
  • Hey, could you please share your solution to the patch? Thanks! – Igal Jan 23 '13 at 14:43