
I'm trying to build an XML file in Python with ElementTree and then pretty-print it with minidom, following this snippet. The issue I'm facing is that when I create a SubElement whose text contains quotes, the quotes get escaped to &quot;.

The code is pretty simple:

from xml.etree import ElementTree as et
from xml.dom import minidom

def prettify(elem):
    """Return a pretty-printed XML string for the Element."""
    rough_string = et.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

top = et.Element("Base")
sub = et.SubElement(top, "Sub")
sub.text = 'Hello this is "a test" with quotes'

print(prettify(top))

And the generated XML is:

<?xml version="1.0" ?>
<Base>
  <Sub>Hello this is &quot;a test&quot; with quotes</Sub>
</Base>

Is there a way of avoiding escaping the quotes?

José Tomás Tocino
  • Looking at the ElementTree code I think the answer is negative. See https://github.com/python/cpython/blob/3.8/Lib/xml/etree/ElementTree.py#L1073 – balderman May 04 '20 at 12:46
  • It can be done with lxml, if you can use it. – Jack Fleeting May 04 '20 at 14:15
  • I cannot reproduce this with Python 3.8. Do you write the generated XML to a file? – mzjn May 04 '20 at 16:26
  • writing the Sub-Elements contents as CDATA might help, cf. https://stackoverflow.com/questions/174890/how-to-output-cdata-using-elementtree – Benjamin W. Bohl May 04 '20 at 21:56
  • I'm afraid I cannot use lxml, I have to stick to standard libraries. I'm using Python 3.4, due to the operating system being a RHEL 7.x version. The output in the question is from printing the xml directly to stdout. – José Tomás Tocino May 04 '20 at 23:25
  • I cannot reproduce the problem with Python 2.7 either. What exactly does "printing the xml directly to stdout" mean? Do you use `et.dump(top)`, `print(et.tostring(top))`, or what? – mzjn May 05 '20 at 06:11
  • You're right, @mzjn, the issue is actually in the way I print the XML. I was using minidom to prettify the XML before printing it; sorry for omitting that. It looks like the escaping is added by minidom. – José Tomás Tocino May 05 '20 at 07:41
  • And that, therefore, directly turns this question into a literal duplicate of: https://stackoverflow.com/questions/41145809/xml-toprettyxml-escapes-quotes – José Tomás Tocino May 05 '20 at 07:45
  • Here is the minidom code that does the escaping: https://github.com/python/cpython/blob/3.8/Lib/xml/dom/minidom.py#L306 – mzjn May 05 '20 at 08:11
  • Related bug report: https://bugs.python.org/issue37374 – mzjn May 05 '20 at 09:41
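
Since the comments trace the escaping to minidom rather than ElementTree, one possible workaround that stays within the standard library is to skip minidom and indent the tree in place before serializing with `et.tostring`. The sketch below is only an illustration under that assumption: the `indent` helper is a hypothetical function written for this example (it is not part of the question's code), and it assumes that leaf elements carry the text while container elements hold only whitespace.

```python
from xml.etree import ElementTree as et


def indent(elem, level=0, tab="\t"):
    """Hypothetical helper: add indentation whitespace to elem's subtree in place."""
    pad = "\n" + level * tab
    if len(elem):
        # Whitespace before the first child, one level deeper than elem.
        if not elem.text or not elem.text.strip():
            elem.text = pad + tab
        for child in elem:
            indent(child, level + 1, tab)
        # The last child's tail brings the closing tag back to elem's level.
        if not child.tail or not child.tail.strip():
            child.tail = pad
    # Every non-root element is followed by a newline at its own level.
    if level and (not elem.tail or not elem.tail.strip()):
        elem.tail = pad


top = et.Element("Base")
sub = et.SubElement(top, "Sub")
sub.text = 'Hello this is "a test" with quotes'

indent(top)
# encoding="unicode" returns a str and omits the XML declaration.
pretty = et.tostring(top, encoding="unicode")
print(pretty)
# <Base>
# 	<Sub>Hello this is "a test" with quotes</Sub>
# </Base>
```

ElementTree escapes only `&`, `<` and `>` in element text, so the quotes come through unchanged. On Python 3.9 and later, the built-in `et.indent(top, space="\t")` does the same in-place indentation, but it is not available on Python 3.4.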

0 Answers