
I have some code that throws a fatal exception on Unicode input. I am using ElementTree to build an XML document and tostring() to print it. I've tried passing in unicode objects, and I've tried encoding them as UTF-8 byte strings first, and it makes no difference. I can't tell whether I'm doing something wrong or whether there is a bug in the module.

Here is a small sample to reproduce it.

#!/usr/bin/python

from __future__ import unicode_literals, print_function
from xml.etree.ElementTree import Element, SubElement, tostring
import xml.etree.ElementTree as ET
import time

def main():
    xml = Element('build_summary')
    mpversion = SubElement(xml, 'magpy_version')
    mpversion.text = '1.2.3.4'
    version = SubElement(xml, 'version')
    version.text = '11.22.33.44'
    date = SubElement(xml, 'date')
    date.text = time.strftime("%a %b %-d %Y", time.localtime())
    args = SubElement(xml, 'args')
    args.text = 'build args'
    issues = SubElement(xml, 'issues')
    # Add the repos and the changes in them
    changelog = 'this is the changelog \u2615'
    #changelog = 'this is the changelog'
    print("adding changelog:", changelog)
    repository = SubElement(issues, 'repo')
    reponame = SubElement(repository, 'reponame')
    reponame.text = 'repo name'
    repoissues = SubElement(repository, 'repoissues')
    #repoissues.text = changelog.encode('UTF-8', 'replace')
    repoissues.text = changelog

    # Generate a string, reparse it, and pretty-print it.
    #ET.dump(xml)
    #xml.write('myoutput.xml')
    rough = tostring(xml, encoding='UTF-8', method='xml')
    #rough = tostring(xml)
    print(rough)

if __name__ == '__main__':
    main()

This yields the following:

msoulier@anton:~$ python treetest.py
adding changelog: this is the changelog ☕
Traceback (most recent call last):
File "treetest.py", line 38, in <module>
    main()
File "treetest.py", line 33, in main
    rough = tostring(xml, encoding='UTF-8', method='xml')
File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1127, in tostring
    return "".join(data)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 22: ordinal not in range(128)

So, what am I doing wrong here? Oddly, ElementTree.dump works fine, but the docs say not to use it for anything but debugging.

mzjn
Michael Soulier
  • There is a problem with the import of `unicode_literals`. If that import is removed, it works. See https://stackoverflow.com/q/809796/407651. – mzjn Oct 06 '19 at 14:49
  • Well it's too late to back that out. For now I guess I'm stripping non-ascii before this parser is hit. – Michael Soulier Oct 07 '19 at 13:49
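
If removing the `unicode_literals` import is not an option, one possible workaround is to keep it and pass the encoding name to tostring() as a native str. This is a minimal sketch, not a confirmed fix: it assumes the failure comes from the `encoding='UTF-8'` argument itself becoming a unicode object under `unicode_literals`, so that Python 2.7's tostring() ends up joining unicode and non-ASCII byte strings. The helper names `build_summary` and `serialize` are made up for illustration, and the tree is trimmed down from the sample in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals, print_function
from xml.etree.ElementTree import Element, SubElement, tostring


def build_summary(changelog):
    """Build a trimmed-down version of the tree from the question."""
    root = Element('build_summary')
    issues = SubElement(root, 'issues')
    repo = SubElement(issues, 'repo')
    SubElement(repo, 'reponame').text = 'repo name'
    SubElement(repo, 'repoissues').text = changelog
    return root


def serialize(root):
    # str('UTF-8') is a native str on both Python 2 and Python 3, so on
    # Python 2 the encoding name stays a byte string even though
    # unicode_literals is in effect (assumed to be the relevant difference).
    return tostring(root, encoding=str('UTF-8'), method='xml')


if __name__ == '__main__':
    rough = serialize(build_summary('this is the changelog \u2615'))
    # rough is a UTF-8 encoded byte string; decode it if text is needed.
    print(rough)
```

The element text can stay a unicode object here, so stripping non-ASCII characters beforehand should not be needed if the assumption above holds.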

0 Answers