Given an XML with chinese characters, I would like to use xml.etree
to help me parse the XML to do some processing. The English version works. For example:
>el.xml printf '%s\n' $'<?xml version=\'1.0\' encoding=\'utf8\'?><Color>Grey</Color>'
>cl.xml printf '%s\n' $'<?xml version=\'1.0\' encoding=\'utf8\'?><Color>灰色</Color>'
tryParse() {
python -c 'import xml.etree.ElementTree as ET; import sys; ET.parse(sys.argv[1])' "$@"
}
tryParse el.xml && printf '%s\n\n' "English works"
tryParse cl.xml && printf '%s\n\n' "Chinese works"
...emits as output:
English works
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1182, in parse
tree.parse(source, parser)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
self._raiseerror(v)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 44