I'm trying to save an XML file encoded as UTF-16 with cElementTree. This is the same project as, but a different problem from, the DOCTYPE issue in: How to create <!DOCTYPE> with Python's cElementTree
I've learned that if I do not declare the encoding in the string, cElementTree will add it. So, the code is like this:
import xml.etree.cElementTree as ElementTree
from StringIO import StringIO
s = '<?xml version=\"1.0\" ?><!DOCTYPE tmx SYSTEM \"tmx14a.dtd\" ><tmx version=\"1.4a\" />'
tree = ElementTree.parse(StringIO(s)).getroot()
header = ElementTree.SubElement(tree,'header',{'adminlang': 'EN',})
body = ElementTree.SubElement(tree,'body')
ElementTree.ElementTree(tree).write('myfile.tmx','UTF-16')
When I write the file as UTF-8, everything's great. However, when I change to UTF-16, the text encoding is corrupted, and the file is also missing the required byte order mark (BOM). When I try adding the BOM to the start of the string,
s = '\xFF\xFE<?xml version=\"1.0\"......
ElementTree reports the error "not well-formed (invalid token) line 1, column 1".
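My guess at why the BOM prefix fails, offered as a small sketch and not a confirmed diagnosis: the prefixed literal is a byte string, so it puts a UTF-16 BOM in front of text that is still one byte per character, and the parser presumably then tries to read the rest as 16-bit units. The variable names below are mine, for illustration only:

```python
import codecs

text = u'<?xml version="1.0" ?>'

# BOM followed by one-byte-per-character data, as in the prefixed literal
mixed = codecs.BOM_UTF16_LE + text.encode('ascii')
# BOM followed by data that really is UTF-16 (little-endian)
proper = codecs.BOM_UTF16_LE + text.encode('utf-16-le')

# The two buffers differ in length because UTF-16 uses two bytes per character here
print(len(mixed), len(proper))
# Decoding each buffer as UTF-16 shows which one round-trips to the original text
print(proper.decode('utf-16') == text)
print(mixed.decode('utf-16') == text)
```

If that reading is right, a BOM only makes sense when the bytes after it are actually UTF-16.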
All the buffers are unicode data. How can I save to a UTF-16 XML file?
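For reference, here is a minimal sketch of the output I am after. It assumes it is acceptable to serialize the tree as ASCII first and let Python's own 'utf-16' codec do the encoding, bypassing the library's UTF-16 writer. The helper name to_utf16_xml and the hand-written PROLOG string are my own placeholders, not part of the library, and I use xml.etree.ElementTree here on the assumption that it behaves the same as cElementTree for these calls:

```python
import codecs
import xml.etree.ElementTree as ElementTree

# Placeholder prolog: the declaration and DOCTYPE are written by hand.
PROLOG = (u'<?xml version="1.0" encoding="UTF-16"?>\n'
          u'<!DOCTYPE tmx SYSTEM "tmx14a.dtd">\n')

def to_utf16_xml(root, prolog=PROLOG):
    """Return the document as UTF-16 bytes starting with a BOM (hypothetical helper)."""
    # The default serialization is ASCII bytes with character references
    # for anything outside ASCII, and no XML declaration.
    body = ElementTree.tostring(root).decode('ascii')
    # The 'utf-16' codec writes the BOM itself, in the machine's byte order.
    return (prolog + body).encode('utf-16')

root = ElementTree.Element('tmx', {'version': '1.4a'})
ElementTree.SubElement(root, 'header', {'adminlang': 'EN'})
ElementTree.SubElement(root, 'body')

data = to_utf16_xml(root)
# Check that the result starts with one of the two UTF-16 byte order marks.
has_bom = data[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(has_bom)
```

The resulting bytes would then be written with a file opened in binary mode ('wb') so nothing re-encodes them on the way out. I would still prefer a way to get this directly from cElementTree's write().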