I'm trying to iterate through an XML file (UTF-8 encoded, starts with ) with lxml, but get the following error on the character 丂 :
UnicodeEncodeError: 'cp932' codec can't encode character u'\u4e02' in position 0: illegal multibyte sequence
Other characters before this are printed out correctly. The code is:
parser = etree.XMLParser(encoding='utf-8')
tree = etree.parse("filename.xml", parser)
root = tree.getroot()
for elem in root:
print elem[0].text
Does the error mean that it didn't parse the file in utf-8 but in shift JIS instead?