I have the following XML:

<Earth>
 <country name="Česká republika" population="8900000">
    <capital>Praha1</capital>        
  </country>
</Earth>

But when I try to parse it, it fails with this error:

 xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 20

My code:

 tree = etree.parse(input)  # input -> file.xml
mechanical_meat
Johnzzz
  • What is the encoding of the XML file? You need to make sure you decode to Unicode. – mechanical_meat Apr 25 '12 at 22:54
  • Very similar to http://stackoverflow.com/questions/147741/character-reading-from-file-in-python; try opening the file with the right encoding. – arhimmel Apr 25 '12 at 23:24
  • Yep, but I'm not opening it myself; I just pass the filename as the parameter of ET.parse. That's why I can't set the encoding manually (or I'm not aware of an ET method that could do so). – Johnzzz Apr 26 '12 at 16:35

1 Answer

As arhimmel pointed out, this is most likely an encoding problem. etree.parse accepts file-like objects as well as paths, so you could try adding import codecs at the top of your code and then replacing input with codecs.open("file.xml", encoding="UTF-8"). If the file was not saved as UTF-8, pass the encoding it was actually saved in instead.
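A minimal sketch of that idea, assuming Python 3, where the built-in open takes an encoding argument and can stand in for codecs.open. The helper parse_with_encoding and the choice of cp1250 are my own assumptions for illustration only; the question never says which encoding file.xml uses, so substitute the real one.

```python
import os
import tempfile
import xml.etree.ElementTree as etree

# The XML from the question, held as text so the demo can write it to disk.
XML_TEXT = (
    '<Earth>\n'
    ' <country name="Česká republika" population="8900000">\n'
    '    <capital>Praha1</capital>\n'
    '  </country>\n'
    '</Earth>\n'
)


def parse_with_encoding(path, encoding):
    """Hypothetical helper: decode the file ourselves, then let etree parse the text stream."""
    with open(path, encoding=encoding) as handle:
        return etree.parse(handle)


# Demo setup: save the sample in cp1250 (an ASSUMED legacy encoding, not
# something stated in the question) with no XML declaration.
fd, demo_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as out:
    out.write(XML_TEXT.encode("cp1250"))

# Passing only the path makes the parser assume UTF-8, which these bytes are not.
try:
    etree.parse(demo_path)
    naive_parse_failed = False
except etree.ParseError:
    naive_parse_failed = True

# Opening the file with the matching encoding first lets the parse succeed.
tree = parse_with_encoding(demo_path, "cp1250")
country_name = tree.getroot().find("country").get("name")

os.remove(demo_path)
```

If you control the file, the other route is to save it as UTF-8, or to add an XML declaration that names its real encoding, so that etree.parse(input) works with just the path.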

javawizard