14

I'm using Python's xml.etree.ElementTree to do some XML parsing on a file. However, I get this error mid-way through the document:

xml.parsers.expat.ExpatError: not well-formed (invalid token): line X, column Y

So I go to line X, column Y in vim and I see an ampersand (&) with red background highlighting. What does this mean?

Also the two characters preceding it are >>, so maybe there's something special about >>&?

Anyone know how to fix this?

JDS
  • 16,388
  • 47
  • 161
  • 224
  • I received this error message because my input file had a UTF-8 Byte order mark. Stripping the first three bytes off the input resolved the issue. (Obviously not applicable to the OP's question, but this page was still the first search hit, so perhaps this is helpful to a future visitor.) – Kilian Brendel Aug 20 '20 at 12:32

3 Answers3

18

The & is a special character in XML, used for character entities. If your XML has & sitting there by itself, not as part of an entity like & or ѐ or the like, then the XML is invalid.

BrenBarn
  • 242,874
  • 37
  • 412
  • 384
  • I think the issue might be that I have a multi-line (string) element. Essentially for this one element I did a grep (regex) | head -5, to get back 5 lines, then stuck it in the file as an xml element. Would I be better off making 5 separate elements somehow? – JDS Jul 11 '12 at 23:31
  • It's not a matter of how many elements there are in it, it's a matter of what characters are in it. You just can't put the & character in an XML document by itself. You need to escape it by replacing it with `&`. – BrenBarn Jul 11 '12 at 23:34
  • some text & that character is no good you're saying? Also I'm reading in these lines from many different files, so I'm not sure how I could automatically escape them (read in from a bash script using grep and then outputted to a file) – JDS Jul 11 '12 at 23:36
  • 2
    Right, that is invalid XML. Somehow or other you need to escape the & if you want it in the file. You will encounter similar problems if your data contains `<` or `>` characters. One possibility is to just do a simple replace of & with &. If that doesn't meet your needs you'll need to google around for info on how to handle XML escaping. – BrenBarn Jul 11 '12 at 23:45
  • Thanks for telling me about this stuff – JDS Jul 11 '12 at 23:45
0

You can use the escape function found in the xml module

from xml.sax.saxutils import escape

my_string = "Some string with an &"

# If the string contains &, <, or > they will be converted.
print(escape(my_string))

# Above will return: Some string with an &amp;

Reference: Escaping strings for use in XML

captam3rica
  • 191
  • 1
  • 5
-1

I solve it by using yattag instead

from yattag import indent
print indent(xml_string.encode('utf-8'))
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108