Parsing my XML files fails because they contain a '&' character. The code is simple, as shown below.

import xml.etree.ElementTree as Xet

xmlparse = Xet.parse(input_file_path + file_name, parser=Xet.XMLParser(encoding="utf-8"))

The '&' in the XML files causes the error below.

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1839, column 1016

I can find the '&' character at the indicated line and column.

What should I do to fix this problem?

Tommy

1 Answer

An unescaped '&' definitely makes an XML file not well-formed.

There are two ways to handle it:

  • Wrap the affected text in a CDATA section.
  • Escape the ampersand as a character entity.

The second method is easier: replace each literal & in text content with its entity, &amp;

For example, <city>dog & pony</city> should become <city>dog &amp; pony</city>
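If the files cannot be corrected at the source, one possible workaround is to escape the bare ampersands in memory before parsing. The sketch below is only an illustration: the helper names escape_bare_ampersands and parse_lenient are made up for this example, and the regex assumes that any '&' not followed by a name or numeric reference ending in ';' is a stray character. It would also rewrite a '&' inside an existing CDATA section or comment, so check it against your own data first.

```python
import re
import xml.etree.ElementTree as ET

# Matches '&' that does NOT start an entity or character reference
# such as &amp; &lt; &#38; or &#x26;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Return xml_text with every stray '&' replaced by '&amp;' (hypothetical helper)."""
    return BARE_AMP.sub("&amp;", xml_text)


def parse_lenient(path):
    """Read a UTF-8 file, escape stray ampersands, and return the root element."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return ET.fromstring(escape_bare_ampersands(text))


if __name__ == "__main__":
    fixed = escape_bare_ampersands("<city>dog & pony</city>")
    root = ET.fromstring(fixed)
    print(fixed)
    print(root.text)
```

Note that parse_lenient returns the root Element rather than the ElementTree object that parse() gives you. Already escaped references such as &amp; are left untouched, so running the helper twice does not double-escape them.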

Yitzhak Khabinsky