When I parse the XML string below, taken from a larger XML file, I run into what I think is an invalid HTML character code, and the parser raises an error.
The error message was: ParseError: reference to invalid character number
I deleted the rest of the description body and left only the part that causes the error. How do I get ElementTree to ignore these invalid HTML character codes or process them in some way?
The code and XML excerpt are below:
XML: <dc:description> **(10ƚ)** </dc:description>
import os
import html
import io
import sys
import xml.etree.ElementTree as ET

def process_file(file):
    parser = ET.XMLParser(encoding='utf-8')
    tree = ET.parse(file, parser=parser)
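
One direction I have considered, shown here only as a minimal sketch: pre-process the text and drop numeric character references that point outside the character ranges XML 1.0 allows, then hand the cleaned string to ElementTree. This assumes the offending code is a numeric reference such as `&#x1A;` (the excerpt above does not show it clearly), and the helper names `strip_invalid_char_refs` and `parse_lenient` are hypothetical, not part of ElementTree.

```python
import re
import xml.etree.ElementTree as ET

# Matches numeric character references such as &#26; or &#x1A;
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_valid_xml_char(cp):
    # Code point ranges allowed by the XML 1.0 "Char" production
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_char_refs(text, replacement=""):
    # Hypothetical helper: keep valid references, replace invalid ones
    def _sub(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return match.group(0) if _is_valid_xml_char(cp) else replacement

    return _CHAR_REF.sub(_sub, text)


def parse_lenient(xml_text):
    # Hypothetical helper: clean the text first, then parse as usual
    return ET.fromstring(strip_invalid_char_refs(xml_text))
```

For a file, the text could be read with `open(file, encoding='utf-8')` and passed to `parse_lenient`. Note that the regex does not understand XML structure, so it would also rewrite matching text inside CDATA sections or comments, where such sequences are literal text rather than references.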