I have thousands of XML files like the following:
<names>
<Id>1518845</Id>
<Name>Confessions of a Thug (Paperback)</Name>
<Authors>Philip Meadows Taylor</Authors>
<Publisher>Rupa & Co</Publisher>
<CountsOfReview>2.0</CountsOfReview>
</names>
I've tried the following code to parse them:
from lxml import etree
root = etree.parse("xm_file.xml")
and
import xml.etree.ElementTree as ET
tree = ET.parse("xm_file.xml")
and
parser = ET.XMLParser(encoding="utf-8")
tree = ET.parse("xm_file.xml", parser=parser)
and they all lead to one of these errors:
ParseError: not well-formed (invalid token): line 10, column 18
XMLSyntaxError: xmlParseEntityRef: no name, line 10, column 19
I searched and tried a lot to find a solution that works for all the files, but in vain.
NOTE: this didn't help me: How to parse invalid (bad / not well-formed) XML?
Another situation is:
<names>
<Id>1481744</Id>
<Name>Lettres de René-Édouard Claparède <1832-1871>.: Choisies et annotées</Name>
<Authors>René-Édouard Claparède</Authors>
<ISBN>3796505635</ISBN>
<Rating>2.0</Rating>
<PublishYear>1971</PublishYear>
<PublishMonth>31</PublishMonth>
<PublishDay>12</PublishDay>
</names>
While parsing, it just handles the XML as if it were:
<names>
<Id>1481744</Id>
<Name>Lettres de René-Édouard Claparède</Name>
</names>
and the other info doesn't appear.
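Both samples break for the same reason: a bare & (as in "Rupa & Co") and a literal < inside text (as in "<1832-1871>") are not allowed in well-formed XML. Below is a minimal sketch of one possible workaround, which escapes those characters before parsing. It assumes the files use only a fixed set of plain tags, with no attributes, comments, CDATA sections or XML declaration. KNOWN_TAGS, sanitize and parse_lenient are hypothetical names for illustration, and the tag list is taken only from the two samples above, so it would need extending for other files.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: every real tag in the files is one of these (taken from the samples).
KNOWN_TAGS = (
    "names", "Id", "Name", "Authors", "Publisher", "CountsOfReview",
    "ISBN", "Rating", "PublishYear", "PublishMonth", "PublishDay",
)

# "&" that does not start a predefined or numeric entity reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")

# "<" that does not open or close one of the known tags.
STRAY_LT = re.compile(r"<(?!/?(?:%s)>)" % "|".join(KNOWN_TAGS))


def sanitize(text):
    """Escape bare '&' and stray '<' so the text becomes well-formed XML."""
    text = BARE_AMP.sub("&amp;", text)
    text = STRAY_LT.sub("&lt;", text)
    return text


def parse_lenient(text):
    """Parse a document string after sanitizing it; returns the root element."""
    return ET.fromstring(sanitize(text))
```

Usage would look something like root = parse_lenient(open("xm_file.xml", encoding="utf-8").read()), assuming the files are UTF-8 as in the parser attempt above. A literal > in text is already legal XML, so only & and < are rewritten.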