So, I assume this is a pretty typical use case, but I can't find anything about support for it in the lxml documentation. Basically, I've got an XML file that consists of a number of distinct XML documents (reviews, in particular). The structure is approximately:
<review>
<!-- A bunch of metadata -->
</review>
<!-- The issue is here -->
<review>
<!-- A bunch of metadata -->
</review>
Basically, I try to read the file in like so:
from lxml import etree
# Read bytes rather than str so lxml can honour any encoding declaration
with open(xml_file, 'rb') as f:
    document = etree.fromstring(f.read())
But I get an error when I do so:
lxml.etree.XMLSyntaxError: Extra content at the end of the document
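The error follows from XML's well-formedness rules. A document has exactly one root element, and after it only comments, processing instructions and whitespace may appear, so a second <review> counts as extra content. Below is a minimal reproduction. The review markup is made up, and the fallback to the standard library's ElementTree when lxml isn't installed is an assumption of this sketch, not part of the original setup:

```python
# Prefer lxml, as in the question; fall back to the stdlib ElementTree,
# which exposes a compatible fromstring() for this demonstration.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample inputs: one root element vs. two concatenated ones.
ONE_ROOT = b"<review><title>A</title></review>"
TWO_ROOTS = (b"<review><title>A</title></review>\n"
             b"<!-- a comment between documents -->\n"
             b"<review><title>B</title></review>")


def parses(data: bytes) -> bool:
    """Return True if data is a single well-formed XML document."""
    try:
        etree.fromstring(data)
        return True
    except SyntaxError as exc:
        # lxml's XMLSyntaxError and ElementTree's ParseError both subclass SyntaxError
        print("rejected:", exc)
        return False


print(parses(ONE_ROOT))   # True
print(parses(TWO_ROOTS))  # prints the parser's message, then False
```

The comment between the two reviews is legal on its own; it is the second <review> element that the parser rejects.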
That's a reasonable error: the file genuinely isn't a single well-formed XML document, and it should be treated as such. My question is: how do I get lxml to recognize that this is a list of XML documents and parse it accordingly?
list_of_reviews = lxml.magic(open(xml_file).read())
Is there a real lxml function that plays the role of magic here?
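One common workaround wraps the concatenated documents in a single synthetic root element so the whole input becomes one well-formed document, then picks out the <review> children. This is a sketch, not something taken from the lxml docs. It assumes the individual reviews carry no <?xml ...?> declaration or DOCTYPE of their own, since those are only legal at the very start of a document and would need stripping first. The helper names parse_concatenated and parse_file and the wrapper tag name are hypothetical:

```python
# Prefer lxml, as in the question; the stdlib ElementTree fallback supports
# the same fromstring()/findall()/findtext() calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree


def parse_concatenated(data: bytes, tag: str = "review") -> list:
    """Parse bytes holding several top-level <tag> elements into a list.

    Assumes no per-document <?xml ...?> declarations or DOCTYPEs.
    """
    # "wrapper" is an arbitrary (hypothetical) name for the synthetic root.
    wrapped = b"<wrapper>" + data + b"</wrapper>"
    root = etree.fromstring(wrapped)
    # findall() returns only direct <tag> children, skipping the comments
    # that sat between the original documents.
    return root.findall(tag)


def parse_file(xml_file: str, tag: str = "review") -> list:
    """Read xml_file as bytes and return its top-level <tag> elements."""
    with open(xml_file, "rb") as f:
        return parse_concatenated(f.read(), tag)


if __name__ == "__main__":
    sample = (b"<review><title>A</title></review>\n"
              b"<!-- a comment between documents -->\n"
              b"<review><title>B</title></review>\n")
    reviews = parse_concatenated(sample)
    print(len(reviews))                            # 2
    print([r.findtext("title") for r in reviews])  # ['A', 'B']
```

With this in place, list_of_reviews = parse_file(xml_file) stands in for the hypothetical lxml.magic call. The approach reads the whole file into memory. For very large files, etree.iterparse over a similarly wrapped stream would be the next thing to look at.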