I am working on an XML parser. The goal is to parse a number of different XML files in which the prefixes and tags stay the same but the namespaces change.
I am therefore trying one of the following:
- to parse the XML using just <prefix:tag>, without resolving (replacing) the prefix with the namespace. The prefixes remain unchanged from document to document.
- to load the namespaces automatically, so that the identifier (<prefix:tag>) could be replaced with the proper namespace.
- to just parse the XML by tag.
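For the second option, the standard library can read the prefix bindings out of each document instead of hard-coding them. This is a minimal sketch, assuming Python 3 and files small enough to read twice; `collect_namespaces` and the sample document are hypothetical names used only for illustration:

```python
import io
import xml.etree.ElementTree as ET


def collect_namespaces(source):
    """Return a {prefix: uri} dict of the namespaces declared in a document.

    `source` is a filename or a binary file-like object. iterparse emits a
    ("start-ns", (prefix, uri)) event for each xmlns declaration it meets;
    a default namespace shows up under the empty prefix "".
    """
    namespaces = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # Assumption: if a prefix is re-declared, the first binding wins.
        namespaces.setdefault(prefix, uri)
    return namespaces


# Hypothetical document: only the URIs would differ between real files.
sample = b"""<root xmlns:a="urn:example:v1" xmlns:b="urn:example:other">
  <a:item>one</a:item>
  <a:item>two</a:item>
  <b:note>hi</b:note>
</root>"""

ns = collect_namespaces(io.BytesIO(sample))
root = ET.parse(io.BytesIO(sample)).getroot()

# "a:item" is expanded to "{urn:example:v1}item" using the collected map.
items = [el.text for el in root.findall("a:item", ns)]
print(items)  # ['one', 'two']
```

Because the map is rebuilt from each file, queries can keep using the stable prefixes while the URIs change underneath.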
I have tried xml.etree.ElementTree.
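With xml.etree.ElementTree, the third option (matching by tag regardless of namespace) could look like the sketch below. It assumes Python 3.8 or later for the `{*}` wildcard in `findall()`; the `local_name` helper and the sample document are hypothetical, and the helper also works on older versions:

```python
import xml.etree.ElementTree as ET

sample = """<root xmlns:a="urn:example:v2">
  <a:item>one</a:item>
  <a:group><a:item>two</a:item></a:group>
</root>"""

root = ET.fromstring(sample)

# Python 3.8+: "{*}item" matches <item> in any namespace (or none),
# so the query does not depend on the namespace URI at all.
wildcard = [el.text for el in root.findall(".//{*}item")]


def local_name(tag):
    """Strip a '{uri}' prefix from an ElementTree tag: '{urn:x}item' -> 'item'."""
    return tag.rpartition("}")[2]


# Version-independent alternative: compare local names while iterating.
by_local = [el.text for el in root.iter() if local_name(el.tag) == "item"]

print(wildcard)  # ['one', 'two']
print(by_local)  # ['one', 'two']
```

Note that this discards the prefix entirely, so two tags with the same local name in different namespaces would both match.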
I also had a look at lxml. I did not find any configuration option of lxml's XMLParser that could help me, although I read an answer here whose author suggests that lxml should be able to collect namespaces for me automatically.
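In lxml, every element exposes an `nsmap` dict of the prefixes in scope, which can be passed back to `findall()` or `xpath()`. A sketch, assuming lxml is installed and that all prefixes of interest are declared on the root element (declarations made deeper in the tree do not appear in `root.nsmap`); the sample document is hypothetical:

```python
from lxml import etree

sample = b"""<root xmlns:a="urn:example:v3">
  <a:item>one</a:item>
  <a:item>two</a:item>
</root>"""

root = etree.fromstring(sample)

# nsmap maps each in-scope prefix to its URI. A default namespace would
# appear under the key None, which prefix lookups cannot use, so drop it.
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}

# The same prefix-based queries then work whatever the URI happens to be.
items = [el.text for el in root.findall("a:item", namespaces=ns)]
same = root.xpath("//a:item/text()", namespaces=ns)
print(items)  # ['one', 'two']
```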
Interestingly, parsed_file = etree.XML(file) fails with the error:

lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
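One likely cause, assuming `file` holds a path: `etree.XML()` parses its argument as XML text, so a filename fails at the very first character. `etree.parse()` accepts a filename or a file object instead, and lxml.etree and xml.etree.ElementTree behave the same way in this respect. The sketch below uses the standard library so it runs without lxml; the temporary file and its contents are hypothetical:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a small hypothetical document to disk so there is a real path to parse.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as fh:
    fh.write('<root xmlns:a="urn:example:v4"><a:item>one</a:item></root>')
    path = fh.name

try:
    ET.XML(path)  # XML() treats its argument as document text, not a path
except ET.ParseError as exc:
    print("XML(path) failed:", exc)

# parse() accepts a filename or a file object and reads the file itself.
root = ET.parse(path).getroot()
print(root[0].text)  # one

os.remove(path)
```

With lxml the equivalent call would be `etree.parse(path).getroot()`.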
One example of the files I would like to parse is here.