I have the following "example.xml" document, and my main goal is to retrieve the comments for each tag in it. Following this answer, I was able to retrieve the comments when the document has no namespace prefixes, but with the prefixes shown here I get the errors described further down.
<?xml version="1.0" encoding="UTF-8"?>
<abc:root xmlns:abc="http://com/example/URL" xmlns:abcdef="http://com/another/example/URL" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3.1 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4.1 comment”--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = “this is the tag3.2 comment”--></tag3>
      <tag4>tag4<!-- comment = “this is the tag4.2 comment”--></tag4>
    </tag2>
  </tag1>
</abc:root>
I've tried two options, both of which result in errors.
In the first, I iterate through each node of the document and check for the comment associated with it. The code is as follows:
from lxml import etree
import os
tree = etree.parse("example.xml")
rootXML = tree.getroot()
print(rootXML.nsmap)
for Node in tree.xpath('//*'):
    elements = tree.xpath(tree.getpath(Node), rootXML.nsmap)
    basename = os.path.basename(tree.getpath(Node))
    for tag in elements:
        comment = tag.xpath('{0}/comment()'.format(tree.getpath(Node)))
        print(tree.getpath(Node))
        print(comment)
Executing this code, however, gives me the following error:
TypeError: xpath() takes exactly 1 positional argument (2 given)
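As far as I can tell, this error means that xpath() accepts only the expression positionally and expects the namespace map through the namespaces keyword. Below is a minimal sketch of that reading, not a confirmed fix. It assumes a shortened inline copy of the document with plain ASCII comments, and the names XML, positional_error and comments_by_path are made up for illustration. It also assumes the comments can be read relative to each element with comment(), rather than by repeating the absolute path.

```python
from lxml import etree

# Shortened, hypothetical stand-in for example.xml (ASCII comments only).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<abc:root xmlns:abc="http://com/example/URL">
  <tag1>
    <tag2>
      <tag3>tag3<!-- tag3.1 comment --></tag3>
      <tag4>tag4<!-- tag4.1 comment --></tag4>
    </tag2>
  </tag1>
</abc:root>"""

root = etree.fromstring(XML)
tree = root.getroottree()

# Passing the namespace map positionally reproduces the TypeError.
try:
    tree.xpath("/abc:root", root.nsmap)
except TypeError as exc:
    positional_error = str(exc)

# Passing the same map through the keyword resolves the abc: prefix.
found = tree.xpath("/abc:root", namespaces=root.nsmap)

# Collect the comment text of every element, keyed by its path.
comments_by_path = {}
for node in tree.xpath("//*"):
    path = tree.getpath(node)
    matches = tree.xpath(path, namespaces=root.nsmap)
    # comment() is evaluated relative to each matched element,
    # so no prefixed absolute path is needed here.
    comments_by_path[path] = [
        c.text for m in matches for c in m.xpath("comment()")
    ]

for path, comments in comments_by_path.items():
    print(path, comments)
```

Note that root.nsmap can be passed directly only because this document declares no default namespace; a map containing a None key would presumably need that entry removed or renamed first.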
I've also tried to follow this answer and pass the namespaces to xpath() through the namespaces keyword. In doing so, my code becomes:
from lxml import etree
import os
tree = etree.parse("example.xml")
rootXML = tree.getroot()
print(rootXML.nsmap)
for Node in tree.xpath('//*'):
    elements = tree.xpath(tree.getpath(Node), namespaces={rootXML.nsmap})
    basename = os.path.basename(tree.getpath(Node))
    for tag in elements:
        comment = tag.xpath('{0}/comment()'.format(tree.getpath(Node)))
        print(tree.getpath(Node))
        print(comment)
The only change is replacing elements = tree.xpath(tree.getpath(Node), rootXML.nsmap) with elements = tree.xpath(tree.getpath(Node), namespaces={rootXML.nsmap}). However, this then results in the following error at the modified line:
TypeError: unhashable type: 'dict'
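This error can be reproduced without lxml at all, which suggests the braces are the problem rather than the namespaces keyword. Since rootXML.nsmap is already a dict, writing {rootXML.nsmap} asks Python to build a set whose single element is that dict. A small sketch, where nsmap is a hand-written stand-in for rootXML.nsmap and message is a name made up for illustration:

```python
# Hypothetical stand-in for rootXML.nsmap, which is already a plain dict.
nsmap = {
    "abc": "http://com/example/URL",
    "abcdef": "http://com/another/example/URL",
}

message = None
try:
    # Braces around a single value build a set, and set elements must be
    # hashable; a dict is not, so this line raises TypeError.
    wrapped = {nsmap}
except TypeError as exc:
    message = str(exc)

print(message)

# The mapping itself is presumably what the keyword expects:
#     tree.xpath(path, namespaces=nsmap)
```

If that is right, dropping the extra braces and passing namespaces=rootXML.nsmap should get past this particular error.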
EDIT: modified a closing bracket as per one of the answers.