Unable to retrieve comment from XML due to namespace prefix issue Python

Question

I have the following "example.xml" document, and my main goal is to retrieve the comment attached to each tag in it. Thanks to this answer, I can already retrieve the comments when the document has no namespace prefixes, but with the prefixes shown here I get the errors described below.

<?xml version="1.0" encoding="UTF-8"?>
<abc:root xmlns:abc="http://com/example/URL" xmlns:abcdef="http://com/another/example/URL" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tag1>
    <tag2>
        <tag3>tag3<!-- comment = “this is the tag3.1 comment”--></tag3>
        <tag4>tag4<!-- comment = “this is the tag4.1 comment”--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
        <tag3>tag3<!-- comment = “this is the tag3.2 comment”--></tag3>
        <tag4>tag4<!-- comment = “this is the tag4.2 comment”--></tag4>
    </tag2>
  </tag1>
</abc:root>

I've tried to go through two options, both resulting in errors.

I'm essentially iterating through each node of the document and checking for the comment associated. The code is as follows:

from lxml import etree
import os

tree = etree.parse("example.xml")
rootXML = tree.getroot()

print(rootXML.nsmap)

for Node in tree.xpath('//*'):
    elements = tree.xpath(tree.getpath(Node), rootXML.nsmap)
    basename = os.path.basename(tree.getpath(Node))
    for tag in elements:
        comment = tag.xpath('{0}/comment()'.format(tree.getpath(Node)))
        print(tree.getpath(Node))
        print(comment)

Executing this code, however, gives me the following error:

TypeError: xpath() takes exactly 1 positional argument (2 given)

I've also tried to follow this answer and define the namespace within the xpath. In doing so, my code becomes:

from lxml import etree
import os

tree = etree.parse("example.xml")
rootXML = tree.getroot()

print(rootXML.nsmap)

for Node in tree.xpath('//*'):
    elements = tree.xpath(tree.getpath(Node), namespaces={rootXML.nsmap})
    basename = os.path.basename(tree.getpath(Node))
    for tag in elements:
        comment = tag.xpath('{0}/comment()'.format(tree.getpath(Node)))
        print(tree.getpath(Node))
        print(comment)

where the only change is replacing elements = tree.xpath(tree.getpath(Node), rootXML.nsmap) with elements = tree.xpath(tree.getpath(Node), namespaces={rootXML.nsmap}). However, this then results in the following error at the modified line.

TypeError: unhashable type: 'dict'

EDIT: added a missing closing bracket, as per one of the answers.

Acorn · Accepted Answer · 2019-11-25T12:46:30.567

You are missing a closing bracket at the end of this line:

comment = tag.xpath('{0}/comment()'.format(tree.getpath(Node))

Update

Here's a working example:

from lxml import etree
import os

xml = """<?xml version="1.0" encoding="UTF-8"?>
<abc:root xmlns:abc="http://com/example/URL" xmlns:abcdef="http://com/another/example/URL" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <tag1>
    <tag2>
        <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
        <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
        <tag3>tag3<!-- comment = “this is the tag3 comment”--></tag3>
        <tag4>tag4<!-- comment = “this is the tag4 comment”--></tag4>
    </tag2>
  </tag1>
</abc:root>""".encode('utf-8')

rootElement = etree.fromstring(xml)
rootTree = rootElement.getroottree()

print(rootElement.nsmap)

for Node in rootTree.xpath('//*'):
    elements = rootTree.xpath(rootTree.getpath(Node), namespaces=rootElement.nsmap)
    basename = os.path.basename(rootTree.getpath(Node))
    for tag in elements:
        comment = tag.xpath('{0}/comment()'.format(rootTree.getpath(Node)), namespaces=rootElement.nsmap)
        print(rootTree.getpath(Node))
        print(comment)

The main issue was passing the namespaces to xpath() as a positional argument; they must be given through the namespaces keyword argument. The other issue was calling methods on an _Element that exist only on an _ElementTree, and vice versa: getpath() belongs to the tree, while nsmap belongs to the element.
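
To make the keyword-argument point concrete, here is a minimal sketch. It assumes a trimmed-down version of your document; the abc:tag5 element is hypothetical and added only so that there is a prefixed tag to query. It also shows that `//comment()` needs no namespace map at all, because the expression contains no prefixes.

```python
from lxml import etree

# Hypothetical, trimmed-down document; abc:tag5 is added for illustration only.
xml = b"""<abc:root xmlns:abc="http://com/example/URL">
  <tag1><tag3>tag3<!-- first comment --></tag3></tag1>
  <abc:tag5>tag5<!-- second comment --></abc:tag5>
</abc:root>"""

root = etree.fromstring(xml)   # _Element
tree = root.getroottree()      # _ElementTree
ns = root.nsmap                # plain dict mapping prefix to URI

# Passing the namespace map positionally is rejected by xpath().
positional_error = None
try:
    tree.xpath('//abc:tag5', ns)
except TypeError as exc:
    positional_error = exc

# Passing it through the namespaces keyword works.
prefixed = tree.xpath('//abc:tag5', namespaces=ns)

# No prefixes in the expression, so no namespace map is required.
comment_texts = [c.text.strip() for c in tree.xpath('//comment()')]

print(positional_error)
print(prefixed[0].tag)
print(comment_texts)
```

Note that this relies on the document declaring no default namespace. If it did, nsmap would contain a None key, which xpath() does not accept as a prefix, so you would have to build the mapping by hand.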

Also, in your second example you pass namespaces={rootXML.nsmap}. rootXML.nsmap is already a dictionary, so it needs no curly braces. Wrapping it in braces does not create a dictionary; it tries to create a set containing the dictionary, and because dictionaries are not hashable, Python raises the TypeError you saw.
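
The set-versus-dictionary behaviour can be reproduced without lxml. This is only a sketch; the nsmap value below is a hand-written stand-in for what rootXML.nsmap would hold.

```python
# Stand-in for rootXML.nsmap (an assumption; the real value comes from lxml).
nsmap = {"abc": "http://com/example/URL"}

# {nsmap} is a set display, not a dict. Set members must be hashable,
# and a dict is not, so building the set fails.
error_message = None
try:
    wrapped = {nsmap}
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The dictionary can be used directly as the namespaces argument.
namespaces = nsmap
print(type(namespaces).__name__)
```

The exact wording of the message can differ between Python versions, but it should mention the unhashable 'dict'.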

Thank you. You're right in that there was a missing closing bracket, but this still results in the same `TypeError: xpath() takes exactly 1 positional argument (2 given)` error at `elements = tree.xpath(tree.getpath(Node), rootXML.nsmap)` (for that part of the code) — Adam, Nov 25 '19 at 12:28
And also `TypeError: unhashable type: 'dict'` as described in my question — Adam, Nov 25 '19 at 12:34
Thank you so much! I've tried your solution in the Python shell and it works! However, when I try to parse the etree from a file instead of from a string, I get the error `AttributeError: 'lxml.etree._ElementTree' object has no attribute 'getroottree'`. Instead of parsing from a string, I am doing `rootElement = etree.parse("example.xml")`. Any idea why I'm getting this error? — Adam, Nov 25 '19 at 13:34
So it seems that when you parse from a file, lxml gives you an `ElementTree`. So you'll need to do what you did in your code `rootTree = etree.parse("example.xml"); rootElement = rootTree.getroot()` — Acorn, Nov 25 '19 at 13:38
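
For reference, the difference between the two entry points discussed in these comments can be sketched as follows. An in-memory io.BytesIO object stands in for "example.xml" so that the snippet is self-contained, the document is trimmed down, and comments_by_tag is a hypothetical helper name, not part of lxml. It uses iter(etree.Comment) as one possible alternative to building XPath strings by hand.

```python
import io
from lxml import etree

# Trimmed-down stand-in for example.xml (an assumption for illustration).
xml = b"""<abc:root xmlns:abc="http://com/example/URL">
  <tag1><tag3>tag3<!-- tag3 comment --></tag3></tag1>
  <tag1><tag4>tag4<!-- tag4 comment --></tag4></tag1>
</abc:root>"""


def comments_by_tag(root):
    """Hypothetical helper: pair each comment with the tag of its parent element."""
    return [(c.getparent().tag, c.text.strip()) for c in root.iter(etree.Comment)]


# fromstring() returns the root _Element; ask it for its tree.
root_from_string = etree.fromstring(xml)
tree_from_string = root_from_string.getroottree()

# parse() returns an _ElementTree; ask it for its root.
tree_from_file = etree.parse(io.BytesIO(xml))
root_from_file = tree_from_file.getroot()

print(comments_by_tag(root_from_string))
print(comments_by_tag(root_from_file))
```

Either way you end up with both objects: the tree for getpath() and the root element for nsmap.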
