I'm trying to print the XPath of every element in an XML tree, but I get strange output from lxml. Instead of a path that contains the name of each node, I get paths made up of "*" steps. Do you know what might be the issue here? Here is the code, along with the XML I am trying to analyze.

from lxml import etree

xml = """
<filter xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <bundles xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-bundlemgr-oper">
    <bundles>
      <bundle>
        <data>
            <bundle-status/>
            <lacp-status/>
            <minimum-active-links/>
            <ipv4bfd-status/>
            <active-member-count/>
            <active-member-configured/>
        </data>
        <members>
            <member>
                <member-interface/>
                <interface-name/>
                <member-mux-data>
                    <member-state/>
                </member-mux-data>
            </member>
        </members>
        <bundle-interface>{{bundle_name}}</bundle-interface>
      </bundle>
    </bundles>
  </bundles>
  <bfd xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ip-bfd-oper">
    <session-briefs>
        <session-brief>
            <state/>
            <interface-name>{{bundle_name}}</interface-name>
        </session-brief>
    </session-briefs>
  </bfd>
</filter>
"""


root = etree.XML(xml)
tree = etree.ElementTree(root)
for element in root.iter():
    print(tree.getpath(element))

The output looks like this (there should be node names instead of "*"):

/*
/*/*[1]
/*/*[1]/*
/*/*[1]/*/*
/*/*[1]/*/*/*[1]
/*/*[1]/*/*/*[1]/*[1]
/*/*[1]/*/*/*[1]/*[2]
/*/*[1]/*/*/*[1]/*[3]
/*/*[1]/*/*/*[1]/*[4]
/*/*[1]/*/*/*[1]/*[5]
/*/*[1]/*/*/*[1]/*[6]
/*/*[1]/*/*/*[2]
/*/*[1]/*/*/*[2]/*
/*/*[1]/*/*/*[2]/*/*[1]
/*/*[1]/*/*/*[2]/*/*[2]
/*/*[1]/*/*/*[2]/*/*[3]
/*/*[1]/*/*/*[2]/*/*[3]/*
/*/*[1]/*/*/*[3]
/*/*[2]
/*/*[2]/*
/*/*[2]/*/*
/*/*[2]/*/*/*[1]
/*/*[2]/*/*/*[2]

Thanks a lot! Dragan

  • By the way, I have just found out that it works fine when I remove the xmlns attributes. Any ideas whether it can work with xmlns too? – Dragan Markovic Feb 13 '20 at 19:09
  • Does this answer your question? [How to find XML Elements via XPath in Python in a namespace-agnostic way?](https://stackoverflow.com/questions/5572247/how-to-find-xml-elements-via-xpath-in-python-in-a-namespace-agnostic-way) – stovfl Feb 13 '20 at 19:34
  • Those XPath expressions are correct. They might not be what you want, though. The problem arises with the use of namespaces, because the only correct and self-contained XPath expression for the document element would be `/*[local-name()='filter'][namespace-uri()='urn:ietf:params:xml:ns:netconf:base:1.0']` – Alejandro Feb 13 '20 at 20:35
  • In XPath 3.0 there is the possibility of using URI literals, as in `/Q{urn:ietf:params:xml:ns:netconf:base:1.0}filter`, which is close to but not the same as [`lxml.etree._ElementTree.getelementpath(element)`](https://lxml.de/api/lxml.etree._ElementTree-class.html#getelementpath), which returns `"{urn:ietf:params:xml:ns:netconf:base:1.0}bundles"` – Alejandro Feb 13 '20 at 20:57
  • From the lxml docs: _For namespaced elements, the expression uses prefixes from the document, which therefore need to be provided in order to make any use of the expression in XPath._ You could do this by using `cleanup_namespaces()` and setting `top_nsmap` to a dict of prefixes/URIs. This would result in those prefixes being used in the output paths. If that's not acceptable, what output do you want? Just local names? You could do something like that with XSLT in lxml. – Daniel Haley Feb 13 '20 at 21:30

1 Answer

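I found that besides getpath, the ElementTree object in lxml also has a "sibling" method called getelementpath, which gives a usable result for namespaced elements as well. It returns an ElementPath expression in which each step is written as {namespace-uri}name rather than "*".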

So change your code to:

for element in root.iter():
    print(tree.getelementpath(element))

For your source sample, with namespaces shortened for readability, the initial part of the result is:

.
{http://cisco.com/ns}bundles
{http://cisco.com/ns}bundles/{http://cisco.com/ns}bundles
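If what you want is the output with plain node names, as asked about in the comments, one option is to build the path yourself from each element's local name. The following is a minimal sketch, not an lxml feature: the names `SAMPLE`, `local_name_path` and `all_local_paths` are made up for this example, and the sample is a trimmed copy of the XML from the question. The resulting strings are meant for display only; because they drop the namespaces, they are not valid XPath against the namespaced document.

```python
from lxml import etree

# Trimmed copy of the XML from the question (assumption: enough to show the idea).
SAMPLE = b"""<filter xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <bundles xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-bundlemgr-oper">
    <bundles>
      <bundle>
        <data><bundle-status/><lacp-status/></data>
      </bundle>
    </bundles>
  </bundles>
  <bfd xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ip-bfd-oper"/>
</filter>"""


def local_name_path(element):
    """Return a display path made of local names, e.g. /filter/bundles."""
    parts = []
    node = element
    while node is not None:
        # QName splits "{uri}name" and exposes the part after the namespace.
        name = etree.QName(node).localname
        parent = node.getparent()
        if parent is not None:
            # Add a 1-based position only when siblings share the same tag.
            same = [s for s in parent if s.tag == node.tag]
            if len(same) > 1:
                name = f"{name}[{same.index(node) + 1}]"
        parts.append(name)
        node = parent
    return "/" + "/".join(reversed(parts))


def all_local_paths(xml_bytes):
    """Parse the document and return one display path per element."""
    root = etree.XML(xml_bytes)
    # Passing etree.Element skips comments and processing instructions.
    return [local_name_path(el) for el in root.iter(etree.Element)]


if __name__ == "__main__":
    for path in all_local_paths(SAMPLE):
        print(path)
```

If you need expressions that can be evaluated again later, getelementpath (or getpath with prefixes declared in the document) is the safer choice, since those keep the namespace information.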