
I have been trying to parse some XML for a couple of hours now with no luck. I have checked similar threads and reviewed the ElementTree docs, and I am still quite lost.

Basically, I receive XML output from a router as a string, which I must then parse for some specific information.

Here is a sample of the XML I am working on:

xml = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
        <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
            <!-- keepalive -->
            <route-table>
                <table-name>inet.0</table-name>
                <destination-count>52</destination-count>
                <total-route-count>52</total-route-count>
                <active-route-count>52</active-route-count>
                <holddown-route-count>0</holddown-route-count>
                <hidden-route-count>0</hidden-route-count>
                <rt junos:style="brief">
                    <rt-destination>5.5.5.5/32</rt-destination>
                    <rt-entry>
                        <active-tag>*</active-tag>
                        <current-active/>
                        <last-active/>
                        <protocol-name>Direct</protocol-name>
                        <preference>0</preference>
                        <age junos:seconds="428929">4d 23:08:49</age>
                        <nh>
                            <selected-next-hop/>
                            <via>lo0.0</via>
                        </nh>
                    </rt-entry>
                </rt>
            </route-table>
        </route-information>
        <cli>
            <banner></banner>
        </cli>
</rpc-reply>"""

For example, the node whose contents I would like to reach and print is rt-destination.

I have tried:

root = ET.fromstring(xml)

values = root.find('rt')
for element in values:
    print element.text

This,

value= root.find('rt-destination')

print value

And this to set root (pointer?) at the specific node,

x = root.getiterator(tag = "destination-count")

Any help regarding how to traverse to this specific node or how to get to the desired outcome would be immensely appreciated.

Massa

2 Answers


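The code is not working because of the namespace: find() and iter() match on the fully qualified tag name, and every element under route-information is in the default namespace declared there. If the namespace is always the same, you can hard-code it as a prefix to the tag you are trying to find: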

import xml.etree.ElementTree as ET

xml = """
<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <!-- keepalive -->
        <route-table>
            <table-name>inet.0</table-name>
            <destination-count>52</destination-count>
            <total-route-count>52</total-route-count>
            <active-route-count>52</active-route-count>
            <holddown-route-count>0</holddown-route-count>
            <hidden-route-count>0</hidden-route-count>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
                <rt-entry>
                    <active-tag>*</active-tag>
                    <current-active/>
                    <last-active/>
                    <protocol-name>Direct</protocol-name>
                    <preference>0</preference>
                    <age junos:seconds="428929">4d 23:08:49</age>
                    <nh>
                        <selected-next-hop/>
                        <via>lo0.0</via>
                    </nh>
                </rt-entry>
            </rt>
        </route-table>
    </route-information>
    <cli>
        <banner></banner>
    </cli>
</rpc-reply>
"""

XML_NAMESPACE = '{http://xml.juniper.net/junos/14.1D0/junos-routing}'
root = ET.fromstring(xml)
# iter() walks the whole tree and yields elements whose qualified tag matches
rt_nodes = root.iter('{}rt-destination'.format(XML_NAMESPACE))
print(next(rt_nodes).text)  # 5.5.5.5/32

If you need something more flexible, you can check out the answers here.
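As one possible take on "more flexible", here is a minimal sketch that ignores the namespace altogether, which may help if the version string in the URI (14.1D0 here) changes between routers. It assumes Python 3.8 or newer, where the standard-library ElementTree path syntax accepts the `{*}` wildcard. `SAMPLE` and `find_destinations` are names made up for this illustration, and `SAMPLE` is a trimmed copy of the reply.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the router reply, for illustration only.
SAMPLE = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <route-table>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
            </rt>
        </route-table>
    </route-information>
</rpc-reply>"""


def find_destinations(xml_text):
    """Return the text of every rt-destination element, in any namespace."""
    root = ET.fromstring(xml_text)
    # '{*}' matches any namespace (or none); requires Python 3.8+.
    return [node.text for node in root.findall('.//{*}rt-destination')]


destinations = find_destinations(SAMPLE)
print(destinations)
```

The trade-off is that a wildcard also matches an rt-destination element from an unrelated namespace, so hard-coding the URI stays the stricter choice.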

Karin

You are missing the namespace for the route-information tag. Your XML has two namespaces; unfortunately, the one you need has no prefix.

<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">

rpc-reply itself is in no namespace: xmlns:junos only declares the junos prefix, which is then used on attributes such as junos:style. The next layer and everything under it, however, falls under the default (unprefixed) namespace xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing".
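One way to check which namespace each element lands in is to print the expanded tag names the parser produces. The sketch below uses the standard-library ElementTree on a trimmed copy of the reply; `SAMPLE` is a shortened stand-in, not the full router output.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the router reply, for illustration only.
SAMPLE = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <route-table>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
            </rt>
        </route-table>
    </route-information>
</rpc-reply>"""

root = ET.fromstring(SAMPLE)
route_info = root[0]       # <route-information>
rt = route_info[0][0]      # <route-table> -> <rt>

# Namespaced names are stored as '{uri}local-name'; plain names have no braces.
print(root.tag)            # rpc-reply
print(route_info.tag)      # {http://xml.juniper.net/junos/14.1D0/junos-routing}route-information
print(rt.tag)              # {http://xml.juniper.net/junos/14.1D0/junos-routing}rt
print(list(rt.attrib))     # ['{http://xml.juniper.net/junos/14.1D0/junos}style']
```

Only the style attribute carries the junos URI. The elements from route-information downward carry the junos-routing URI, which is why a bare 'rt' or 'rt-destination' never matches.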

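In lxml.etree (the standard-library ElementTree has no nsmap attribute), root.nsmap gives the following namespace dictionary for the root layer: {'junos': 'http://xml.juniper.net/junos/14.1D0/junos'}. An element in that namespace would be looked up with the junos prefix as shown below, but for this document the call returns None, because rt belongs to the junos-routing namespace and is not a direct child of the root: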

root.find('junos:rt', namespaces=root.nsmap)

In the next layer, lxml.etree is aware of the namespace "http://xml.juniper.net/junos/14.1D0/junos-routing", but because it has no prefix, it is stored in the namespace map with None as the dictionary key.

>>> nsmap = root[0].nsmap  # root[0] is route-information; getchildren() is deprecated
>>> nsmap
{'junos': 'http://xml.juniper.net/junos/14.1D0/junos',
 None: 'http://xml.juniper.net/junos/14.1D0/junos-routing'}

That is a problem, because an XPath expression cannot reference a namespace through a None key. One option is to register 'http://xml.juniper.net/junos/14.1D0/junos-routing' in the dictionary under a prefix of your own choosing.

nsmap['my_ns'] = nsmap.pop(None)

We need to use .pop here, rather than just adding a second key, because lxml's xpath() rejects a namespace map that still contains None as a key. Now you can search for the rt-destination tag using xpath and return just the text within the tag.

root.xpath('.//my_ns:rt-destination/text()', namespaces=nsmap)
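If lxml is not available, the same prefix-mapping idea can be sketched with the standard-library ElementTree, assuming the namespace URI is known in advance. ElementTree's path syntax has no text() step, so the text is read from the matched elements instead. `SAMPLE` and `NAMESPACES` are names chosen for this illustration, and `my_ns` is an arbitrary prefix.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the router reply, for illustration only.
SAMPLE = """<rpc-reply xmlns:junos="http://xml.juniper.net/junos/14.1D0/junos">
    <route-information xmlns="http://xml.juniper.net/junos/14.1D0/junos-routing">
        <route-table>
            <rt junos:style="brief">
                <rt-destination>5.5.5.5/32</rt-destination>
            </rt>
        </route-table>
    </route-information>
</rpc-reply>"""

# Any prefix works here; it only has to match the one used in the path.
NAMESPACES = {'my_ns': 'http://xml.juniper.net/junos/14.1D0/junos-routing'}

root = ET.fromstring(SAMPLE)

# All matches, as a list of strings.
texts = [node.text
         for node in root.findall('.//my_ns:rt-destination', NAMESPACES)]

# First match only; findtext returns None if nothing matches.
first = root.findtext('.//my_ns:rt-destination', namespaces=NAMESPACES)

print(texts)
print(first)
```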
James