I have a large XML file that I want to parse, printing one specific value from an object only when another value in that object matches.

This is the code so far:

#!/usr/local/bin/python

import xml.etree.ElementTree as ET
tree = ET.parse('onedb-dhcp.xml')
root = tree.getroot()

# This successfully gets all items in the xml:

print 'This successfully gets all items in the xml:\n'

for p in root.iter('PROPERTY'):
    print p.attrib
print '\n----------------------------------------------------------'

This is the sample xml file:

<DATABASE NAME="test" VERSION="43-39" MD5="." SCHEMA-MD5="." INT-VERSION="43-39">
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.lease"/><PROPERTY NAME="is_invalid_mac" VALUE="false"/><PROPERTY NAME="deferred_ttl" VALUE="300"/><PROPERTY NAME="ack_state" VALUE="renew"/><PROPERTY NAME="v6_prefix_bits" VALUE="0"/><PROPERTY NAME="is_ipv4" VALUE="true"/><PROPERTY NAME="vnode_id" VALUE="79"/><PROPERTY NAME="node_id" VALUE="79"/><PROPERTY NAME="ip_address" VALUE="10.10.1.6"/><PROPERTY NAME="dhcp_range" VALUE="10.10.1.5/10.10.1.254///0/"/><PROPERTY NAME="network_view" VALUE="0"/><PROPERTY NAME="starts" VALUE="2 2017/01/17 04:58:52"/><PROPERTY NAME="ends" VALUE="6 2017/01/21 04:58:52"/><PROPERTY NAME="tstp" VALUE="1 2017/01/23 04:58:52"/><PROPERTY NAME="tsfp" VALUE="1 2017/01/23 04:58:52"/><PROPERTY NAME="atsfp" VALUE="1 2017/01/23 04:58:52"/><PROPERTY NAME="cltt" VALUE="2 2017/01/17 04:58:52"/><PROPERTY NAME="hardware" VALUE="00:1a:4b:26:fd:85"/><PROPERTY NAME="client_hostname" VALUE="&quot;printer1&quot;"/><PROPERTY NAME="binding_state" VALUE="active"/><PROPERTY NAME="next_binding_state" VALUE="expired"/><PROPERTY NAME="variable" VALUE="vendor-class-identifier=&quot;Hewlett-Packard JetDirect&quot; ddns-fwd-name=&quot;printer1.testing.net&quot; ddns-rev-name=&quot;6.1.10.10.in-addr.arpa.&quot; ddns-txt=&quot;0015dce5883b53fa75c8d90d1312f0c054&quot; lt=&quot;04294967295&quot;"/><PROPERTY NAME="ms_server_id" VALUE="."/><PROPERTY NAME="fingerprint" VALUE="HP Printer"/><PROPERTY NAME="fingerprint_class" VALUE="Printers"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.lease"/><PROPERTY NAME="is_invalid_mac" VALUE="false"/><PROPERTY NAME="deferred_ttl" VALUE="300"/><PROPERTY NAME="ack_state" VALUE="from_peer"/><PROPERTY NAME="v6_prefix_bits" VALUE="0"/><PROPERTY NAME="is_ipv4" VALUE="true"/><PROPERTY NAME="vnode_id" VALUE="86"/><PROPERTY NAME="node_id" VALUE="86"/><PROPERTY NAME="ip_address" VALUE="10.10.1.44"/><PROPERTY NAME="dhcp_range" VALUE="10.10.1.5/101.10.1.254///0/"/><PROPERTY NAME="network_view" VALUE="0"/><PROPERTY NAME="starts" VALUE="2 2017/01/17 04:58:52"/><PROPERTY NAME="ends" VALUE="6 2017/01/21 04:58:52"/><PROPERTY NAME="tstp" VALUE="4 2016/06/23 19:17:54"/><PROPERTY NAME="tsfp" VALUE="1 2017/01/23 04:58:52"/><PROPERTY NAME="atsfp" VALUE="1 2017/01/23 04:58:52"/><PROPERTY NAME="cltt" VALUE="5 2016/06/17 19:17:54"/><PROPERTY NAME="hardware" VALUE="00:1a:4b:26:fd:85"/><PROPERTY NAME="client_hostname" VALUE="&quot;printer2&quot;"/><PROPERTY NAME="binding_state" VALUE="active"/><PROPERTY NAME="next_binding_state" VALUE="expired"/><PROPERTY NAME="variable" VALUE="lt=&quot;345600&quot; ddns-txt=&quot;0015dce5883b53fa75c8d90d1312f0c054&quot; ddns-rev-name=&quot;44.1.10.10.in-addr.arpa.&quot; ddns-fwd-name=&quot;printer2.testing.net&quot; vendor-class-identifier=&quot;Hewlett-Packard JetDirect&quot;"/><PROPERTY NAME="ms_server_id" VALUE="."/></OBJECT>
</DATABASE>

When I run the above script, this is what I get printed to screen (just a sample):

{'NAME': '__type', 'VALUE': 'dhcp.lease'}
{'NAME': 'is_invalid_mac', 'VALUE': 'false'}
{'NAME': 'deferred_ttl', 'VALUE': '300'}
{'NAME': 'ack_state', 'VALUE': 'renew'}
{'NAME': 'v6_prefix_bits', 'VALUE': '0'}
{'NAME': 'is_ipv4', 'VALUE': 'true'}
{'NAME': 'vnode_id', 'VALUE': '79'}
{'NAME': 'node_id', 'VALUE': '79'}
{'NAME': 'ip_address', 'VALUE': '10.10.1.6'}

How can I set it up to print the ip_address value only if __type = dhcp.lease?

I've tried this:

l = 'dhcp.lease'
ip = 'ip_address'

for s in root.iter('PROPERTY'):
        n = s.attrib['NAME']
        d = s.attrib['VALUE']
        if d == l:
                print s.attrib['VALUE']

That prints out this:

Searching for specific things...

dhcp.lease
dhcp.lease

I think I'm close to the finish line, but need some help getting over it.

martineau
DDI Guy

1 Answer

You need to iterate over the objects first, not over the properties directly. For each object that has a "__type" property of "dhcp.lease", you print the "ip_address" of that same object.

Try this:

for obj in tree.iter('OBJECT'):

    # Build a dictionary from NAME and VALUE of each property
    properties = dict([
        (p.attrib['NAME'], p.attrib['VALUE'])
        for p in obj.iter('PROPERTY')
    ])

    # Skip this object if it's not a dhcp lease
    if properties['__type'] != 'dhcp.lease':
        continue

    print properties['ip_address']

I'm assuming here that your properties have unique names, so that I can create a dictionary to make lookups easier.
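For anyone on Python 3, here is a minimal self-contained sketch of the same idea. The inline `SAMPLE` is a trimmed, hypothetical version of the XML in the question (the `dhcp.range` type is made up for illustration), and `lease_ips` is just a name chosen here. It uses `.get()` so that an object missing `__type` or `ip_address` is skipped instead of raising a `KeyError`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, trimmed from the XML in the question.
# "dhcp.range" is an invented type used only to show a non-matching object.
SAMPLE = """<DATABASE NAME="test">
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.lease"/><PROPERTY NAME="ip_address" VALUE="10.10.1.6"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.range"/><PROPERTY NAME="ip_address" VALUE="10.10.1.7"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.lease"/></OBJECT>
</DATABASE>"""


def lease_ips(root):
    """Return the ip_address of every OBJECT whose __type is dhcp.lease."""
    ips = []
    for obj in root.iter('OBJECT'):
        # Assumes property names are unique within one object;
        # if a name repeats, the last value wins.
        properties = {p.get('NAME'): p.get('VALUE') for p in obj.iter('PROPERTY')}
        if properties.get('__type') != 'dhcp.lease':
            continue
        ip = properties.get('ip_address')
        if ip is not None:  # a lease without an ip_address is skipped
            ips.append(ip)
    return ips


root = ET.fromstring(SAMPLE)
for ip in lease_ips(root):
    print(ip)
```

With this sample only the first object qualifies: the second has a different type and the third has no `ip_address`.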

If you want to extend this later with more checks, you could add more if statements before the printing. Something like (not valid python): if properties['ends'] < now + 7 days: continue
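A date check like that could be sketched as follows (Python 3). This rests on a few assumptions that the question does not confirm: that the leading number in values such as `6 2017/01/21 04:58:52` is a day-of-week index that can be ignored, that the timestamps can be compared as naive datetimes, and that "within the last 7 days" means the lease's `ends` time falls in the 7 days before now. The helper names `parse_lease_time` and `ended_recently` are made up here.

```python
from datetime import datetime, timedelta


def parse_lease_time(value):
    """Parse a value like '6 2017/01/21 04:58:52' into a naive datetime."""
    # Assumption: the first field is a day-of-week index, so it is dropped.
    _, stamp = value.split(' ', 1)
    return datetime.strptime(stamp, '%Y/%m/%d %H:%M:%S')


def ended_recently(properties, now, days=7):
    """True if the 'ends' time lies within the last `days` days before `now`."""
    ends = parse_lease_time(properties['ends'])
    return now - timedelta(days=days) <= ends <= now


# Usage with a fixed "now" so the result does not depend on the current date
now = datetime(2017, 1, 23, 12, 0, 0)
print(ended_recently({'ends': '6 2017/01/21 04:58:52'}, now))  # True
print(ended_recently({'ends': '4 2016/06/23 19:17:54'}, now))  # False
```

Inside the loop this would become one more check before the print, for example `if not ended_recently(properties, datetime.now()): continue`. If you need the other reading (leases that end within the next 7 days), flip the comparison accordingly.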

fafl
  • that got me a little closer. It finds the number of items that meet criteria, but it's printing brackets rather than the values: Getting IPs... [] [] [] [] [] – DDI Guy Jan 22 '17 at 18:17
  • If you only get [] then there are no properties with NAME == 'ip_address'. If you get something like `['1.2.3.4']` then replace `])` with `][0])` in the last line – fafl Jan 22 '17 at 18:19
  • thanks so much. I made a small tweak to your sample and I am getting what I want. However, the IPs are surrounded by brackets and single quotes: ['10.10.1.6']. How can I change the print statement to just print an IP on each line? – DDI Guy Jan 22 '17 at 18:30
  • I edited the code to remove the brackets in the output – fafl Jan 22 '17 at 18:36
  • that is perfect! Thanks so much. Do you mind explaining it a little? I may want to modify it in the future to skip other items. For example, if an IP's lease is not within the last 7 days. The lease start/end time is in the xml file. – DDI Guy Jan 22 '17 at 18:42
  • all of the properties have unique names. How would I create a dictionary to do lookups that way? – DDI Guy Jan 22 '17 at 19:01
  • Changed the code to use a dictionary. Now it should be a lot easier to read. – fafl Jan 22 '17 at 19:16
  • Is there any way to parse the XML file one line at a time? It appears that it's reading the entire file into memory and then determining if it's a lease or not. If I run this against the entire XML, it just doesn't complete. If I run it against a smaller set (around 10,000 lines) it works perfectly. The entire XML is 6.5 million lines. One idea I had was to grep for leases, and if any exist, put all of them into one file and run the script against that. But I was hoping to accomplish it all in one script. – DDI Guy Jan 22 '17 at 21:48
  • Sounds like you want to stream the XML file: http://stackoverflow.com/questions/7693535/what-is-a-good-xml-stream-parser-for-python – fafl Jan 22 '17 at 21:55
  • Thanks for the streaming link. I'll look at that, and compare to what I have now to see what I need to modify to make it work. – DDI Guy Jan 22 '17 at 22:05
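
One way to stream the file from the standard library is `xml.etree.ElementTree.iterparse`, which yields elements as they are parsed, so the whole tree never has to sit in memory. The following is a minimal Python 3 sketch, not a tested solution for the 6.5 million line file. `iter_lease_ips` is a name chosen here, and `SAMPLE` is a trimmed, hypothetical stand-in for the real file (the `dhcp.range` type is invented).

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real file, trimmed from the question's XML
SAMPLE = b"""<DATABASE NAME="test">
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.lease"/><PROPERTY NAME="ip_address" VALUE="10.10.1.6"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.range"/><PROPERTY NAME="ip_address" VALUE="10.10.1.7"/></OBJECT>
<OBJECT><PROPERTY NAME="__type" VALUE="dhcp.lease"/><PROPERTY NAME="ip_address" VALUE="10.10.1.44"/></OBJECT>
</DATABASE>"""


def iter_lease_ips(source):
    """Yield ip_address values of dhcp.lease objects, one OBJECT at a time.

    `source` may be a filename or a binary file object.
    """
    context = ET.iterparse(source, events=('start', 'end'))
    # The first event is the start of the root element (<DATABASE>)
    _, root = next(context)
    for event, elem in context:
        # An element is complete only at its "end" event
        if event != 'end' or elem.tag != 'OBJECT':
            continue
        properties = {p.get('NAME'): p.get('VALUE') for p in elem.iter('PROPERTY')}
        if properties.get('__type') == 'dhcp.lease' and 'ip_address' in properties:
            yield properties['ip_address']
        # Drop the objects processed so far so memory use stays flat
        root.clear()


for ip in iter_lease_ips(io.BytesIO(SAMPLE)):
    print(ip)
```

For the real file you would pass the filename instead, for example `iter_lease_ips('onedb-dhcp.xml')`. The `root.clear()` call is the part that matters for large inputs: without it, every parsed `OBJECT` stays attached to the root and memory grows just as it does with `ET.parse`.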