Consider if you will, a world where you have a default nmap XML output.

I am specifically trying to parse out the IP Address (no problem here), and OS Vendor (here is the problem).

The issue is that the XML tag occurs several times and also carries attributes, and I can't figure out how to use the untangle syntax to pull an attribute from a tag that also needs an index.

The xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="file:///usr/bin/../share/nmap/nmap.xsl" type="text/xsl"?>
<!-- Nmap 7.40 scan initiated Tue Aug 29 12:45:56 2017 as: nmap -sV -O -oX ./nmap_results.xml 1.2.3.4/24 -->
<nmaprun attributes="">
    <scaninfo attributes="" />
    <debugging attributes="" />
    <host attributes="">
        <status attributes="" />
        <address attributes="" />
        <hostnames>
            <hostname attributes="" />
        </hostnames>
        <ports>
            <extraports attributes="">
                <extrareasons attributes="" />
            </extraports>
            <port attributes="">
                <state attributes="" />
                <service attributes="" />
            </port>
            <port attributes="">
                <state attributes="" />
                <service attributes="">
                    <cpe>stuff</cpe>
                    <cpe>more stuff</cpe>
                </service>
            </port>
            ...

Let's just assume that I want to pull the attributes from the first instance of port.

In my python, I would assume it would look something like this:

#!/bin/env python
import untangle

nmap = untangle.parse('./location/to/results.xml')

alive = int(nmap.nmaprun.runstats.hosts['up'])

count = range(0,alive,1)
for tick in count:
    print(nmap.nmaprun.host[tick].ports.port[0, 'attribute'])

The problem here is the port[0, 'attribute'] expression: port needs the 0 index because the tag repeats, but I also want to pull a named attribute from that same element.

Here is the python error:

/usr/bin/python2.7 /path/to/my/dot.py
Traceback (most recent call last):
  File "/path/to/my/dot.py", line 10, in <module>
    print(nmap.nmaprun.host[tick].ports.port[0, 'vendor'])
TypeError: list indices must be integers, not tuple

Process finished with exit code 1

If I try to just use the attribute name, without the index, I get this:

/usr/bin/python2.7 /path/to/my/dot.py
Traceback (most recent call last):
  File "/path/to/my/dot.py", line 10, in <module>
    print(nmap.nmaprun.host[tick].ports.port['vendor'])
TypeError: list indices must be integers, not str

Process finished with exit code 1

And if I provide just the index, I get a string with all of the attributes, but I only need one.

What am I doing wrong or is there a way?

NickD
  • Can you post a complete XML file? Posting a snippet makes it difficult to test. – NickD Aug 29 '17 at 19:28
  • What's the type of `nmap.nmaprun.host[tick].ports.port[0]`? – NickD Aug 29 '17 at 19:28
  • I've uploaded a full xml here (https://pastebin.com/EgguG1Ss) – Jordan Gregory Aug 29 '17 at 19:35
  • Python says: ` None ` – Jordan Gregory Aug 29 '17 at 19:39
  • Are you restricted to using the _untangle_ module for parsing the _xml_? P.S. What if you used something like: `print(nmap.nmaprun.host[tick].ports.port[0]['portid'])` ? – CristiFati Aug 29 '17 at 19:52
  • No restrictions, it's just always what I've used in the past because it's easy. – Jordan Gregory Aug 29 '17 at 19:57
  • As far as combining, I don't think the module likes that lol, I got a funky error from the module itself by adding the attribute like that – Jordan Gregory Aug 29 '17 at 20:01
  • Hmm, I imagine that _untangle_ converts the _xml_ tree into a structure of nested _Python_ objects based on the _xml_ node tags. If that's true, I see a problem in how you use it: `nmap.nmaprun.host[tick]` there's only one `host` tagged node in the _xml_; – CristiFati Aug 29 '17 at 20:16
  • That's true for my test XML, which only has one host, but in a normal NMAP scan against a subnet, you will have multiple. I would upload my network scan, but it has too much confidential data to strip. – Jordan Gregory Aug 29 '17 at 20:30

1 Answer

Instead of blindly guessing, I've downloaded the module (it's a single .py file) and started to play with it. What I've learned:

  1. The xml tree is indeed converted into a tree-like structure of Python objects (untangle.Element), based on the xml node tags (which become object attributes)
  2. Element supports indexing ([Python]: object.__getitem__(self, key)) and returns an xml node attribute with the name matching the given key
  3. When an xml node has several nodes with the same tag, the corresponding converted object will be a list of Element objects
  4. Element supports iteration ([Python]: object.__iter__(self)) and it yields itself when iterated over

It follows from points 3. and 4. that it's best to always iterate over an element that can appear one or more times (see Side note *).
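To make points 2. to 4. concrete without depending on the module, here is a toy model of that behaviour. The `Node` class and the `port_ids` helper are hypothetical stand-ins written for this illustration, not untangle's source code; they only mimic what was observed above.

```python
class Node(object):
    """Toy stand-in for an untangle-style element (hypothetical, not the library's code)."""

    def __init__(self, name, attributes=None):
        self.name = name
        self.attributes = attributes or {}
        self.children = {}  # tag -> Node, or a list of Nodes when the tag repeats

    def add_child(self, child):
        existing = self.children.get(child.name)
        if existing is None:
            self.children[child.name] = child              # first node with this tag
        elif isinstance(existing, list):
            existing.append(child)                         # third or later node
        else:
            self.children[child.name] = [existing, child]  # second node: becomes a list
        return child

    def __getattr__(self, tag):
        # Only called when normal attribute lookup fails: child tags act as attributes.
        try:
            return self.__dict__["children"][tag]
        except KeyError:
            raise AttributeError(tag)

    def __getitem__(self, key):
        # Indexing a single node returns the xml attribute with that name (or None).
        return self.attributes.get(key)

    def __iter__(self):
        # A single node yields itself, so a for loop works with one child or many.
        yield self


def port_ids(ports_node):
    """Collect portid values whether .port is one Node or a list of Nodes."""
    return [port["portid"] for port in ports_node.port]


ports = Node("ports")
ports.add_child(Node("port", {"portid": "22"}))
one_port = port_ids(ports)        # .port is a single Node here

ports.add_child(Node("port", {"portid": "111"}))
two_ports = port_ids(ports)       # .port is now a plain list

first_port_id = ports.port[0]["portid"]  # index the list first, then the attribute
print(one_port, two_ports, first_port_id)
```

In this model, `ports.port['portid']` on the two-port case fails the same way as in the question, because the string key is applied to a plain list rather than to a single node.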

Here's some code that demonstrates that:

#!/bin/env python

import untangle

FILE_NAME = "a.xml" # "./location/to/results.xml" # You should change the name back to match your location


def main():
    nmap = untangle.parse(FILE_NAME)
    up_host_count = int(nmap.nmaprun.runstats.hosts['up'])
    host_iterator = nmap.nmaprun.host
    for host in host_iterator:
        print("IP Address: {}".format(host.address["addr"]))

        vendors = set()
        osmatch_iterator = host.os.osmatch
        for osmatch in osmatch_iterator:
            osclass_iterator = osmatch.osclass
            for osclass in osclass_iterator:
                vendor = osclass["vendor"]
                if vendor is not None:
                    vendors.add(vendor)
        print("    OS Vendors: {}".format(vendors))

        port_iterator = host.ports.port
        for port in port_iterator:
            print("    Port number: {}".format(port["portid"]))


if __name__ == "__main__":
    main()

Notes:

  • Every for loop in the code is an example of the iteration I talked about above. I chose the loops by looking at the provided xml sample (the complete version) for places where more than one node has the same tag
  • The alternative to iterating is to check the type of an object every time, but that's neither nice nor scalable
  • The port handling was not required in the question, but I placed it there since there was an example that didn't work involving the ports
  • An nmap scan can identify more than one OS (this does happen in our case), and those can come from different vendors (this doesn't happen in our case), especially between Ux (Unix) flavors. So I added some logic to the OS vendor part to display each vendor only once (you can manually modify the xml file, specify a vendor other than Linux for one of the osclass nodes, and see that it appears in the output)
  • Code runs with Python3 and Python2

Output:

E:\Work\Dev\StackOverflow\q45946779>python b.py
IP Address: 127.0.0.1
    OS Vendors: set([u'Linux'])
    Port number: 22
    Port number: 111
    Port number: 631
    Port number: 2222
    Port number: 8081
    Port number: 30000

Side note *: I talked about an element that can appear once or more, but this approach (I'm talking about the untangle module approach) is not very error proof if the xml is incomplete. Take the following line of code (which is no longer used, but I kept it there only to make a point):

up_host_count = int(nmap.nmaprun.runstats.hosts['up'])

If any of the nodes nmaprun, runstats, hosts is missing from the xml, the line will raise AttributeError. The same line, but with error proofing, would look like:

up_host_count = int((getattr(getattr(getattr(nmap, "nmaprun", None), "runstats", None), "hosts", None) or {"up": 0})["up"] or 0)

but that's ugly, and it gets even messier when advancing the xml tree depth.
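For comparison, here is a minimal sketch of the same extraction with xml.etree.ElementTree from the standard library instead of untangle, where a missing node gives None or an empty list rather than an AttributeError. The sample XML is a hand-written, heavily trimmed stand-in for an nmap report (not real scan output), and the function names `summarize_hosts` and `up_host_count` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed stand-in for an nmap -oX report (not real scan data).
# The second host has no <ports> or <os>, and <runstats> is missing altogether.
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="127.0.0.1" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"/>
      <port protocol="tcp" portid="111"/>
    </ports>
    <os>
      <osmatch name="Linux 3.X">
        <osclass vendor="Linux" osfamily="Linux"/>
      </osmatch>
      <osmatch name="Linux 4.X">
        <osclass vendor="Linux" osfamily="Linux"/>
      </osmatch>
    </os>
  </host>
  <host>
    <address addr="127.0.0.2" addrtype="ipv4"/>
  </host>
</nmaprun>"""


def summarize_hosts(xml_text):
    """Return one (ip, sorted vendors, port ids) tuple per <host> node."""
    root = ET.fromstring(xml_text)  # root is the <nmaprun> element
    summary = []
    for host in root.findall("host"):
        address = host.find("address")  # None when the node is absent
        ip = address.get("addr") if address is not None else None
        # findall() returns an empty list when any node on the path is missing.
        vendors = set(oc.get("vendor") for oc in host.findall("os/osmatch/osclass"))
        vendors.discard(None)  # osclass nodes without a vendor attribute
        ports = [port.get("portid") for port in host.findall("ports/port")]
        summary.append((ip, sorted(vendors), ports))
    return summary


def up_host_count(xml_text):
    """Read the 'up' attribute of runstats/hosts, or 0 when a node is absent."""
    hosts = ET.fromstring(xml_text).find("runstats/hosts")
    return int(hosts.get("up", 0)) if hosts is not None else 0


for row in summarize_hosts(SAMPLE_XML):
    print(row)
print(up_host_count(SAMPLE_XML))
```

The path strings passed to find() and findall() replace the chained attribute access, so the depth of the tree no longer adds nesting to the error handling.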

CristiFati
  • That seems to get me exactly what I need using the tools that I have. Thank you very much. – Jordan Gregory Aug 30 '17 at 15:24
  • That being said, you mentioned that it's not very error-proof, what would you use to parse XMLs rather than this? I've looked into etree and things like that, but I am just so very green when it comes to XML parsing in python. It seemed like more of a hassle than it was worth at the time. – Jordan Gregory Aug 30 '17 at 15:25
  • I worked previously with `xml.etree` (I'm not saying it's the best but I'm comfortable with it) and also with `xml.dom`. Here are some examples: [\[SO\]: Python get XML siblings into dictionary](https://stackoverflow.com/questions/45799991/python-get-xml-siblings-into-dictionary), [\[SO\]: Python read xml with related child elements](https://stackoverflow.com/questions/45049761/python-read-xml-with-related-child-elements), [\[SO\]: Convert XML into Lists of Tags and Values with Python](https://stackoverflow.com/questions/44622347/convert-xml-into-lists-of-tags-and-values-with-python). – CristiFati Aug 30 '17 at 17:07