
I'm trying to parse an XML file into a text file (mainly to get the body of each Text element), but the for loop never runs, so nothing is appended to the file. I know I'm missing something about the XML. I tried adding an outer for loop that calls findall on MAEC_Bundle before finding the behaviors (I think because it's the root?).

This is the XML file:

<MAEC_Bundle xmlns:ns1="http://xml/metadataSharing.xsd" xmlns="http://maec.mitre.org/XMLSchema/maec-core-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maec.mitre.org/XMLSchema/maec-core-1 file:MAEC_v1.1.xsd" id="maec:thug:bnd:1" schema_version="1.100000">
    <Analyses>
        <Analysis start_datetime="2019-11-25 21:41:59.491211" id="maec:thug:ana:2" analysis_method="Dynamic">
            <Tools_Used>
                <Tool id="maec:thug:tol:1">
                    <Name>Thug</Name>
                    <Version>0.9.40</Version>
                    <Organization>The Honeynet Project</Organization>
                </Tool>
            </Tools_Used>
        </Analysis>
    </Analyses>
    <Behaviors>
        <Behavior id="maec:thug:bhv:4">
            <Description>
                <Text>[window open redirection] about:blank -&gt; http://desbloquear.celularmovel.com/</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:5">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:6">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Content-type: text/html, MD5: f1fb042c62910c34be16ad91cbbd71fa)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:7">
            <Description>
                <Text>[meta redirection] http://desbloquear.celularmovel.com/ -&gt; http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:8">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Status: 200, Referer: http://desbloquear.celularmovel.com/)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:9">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Content-type: text/html, MD5: a28fe921afb898e60cc334e06f71f46e)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
    </Behaviors>
    <Pools/>
</MAEC_Bundle>

This is the Python code for parsing. It only writes "Operation" to the file and never enters the loop:

import xml.etree.ElementTree as ET


def logsParsing():
    tree = ET.parse(
        'analysis.xml')
    root = tree.getroot()
    with open('sample1.txt', 'w') as f:
        f.write('Operation\n')
        with open('sample1.txt', 'a') as f:
            for behavior in root.findall('Behaviors'):
                operation = behavior.find('Behavior').find('Description').find('Text').text
                line_to_write = operation + '\n'
                f.write(line_to_write)
    f.close()


logsParsing()
rahaf
  • Why do you open the file twice? When writing, the write pointer advances, and the next write will begin where the last one ended – CristiFati Nov 26 '19 at 18:32
  • You would want to call `f.close()` before reopening the file in append mode so the changes can be saved – Hippolippo Nov 26 '19 at 18:34
  • Your file handling is definitely weird, but the main bug is probably findall not playing well with the namespace in the root. See https://stackoverflow.com/questions/14853243/parsing-xml-with-namespace-in-python-via-elementtree. I'm tempted to close this as a duplicate. – Alex Hall Nov 26 '19 at 18:39
  • You need to take the `http://maec.mitre.org/XMLSchema/maec-core-1` namespace into account. See https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml-with-namespaces. – mzjn Nov 26 '19 at 18:40

1 Answer

See [Python 3.Docs]: xml.etree.ElementTree - The ElementTree XML API. Pay particular attention to the following sections:

  • Parsing XML with Namespaces
  • XPath support
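
As a minimal sketch of why the loop in the question never runs: the root element declares a default namespace, so ElementTree stores every tag as `{uri}name`, and a bare `findall('Behaviors')` returns an empty list. The snippet below uses a trimmed-down inline copy of the XML rather than analysis.xml; `SAMPLE_XML`, `NS` and the prefix `m` are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for analysis.xml (illustrative only).
SAMPLE_XML = """<MAEC_Bundle xmlns="http://maec.mitre.org/XMLSchema/maec-core-1">
    <Behaviors>
        <Behavior id="maec:thug:bhv:4">
            <Description>
                <Text>first</Text>
            </Description>
        </Behavior>
        <Behavior id="maec:thug:bhv:5">
            <Description>
                <Text>second</Text>
            </Description>
        </Behavior>
    </Behaviors>
</MAEC_Bundle>"""

# The prefix "m" is arbitrary; only the URI has to match the document.
NS = {"m": "http://maec.mitre.org/XMLSchema/maec-core-1"}

root = ET.fromstring(SAMPLE_XML)

# The parsed tag carries the namespace URI in braces.
print(root.tag)

# Without the namespace nothing matches, so a for loop over this is skipped.
plain_matches = root.findall("Behaviors")
print(len(plain_matches))

# With a prefix mapping, the same path finds the element.
ns_matches = root.findall("m:Behaviors", NS)
print(len(ns_matches))

# Collect the text of every Behavior, not only the first one.
text_path = "m:Behaviors/m:Behavior/m:Description/m:Text"
texts = [node.text for node in root.findall(text_path, NS)]
print(texts)
```

Note that the path goes down to each `Behavior`; looping over `Behaviors` and calling `find('Behavior')` would only ever reach the first child.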

Here's a way of handling things.

code00.py:

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET


def main():
    tree = ET.parse("analysis.xml")
    root_node = tree.getroot()
    namespaces = {
        "xmlns": "http://maec.mitre.org/XMLSchema/maec-core-1",  # Namespace (default) from XML file (this is the only one we need, as tags that matter to us are not prefixed)
    }
    xpath = "./{0:s}:Behaviors/{0:s}:Behavior/{0:s}:Description/{0:s}:Text".format("xmlns")  # Compute each "Text" node full path
    print("Nodes to search: {0:s}".format(xpath))
    text_nodes = root_node.findall(xpath, namespaces)
    with open("sample1.txt", "w") as fout:  # Only open the out file once
        node_count = 0
        fout.write("Operation:\n")
        for text_node in text_nodes:
            fout.write(text_node.text + "\n")
            node_count += 1
        print("Wrote {0:d} nodes info.".format(node_count))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("\nDone.")

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059057339]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32

Nodes to search: ./xmlns:Behaviors/xmlns:Behavior/xmlns:Description/xmlns:Text
Wrote 6 nodes info.

Done.

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059057339]> type sample1.txt
Operation:
[window open redirection] about:blank -> http://desbloquear.celularmovel.com/
[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)
[HTTP] URL: http://desbloquear.celularmovel.com/ (Content-type: text/html, MD5: f1fb042c62910c34be16ad91cbbd71fa)
[meta redirection] http://desbloquear.celularmovel.com/ -> http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi
[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Status: 200, Referer: http://desbloquear.celularmovel.com/)
[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Content-type: text/html, MD5: a28fe921afb898e60cc334e06f71f46e)
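
A possible variation, sketched here as an alternative to the prefix mapping used in code00.py: ElementTree also accepts the fully qualified `{uri}tag` form directly, so `iter()` can collect every `Text` element without building an XPath. The helper `write_operations`, the constants `MAEC_URI` and `TEXT_TAG`, and the inline sample are hypothetical names introduced for illustration. The output goes to an `io.StringIO` buffer here; a file opened once with `open("sample1.txt", "w")` could be passed instead.

```python
import io
import xml.etree.ElementTree as ET

MAEC_URI = "http://maec.mitre.org/XMLSchema/maec-core-1"
# Fully qualified tag name: the namespace URI in braces, then the local name.
TEXT_TAG = "{" + MAEC_URI + "}Text"


def write_operations(xml_source, out):
    """Write a header plus one line per Text element; return the line count."""
    root = ET.fromstring(xml_source)
    out.write("Operation\n")
    count = 0
    # iter() walks the whole tree, so this picks up Text elements at any depth.
    for node in root.iter(TEXT_TAG):
        out.write((node.text or "") + "\n")
        count += 1
    return count


# Tiny inline document standing in for analysis.xml (illustrative only).
sample = (
    '<MAEC_Bundle xmlns="' + MAEC_URI + '">'
    "<Behaviors><Behavior><Description>"
    "<Text>a -&gt; b</Text>"
    "</Description></Behavior></Behaviors>"
    "</MAEC_Bundle>"
)

buffer = io.StringIO()
written = write_operations(sample, buffer)
print(written)
print(buffer.getvalue())
```

Because `iter()` matches at any depth, this is only equivalent to the explicit path when `Text` appears nowhere else in the document, which holds for the XML shown in the question.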
CristiFati