I have an XML file that contains file names and file extensions that are commonly associated with ransomware and malware.

Using Python, I want to parse the file and print just the value of the PatternValue attribute of each Pattern element. The goal is to write a simple text file with one file name or file extension per line.

I tried to use macros in Notepad++, but that failed. I'm a Python noob and I'd like to accomplish this using Python.

Below is the XML file:

<?xml version="1.0" ?>
<Root >
    <Header DatabaseVersion = '2.0' ></Header>
    <QuotaTemplates ></QuotaTemplates>
    <DatascreenTemplates ></DatascreenTemplates>
    <FileGroups >
        <FileGroup Name = 'Anti-Ransomware%sFile%sGroups' Id = '{367CFFB7-DDED-4AA8-8E17-203B6B97F411}' Description = '' >
            <Members >
                <Pattern PatternValue = '!!%sRETURN%sFILES%s!!.txt' ></Pattern>
                <Pattern PatternValue = '!!!%sHOW%sTO%sDECRYPT%sFILES%s!!!.txt' ></Pattern>
                <Pattern PatternValue = '!!!%sREAD%sTHIS%s-%sIMPORTANT%s!!!.txt' ></Pattern>
                <Pattern PatternValue = '!!!!!ATENÇÃO!!!!!.html' ></Pattern>
                <Pattern PatternValue = '!!!!!SAVE%sYOUR%sFILES!!!!.txt' ></Pattern>
                <Pattern PatternValue = '!!!-WARNING-!!!.html' ></Pattern>
                <Pattern PatternValue = '!!!-WARNING-!!!.txt' ></Pattern>
                <Pattern PatternValue = '!!!GetBackData!!!.txt' ></Pattern>
                <Pattern PatternValue = '!!!README!!!*.rtf' ></Pattern>
                <Pattern PatternValue = '!!!READ_TO_UNLOCK!!!.TXT' ></Pattern>
                <Pattern PatternValue = '!!!SAVE%sYOUR%sFILES!.bmp' ></Pattern>
                <Pattern PatternValue = '!##%sDECRYPT%sFILES%s##!.txt' ></Pattern>
                <Pattern PatternValue = '!#_DECRYPT_#!.inf' ></Pattern>
                <Pattern PatternValue = '!DMALOCK3.0*' ></Pattern>
                <Pattern PatternValue = '!Decrypt-All-Files-*.txt' ></Pattern>
                <Pattern PatternValue = '!Please%sRead%sMe!.txt' ></Pattern>
                <Pattern PatternValue = '!READ.htm' ></Pattern>
                <Pattern PatternValue = '!Recovery_*.html' ></Pattern>
                <Pattern PatternValue = '!Recovery_*.txt' ></Pattern>
                <etc.../>
            </Members>
        </FileGroup>
    </FileGroups>
</Root>

Again, the goal is to write each file name/file extension to a text file on its own line. For example:

test.malware
test.ransomware 
test.virus
etc
etc
etc

Thank you in advance for your assistance

Aaron
VOIPSec
  • Your xml has a mismatched tag at the end: `` is not closed properly by `` – Aaron Apr 05 '19 at 14:14
  • That's because I trimmed the list significantly and didn't append the correct tags at the end. Let me adjust that. – VOIPSec Apr 05 '19 at 14:17
  • It is always a good idea to use an actual xml parser to deal with xml but they will not work correctly if the tags are wrong. That is all. – Aaron Apr 05 '19 at 14:18
  • Ahh, the tags are for the exclusion list. That can be ignored. – VOIPSec Apr 05 '19 at 14:19
  • @Aaron I suppose i'm not picky how this gets done - I just want to achieve my end result – VOIPSec Apr 05 '19 at 14:23

1 Answer


The standard XML parser for Python is the xml.etree.ElementTree module, which ships with the standard library.

The basic usage is to parse your XML first. ET.parse() takes a file name (or an open file object). If you already have the XML as a string, for example because you read it from a socket or generated it elsewhere, use ET.fromstring() instead.

import xml.etree.ElementTree as ET
tree = ET.parse('myxmlfile.xml')
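
For the string case, a minimal sketch might look like the following. The XML here is a shortened, made-up sample in the same shape as the file in the question, and xml_text is just an illustrative variable name. Note that ET.fromstring() returns the root element directly rather than an ElementTree object.

```python
import xml.etree.ElementTree as ET

# Shortened sample for illustration only; not the real file.
xml_text = """<Root>
    <FileGroups>
        <FileGroup Name='Example'>
            <Members>
                <Pattern PatternValue='!READ.htm'></Pattern>
            </Members>
        </FileGroup>
    </FileGroups>
</Root>"""

# fromstring() parses XML held in memory and returns the root Element.
root = ET.fromstring(xml_text)
print(root.tag)
```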

Then you have a lot of options for finding the elements of interest, but I'd suggest one of the built-in search tools such as ElementTree.iterfind():

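# './/Pattern' searches the whole tree; a bare 'Pattern' only matches direct children of the root
for element in tree.iterfind('.//Pattern'):
    print(element.get('PatternValue'))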

Depending on the structure of the file, this search may not be selective enough. If Pattern elements appear in more than one section, first find the section you want (members or non-members of a file group, etc.), then perform the search from there.
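
Putting it together, here is a rough sketch of the whole task: restrict the search to Pattern elements under Members, then write one value per line. The function names pattern_values and write_patterns, the SAMPLE string, and the output file name in the usage note are all placeholders I made up for illustration, and I'm assuming the real file has the same layout as the excerpt in the question.

```python
import xml.etree.ElementTree as ET

def pattern_values(root):
    """Return the PatternValue of every <Pattern> directly under a <Members> element."""
    return [
        pattern.get("PatternValue")
        for pattern in root.iterfind(".//Members/Pattern")
        if pattern.get("PatternValue") is not None  # skip elements without the attribute
    ]

def write_patterns(xml_path, out_path):
    """Parse xml_path and write one pattern per line to out_path; return the count."""
    root = ET.parse(xml_path).getroot()
    values = pattern_values(root)
    # utf-8 so names with non-ASCII characters are written correctly
    with open(out_path, "w", encoding="utf-8") as out:
        for value in values:
            out.write(value + "\n")
    return len(values)

# Shortened, made-up sample in the shape of the file from the question.
SAMPLE = """<Root>
    <FileGroups>
        <FileGroup Name='Example'>
            <Members>
                <Pattern PatternValue='!READ.htm'></Pattern>
                <Pattern PatternValue='!Recovery_*.txt'></Pattern>
            </Members>
        </FileGroup>
    </FileGroups>
</Root>"""

for value in pattern_values(ET.fromstring(SAMPLE)):
    print(value)
```

For the real file you would call something like write_patterns('myxmlfile.xml', 'patterns.txt'), where the output name is whatever you choose. The values are written exactly as they appear in the attribute, so placeholders such as %s are left untouched.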

Aaron