
I wrote a ZIP cracker the other day, based on TJ O'Connor's book Violent Python: A Cookbook for Hackers, Forensic Analysts, Penetration Testers and Security Engineers.

The author used threading, but I was told over on Reddit that multiprocessing would be better for brute-forcing. Is that true? If so, why, and how can I implement multiprocessing in this case?

Is it also possible to run the threads or processes on the GPU instead of the CPU? That seems like it would be more efficient and effective for brute-forcing: it would not choke the CPU, and using the GPU's potential for the work should improve the time to crack and the number of tries per minute.

My code is as follows (I used threading since the author used it as well):

```python
import argparse
import sys
import zipfile
from threading import Thread

parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack using either a word list, password list or a dictionary.", usage="BruteZIP.py -z zip.zip -f file.txt")
parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")  # Creates -z arg
parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")  # Creates -f arg
args = parser.parse_args()


def extract_zip(zip_file, password):
    try:
        zip_file.extractall(pwd=password)
        print("[+] Password for the .zip: {0}\n".format(password.decode("utf-8", errors="replace")))
    except Exception:
        pass  # A wrong password raises an error (e.g. RuntimeError); move on silently. If all passwords fail, nothing is printed.


def main(zip_path, wordlist_path):
    if zip_path is None or wordlist_path is None:
        print(parser.usage)  # If the args are not used, show the user how to use them.
        sys.exit(1)
    zip_file = zipfile.ZipFile(zip_path)
    with open(wordlist_path, "rb") as txt_file:  # Opens the word list/password list/dictionary in "read binary" mode.
        for line in txt_file:
            password = line.strip()
            t = Thread(target=extract_zip, args=(zip_file, password))  # Starts one thread per password.
            t.start()


if __name__ == '__main__':
    main(args.zip, args.file)  # BruteZIP.py -z zip.zip -f file.txt
```

In short: is threading or multiprocessing better for brute-forcing? And is it possible to bind either one to the GPU instead of the CPU?

Arszilla
  • What does task manager say? Does this code cause 100% CPU usage? – Thomas Weller Feb 03 '19 at 09:05
  • Possible duplicate of [Multiprocessing vs Threading Python](https://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python) + [GPU accelerated programming](https://developer.nvidia.com/how-to-cuda-python) – Torxed Feb 03 '19 at 09:06
  • @Torxed I did read that post but as this is more of a 'specialized' task I thought I'd ask it. I wasn't sure if brute-forcing required threading or multiprocessing – Arszilla Feb 03 '19 at 09:09
  • In Python, some (many) operations hold the GIL and therefore effectively use only one CPU core. Others do not. Whether `extract_zip` does is impossible to say, because you did not show `extract_zip`. If it does, running it in multiple threads will not make this any faster. – zvone Feb 03 '19 at 09:09
  • @ThomasWeller When I run a large dictionary attack on a zip, I see my memory usage spike from 20% or so to 80% (16 GB RAM). The CPU stays below 50%. – Arszilla Feb 03 '19 at 09:09
  • @zvone I updated the code. Aside from that, what is the GIL? Also, I wasn't aware multithreading was possible in Python. That raises the question: multithreading or multiprocessing (for brute-forcing)? – Arszilla Feb 03 '19 at 09:11
  • @Arszilla, if you read that post you'd know that threading is good if you only have one CPU core to use (Intel Pentium). In all other purposes, multiprocessing will give you more execution for your bucks. – Torxed Feb 03 '19 at 09:14
  • The memory usage depends on the size of the ZIP file, the size of the password file and the stacks needed for the threads. Before you do that kind of stuff, you should know how the operating system handles memory, threads and processes. Read "Windows Internals" by Mark Russinovich. You can skip some chapters. – Thomas Weller Feb 03 '19 at 09:14
  • I see. So a modern CPU like an AMD Ryzen is better off running multiprocessing instead of threading, right? @Torxed – Arszilla Feb 03 '19 at 09:15
  • @Arszilla Cool. Now we see that most of the work is in ZipFile.extractall. So I googled the naive question _"does zipfile extractall lock gil?"_. It found [this page](https://www.peterbe.com/plog/fastest-way-to-unzip-a-zip-file-in-python), where someone wrote _"The module zipfile is completely written in Python, which comes with a relatively big overhead at Python level, which in turn means the GIL is locked relatively long."_. So, threading is not very useful here. The page also gives an example using multithreading, which is much faster... – zvone Feb 03 '19 at 09:16
  • I see. Thanks a lot @zvone. That was quite helpful. Thanks to the other fellas in this thread! – Arszilla Feb 03 '19 at 09:18

1 Answer


I don't know how you can bind a task to the GPU instead of the CPU. But for your other question, threading vs. multiprocessing, you would 100% want to use multiprocessing.

Brute-forcing is a CPU-bound task, and since CPython has something called the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time, your application simply would not be able to make use of the multiple threads you might be spawning.

That is not the case with multiprocessing: because it launches multiple Python interpreter instances at once, you can break a big task down into a bunch of smaller ones, have each interpreter instance run one of them, and combine the results afterwards.
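A minimal sketch of that split-and-combine pattern with `multiprocessing.Pool`, using a deliberately slow prime count as a stand-in for any CPU-bound job (the function names here are hypothetical, chosen only for illustration): the range is cut into one chunk per core, each worker process handles one chunk in its own interpreter, and the parent sums the partial results.

```python
import os
from multiprocessing import Pool


def count_primes(bounds):
    """Count primes in the half-open range [start, stop) by trial division (deliberately CPU-heavy)."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(stop, parts):
    """Split [0, stop) into at most `parts` contiguous chunks of roughly equal size."""
    step = max(1, -(-stop // parts))  # ceiling division, never zero
    return [(i, min(i + step, stop)) for i in range(0, stop, step)]


def parallel_count_primes(stop, processes=None):
    """Hypothetical driver: one chunk per worker process, results combined in the parent."""
    processes = processes or os.cpu_count() or 1
    chunks = split_range(stop, processes)
    with Pool(processes) as pool:
        partial_counts = pool.map(count_primes, chunks)  # each chunk runs in a separate interpreter
    return sum(partial_counts)  # combine the partial results


if __name__ == "__main__":
    print(parallel_count_primes(200_000))
```

The `if __name__ == "__main__":` guard matters: with the spawn start method (the default on Windows), each worker re-imports the script, and without the guard it would try to start pools of its own.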

You can test this yourself by running a CPU-bound benchmark: threading won't make any difference compared to sequential execution, and on multi-core systems it might even worsen performance.

With multiprocessing, however, you will see the difference clearly.
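A small benchmark along those lines, assuming a standard CPython build with the GIL (the free-threaded builds available since Python 3.13 behave differently). `busy_loop`, `run_sequential`, `run_with` and `timed` are made-up helper names; the same pure-Python loop is run sequentially, in a thread pool and in a process pool, one job per core (capped at 8 to keep the run short).

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy_loop(n):
    """Pure-Python CPU-bound work: the running thread holds the GIL throughout."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    return [busy_loop(n) for n in jobs]


def run_with(executor_cls, jobs, workers):
    """Run the jobs with either a ThreadPoolExecutor or a ProcessPoolExecutor."""
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(busy_loop, jobs))


def timed(label, func, *args):
    """Call func(*args), print the wall-clock time it took, and return its result."""
    start = time.perf_counter()
    result = func(*args)
    print(f"{label:<12} {time.perf_counter() - start:.2f} s")
    return result


if __name__ == "__main__":
    workers = min(os.cpu_count() or 1, 8)
    jobs = [2_000_000] * workers  # one equally sized job per worker
    expected = timed("sequential", run_sequential, jobs)
    assert timed("threads", run_with, ThreadPoolExecutor, jobs, workers) == expected
    assert timed("processes", run_with, ProcessPoolExecutor, jobs, workers) == expected
```

On a multi-core machine you would typically expect the threaded run to take about as long as the sequential one and the process-pool run to be noticeably faster; the exact numbers depend on your hardware and Python version.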

I am intentionally not linking to any explanation of the GIL, because there are hundreds of articles on the topic and you would most likely need to read several of them to understand how it works and what its benefits and drawbacks are.

That said, the PyCon talks by David Beazley and Larry Hastings give very good insight into this topic.

Dan D.
Rohit
  • @Arszilla: While the accepted answer is OK regarding CPU usage, be aware that it may involve a lot of memory. Instead of reading the zip file only once, it is now read millions of times. Your PC may be very busy just reading files from disk. Next, this might use all your RAM and start swapping, i.e. it will even write to disk. In the end, that might be much slower than threading. Performance optimization is a beast. It's a good idea to limit the number of processes as soon as CPU usage hits 100%. – Thomas Weller Feb 03 '19 at 10:53
  • Hmm, so what would be the optimal way? A dictionary or password list might have anywhere from a few thousand lines to nearly a million lines (a German dictionary has nearly 1 million words). That means it will read the .zip over a million times, correct? So it's eventually going to write to disk and do unwanted stuff (not sure what writing to disk will do). – Arszilla Feb 03 '19 at 10:58
  • My answer just focuses on what should be used for a CPU-bound task like brute-forcing; you would never want to do things like reading a file in every process or thread you spawn. Typically you should read your file just once and then operate on it in whatever way your logic demands. – Rohit Feb 03 '19 at 11:01
  • Further, since every process has its own separate memory, solutions involving multiprocessing always take up more memory than multithreading. But there are memory optimization techniques; look into generators if memory is also a concern. – Rohit Feb 03 '19 at 11:05
  • The issue is that I am fairly new to the concepts of threading, multithreading and multiprocessing, which also means I have no idea how to do memory optimization. I am just trying to improve my .zip cracker. From your response I gather that I should use multiprocessing and open the .zip only once, then go on from there? But how? Is there a good, simple guide to multiprocessing? – Arszilla Feb 03 '19 at 11:45
  • The official Python documentation would be your best guide: https://docs.python.org/2/library/multiprocessing.html. I am also trying to replicate your code and test it with some cases; once done, I will edit my answer. – Rohit Feb 03 '19 at 14:58