
I have been working on a Python CLI tool for file encryption, for which I have decided to use the PyNaCl library. My files are typically 200-500 MB in size. Experimentally, I found that encrypting the data directly was slower than dividing it into chunks of around 5 MB and encrypting them with a thread pool. I don't know the nuances of concurrency or parallelism, but I want to know the best and most performant way to encrypt large amounts of data.

Here's what my current implementation looks like:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

from nacl.bindings import sodium_increment
from nacl.secret import SecretBox


def encrypt_chunk(args: Tuple[bytes, SecretBox, bytes, int]):
    chunk, box, nonce, macsize = args
    try:
        outchunk = box.encrypt(chunk, nonce).ciphertext
    except Exception as e:
        err = Exception("Error encrypting chunk")
        err.__cause__ = e
        return err
    if len(outchunk) != len(chunk) + macsize:
        return Exception("Error encrypting chunk")
    return outchunk


def encrypt(
    data: bytes,
    key: bytes,
    nonce: bytes,
    chunksize: int,
    macsize: int,
):
    box = SecretBox(key)
    args = []
    total = len(data)
    i = 0
    while i < total:
        chunk = data[i : i + chunksize]
        nonce = sodium_increment(nonce)  # each chunk gets a unique nonce
        args.append((chunk, box, nonce, macsize))
        i += chunksize
    with ThreadPoolExecutor(max_workers=4) as executor:
        out = list(executor.map(encrypt_chunk, args))
    return out
```

I have been wondering whether it would be faster to use multiprocessing instead of `ThreadPoolExecutor`. I don't even know whether the current implementation is the best way to use multithreading, so any advice regarding this is appreciated. Thanks.

kush
  • Threading is many threads in one process; multiprocessing is many processes. You need multiprocessing to use more than one CPU. Python (and you) have to work harder to set up and join processes. This is not always worth it. Your speed win might come from chunking the data and have little to do with threads. – jwal Aug 12 '23 at 09:00
  • 1
    If a thread pool is faster than the single-threaded solution, that indicates that your encryption library releases the [global interpreter lock](https://stackoverflow.com/questions/1294382/what-is-the-global-interpreter-lock-gil-in-cpython) and you might not need multiprocessing. Still worth trying if your speedup was less than expected. Smaller chunks make sense since they fit into the level 3 cache and reduce main memory bandwidth – Homer512 Aug 12 '23 at 09:24
  • Your two loops building and waiting for futures look like you could replace them with a single [`executor.map`](https://docs.python.org/3.10/library/concurrent.futures.html#concurrent.futures.Executor.map). Also, you should limit the number of threads (the `max_workers` argument) to the number of CPUs. See [`os.cpu_count`](https://docs.python.org/3.10/library/os.html#os.cpu_count) – Homer512 Aug 12 '23 at 09:30
  • @jwal Previously, I had a simple `for` loop to loop through all the chunks and sequentially encrypt them, and it was wayyyy slower than directly encrypting all the data at once. For a 300 MB file, the loop took ~30s, but direct encryption took ~0.4s. The current implementation takes ~0.2s. So, I do think threads played a significant role in performance. – kush Aug 12 '23 at 11:20
  • @Homer512 I actually did try using multiprocessing, but it was slightly slower than the current version. I don't know why this happened, because as far as my research goes, multithreading is more suitable for I/O-bound operations whereas multiprocessing is better for CPU-bound operations. However, in this case, even though encryption is clearly CPU intensive, multithreading seems to perform better. I am not sure about this since it might be due to caveats in my implementation. – kush Aug 12 '23 at 11:31
  • 1
    As I've said, it's because the important part, the actual encryption, releases the GIL (global interpreter lock) so it can work fully parallel. Simultaneously it avoids the extra overhead that comes with multiprocessing (like copying data between processes). That being said, you should still reduce the `max_workers` since otherwise the threadpool will launch too many threads. This happens because thread pools are built for IO where this stuff helps. – Homer512 Aug 12 '23 at 11:41

0 Answers