Should decompressing files use multithreading or multiprocessing?

Question

I'm decompressing xz files using the lzma module in Python:

import lzma
import shutil
with lzma.LZMAFile(compressed_path, "rb") as in_f, open(output_path, "wb") as out_f:
    shutil.copyfileobj(in_f, out_f)

I'm trying to speed up decompressing multiple files in Python. I've tried both:

concurrent.futures.ThreadPoolExecutor (if the work is IO-bound, multithreading might help)
concurrent.futures.ProcessPoolExecutor (if the work is CPU-bound, multiprocessing might help, given the GIL)

But the two turn out to take almost the same time.
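For reference, the comparison described above can be sketched roughly as follows. The helper names `decompress_one` and `timed_run` and the `(compressed, output)` tuple layout are hypothetical, chosen only for illustration. One possible explanation for the similar timings, which I believe holds for CPython but have not measured here, is that the lzma C extension releases the GIL while decoding, so threads can already run the decompression in parallel.

```python
import lzma
import shutil
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def decompress_one(paths):
    """Decompress one .xz file; `paths` is a (compressed_path, output_path) tuple."""
    compressed_path, output_path = paths
    with lzma.open(compressed_path, "rb") as in_f, open(output_path, "wb") as out_f:
        shutil.copyfileobj(in_f, out_f)
    return output_path


def timed_run(executor_cls, jobs, max_workers=4):
    """Run every job under the given executor class; return elapsed wall seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as pool:
        # list() consumes all results, so worker exceptions are raised here.
        list(pool.map(decompress_one, jobs))
    return time.perf_counter() - start
```

Calling `timed_run(ThreadPoolExecutor, jobs)` and `timed_run(ProcessPoolExecutor, jobs)` on the same list of jobs gives the two numbers to compare. On platforms that spawn worker processes, the process-pool call needs to sit under an `if __name__ == "__main__":` guard.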

My understanding is that decompression is more likely IO-bound, or at least that it needs much less computation than compression. But if it's really IO-bound, it is still much slower than simply copying the files.
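One rough way to check the IO-bound assumption is to compare wall-clock time with CPU time for a single file, and to run the same measurement on a plain copy. If CPU time is close to wall time for decompression but not for the copy, the work is probably CPU-bound. This is only a sketch; `measure`, `decompress`, and `plain_copy` are hypothetical helper names.

```python
import lzma
import shutil
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) for a single call of func(*args)."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    func(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def decompress(compressed_path, output_path):
    """Decompress one .xz file to output_path."""
    with lzma.open(compressed_path, "rb") as in_f, open(output_path, "wb") as out_f:
        shutil.copyfileobj(in_f, out_f)


def plain_copy(src_path, dst_path):
    """Copy bytes unchanged, as a baseline for pure I/O cost."""
    with open(src_path, "rb") as in_f, open(dst_path, "wb") as out_f:
        shutil.copyfileobj(in_f, out_f)
```

For example, `measure(decompress, "a.xz", "a.out")` versus `measure(plain_copy, "a.xz", "a.copy")`, with placeholder file names. Note that OS file caching can make a repeated run look faster than the first one.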

So what is really happening during decompression, and how can I improve the speed?

This depends heavily on the target platform (HDD vs SSD, file system, OS, processor) and your archives (compression level, many small files vs a few huge files). Multiprocessing should be the best option in general, since it should not be significantly slower than multithreading if the operation is only IO-bound. That being said, multiprocessing is a bit less flexible (depending on your needs). — Jérôme Richard, May 11 '22 at 18:12
Just to confirm: your multithreading is done via a function that takes `compressed_path` and `output_path` as arguments? — Frank Yellin, May 11 '22 at 18:13
Instead of guessing, I suggest you try it both ways and compare results. Alternatively, you could profile your code and see where it's spending most of its time (to determine whether it's compute or I/O bound). See [How do I profile a Python script?](https://stackoverflow.com/questions/582336/how-do-i-profile-a-python-script) — martineau, May 11 '22 at 18:40
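A minimal sketch of the profiling suggestion, using the standard-library `cProfile` and `pstats` modules. The function names `decompress` and `profile_decompress` are hypothetical. If most of the cumulative time lands in the lzma read calls rather than in file writes, that would point toward the decoder rather than the disk.

```python
import cProfile
import io
import lzma
import pstats
import shutil


def decompress(compressed_path, output_path):
    """Decompress one .xz file to output_path."""
    with lzma.open(compressed_path, "rb") as in_f, open(output_path, "wb") as out_f:
        shutil.copyfileobj(in_f, out_f)


def profile_decompress(compressed_path, output_path, top=10):
    """Profile one decompression and return the report text, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(decompress, compressed_path, output_path)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(top)
    return report.getvalue()
```

Printing the returned string shows the `top` most expensive calls. Keep in mind that `cProfile` only sees the calling thread, so it is best run on a single file outside any executor.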
Have you tried adding a `length` parameter of say 1024*1024 bytes to your `shutil.copyfileobj()` call? — Mark Setchell, May 11 '22 at 19:48
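In the standard library the buffer-size argument named `length` belongs to `shutil.copyfileobj`, so the suggestion could be tried along these lines. This is a sketch; `CHUNK_SIZE` and the `decompress` wrapper are hypothetical names, and whether a larger buffer actually helps would need to be measured on the real files.

```python
import lzma
import shutil

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the value suggested in the comment


def decompress(compressed_path, output_path, chunk_size=CHUNK_SIZE):
    """Decompress one .xz file, copying in chunks of `chunk_size` bytes."""
    with lzma.open(compressed_path, "rb") as in_f, open(output_path, "wb") as out_f:
        # The third positional argument of copyfileobj is `length`, the read size.
        shutil.copyfileobj(in_f, out_f, chunk_size)
```

Larger chunks mean fewer Python-level read and write calls per file, which may reduce per-call overhead.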
@FrankYellin yes, that's what I'm doing with multithreading, and the same with multiprocessing as well — Ziqi Liu, May 12 '22 at 17:44
