I am downloading and unzipping many large files in parallel using threading, but as I understand it, the GIL limits how much of my CPU I can actually use.
When I learned about Linux in school, I remember a lab in which we spawned many processes with foo.py &
on the command line, and those processes used up all of our CPU power.
Currently, I am working on Windows, and I wonder whether I can use the subprocess
module to spawn multiple Python processes, each with its own GIL. I would split my list of download links into, say, four roughly equal sub-lists and pass one sub-list to each of four subprocesses. Each subprocess would then use threading internally to speed up its downloads further. I'd do the same for the unzipping, which takes even longer than the downloading.
Am I conceptualizing subprocesses correctly, and is it possible that this approach would work for my downloading and unzipping purposes?
I've searched around SO and other web resources, but I've not found much addressing such a hacky combination of multiprocessing and multithreading. There was this question, which said that the main program doesn't communicate with subprocesses once the latter are spawned, but for my purposes, I would only need each subprocess to send a "finished" flag back to the main program.
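For what it's worth, I suspect the "finished" flag could simply be each child's exit status, which Popen.wait() already gives me. A minimal sketch (using a trivial -c command in place of my hypothetical worker script):

```python
import subprocess
import sys

# Stand-in for my real workers; each would be
# [sys.executable, "worker.py", <links...>] in practice
procs = [subprocess.Popen([sys.executable, "-c", "print('worker done')"])
         for _ in range(4)]

for p in procs:
    p.wait()  # blocks until this child exits
    # Exit code 0 serves as the "finished" flag
    assert p.returncode == 0
```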
Thank you!