
I have a function in Python on Google Colab that I want to run 100 times. This function uses the GPU (specifically PyTorch) on Google Colab.

What is the fastest way I can run this function 100 times? The order does not matter.

My motivation is that I am trying to run a bootstrapping experiment. So I am trying to run the same function with different random noise 100 times. This function takes a while to run each time, which is why I am trying to parallelize the computation or send it to background workers.

I have tried to use `multiprocessing.Pool`, but it seems to be tied to CPU cores, not the GPU.

I have looked at the answers to a related question (How to launch 100 workers in multiprocessing?), but those approaches apply to the CPU.

Any suggestions would be greatly appreciated.

Thank you, in advance!

  • You'll have much better luck here if you create a minimal reproducible example of the code you've tried and show how you've determined it's not parallelized the way you want. – Max Power Jan 07 '22 at 21:32
  • I would broadcast your data into an N x 100 array of noise and use NumPy to vectorize the operation before getting to GPU stuff. – Paul H Jan 07 '22 at 21:40
  • Kernel functions running on a GPU are already running in parallel, so you cannot speed up the execution by running multiple kernels at the same time unless each kernel only uses a part of the GPU (which is rare), or you have multiple GPUs (and the kernel only uses one of them). Still, CUDA/OpenCL function calls are serialized anyway in most cases (especially when issued from Python code that is likely calling many GPU runtime functions). – Jérôme Richard Jan 07 '22 at 22:47
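The broadcasting suggestion in the comments can be sketched directly in PyTorch: instead of calling the function 100 times, draw all the noise as one batched tensor with a leading replicate dimension and run a single batched call, letting the GPU parallelize across replicates. This is a minimal sketch, not the asker's actual code; `run_once`, `n_boot`, and `n` are illustrative names, and the real function would need to be written so its operations broadcast over the batch dimension.

```python
import torch

def run_once(x):
    # Stand-in for the expensive per-replicate computation; any chain of
    # elementwise ops and reductions over the last dim broadcasts over dim 0.
    return (x ** 2).sum(dim=-1)

n_boot = 100   # number of bootstrap replicates
n = 1024       # size of each noise sample
device = "cuda" if torch.cuda.is_available() else "cpu"

# One (n_boot, n) tensor of independent noise instead of 100 separate tensors.
noise = torch.randn(n_boot, n, device=device)

# A single batched call replaces the 100-iteration loop.
results = run_once(noise)   # shape: (n_boot,)
print(results.shape)        # torch.Size([100])
```

If the function cannot be batched this way (e.g. it involves control flow per replicate), the per-call overhead usually dominates, and as the comment above notes, multiple concurrent kernels rarely help unless each call leaves most of the GPU idle.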

0 Answers