I have an HPC cluster with SLURM installed. I can properly allocate nodes and cores for myself, and I would like to be able to use all the allocated cores regardless of which node they are in. As I read in this thread Using the multiprocessing module for cluster computing, this cannot be achieved with multiprocessing.
My script looks like this (oversimplified version):
import multiprocessing

def func(input_data):
    # lots of computing
    return data

parallel_pool = multiprocessing.Pool(processes=300)
returned_data_list = []
for i in parallel_pool.imap_unordered(func, lots_of_input_data):
    returned_data_list.append(i)

# Do additional computing with the returned_data
....
This script works perfectly fine; however, as I mentioned, multiprocessing is not a good tool for me: even if SLURM allocates 3 nodes for me, multiprocessing can only use one of them. As far as I understand, this is a limitation of multiprocessing.
I could use SLURM's srun command, but that just executes the same script N times, and I need to do additional computing with the output of the parallel processes. I could of course store the outputs somewhere and read them back in (a sketch of what I mean is below), but there must be a more elegant solution.
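To show what I mean by storing the outputs and reading them back in, this is roughly the workaround I would rather avoid (untested; the file names, placeholder data and the final gathering step are just illustrative):

# Rough sketch of the file-based workaround, launched as something like:
#   srun -n 300 python worker.py
import os
import pickle

def func(input_data):
    # lots of computing
    return input_data  # placeholder result

task_id = int(os.environ['SLURM_PROCID'])    # index of this srun task
num_tasks = int(os.environ['SLURM_NTASKS'])  # total number of tasks

lots_of_input_data = list(range(1000))       # placeholder input
my_chunk = lots_of_input_data[task_id::num_tasks]  # this task's slice

results = [func(x) for x in my_chunk]
with open('results_%d.pkl' % task_id, 'wb') as f:
    pickle.dump(results, f)

# A separate step would then have to wait for all tasks to finish, read
# every results_*.pkl back in, and do the additional computing there.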
In the mentioned thread there are suggestions like jug, but reading through it I haven't found a solution that fits my case.
Maybe mpi4py could be a solution for me? The tutorials for it seem very messy, and I haven't found a specific solution for my problem in there either (run a function in parallel with MPI, and then continue the script).
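If I understand the mpi4py.futures documentation correctly, the pattern would look roughly like this untested sketch (func and lots_of_input_data are the same placeholders as in my script above), but I am not sure this is the right direction:

# Untested sketch based on the mpi4py.futures docs. The idea would be to
# launch it with something like:
#   srun -n 300 python -m mpi4py.futures my_script.py
# so that rank 0 runs the code below and the other ranks act as workers.
from mpi4py.futures import MPIPoolExecutor

def func(input_data):
    # lots of computing
    return input_data  # placeholder result

if __name__ == '__main__':
    lots_of_input_data = list(range(1000))   # placeholder input
    returned_data_list = []
    with MPIPoolExecutor() as pool:
        # unordered=True should behave roughly like imap_unordered
        for result in pool.map(func, lots_of_input_data, unordered=True):
            returned_data_list.append(result)
    # Do additional computing with returned_data_list here, on rank 0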
I tried subprocess calls, but they seem to work the same way as multiprocessing calls, so they only run on one node. I haven't found any confirmation of this, so it is only a guess from trial and error.
How can I overcome this problem? Currently I could use more than 300 cores, but one node only has 32, so if I could find a solution I would be able to run my project nearly 10 times as fast.
Thanks