There are modules suited for multiprocessing on clusters, listed here. But I have a script that already uses the multiprocessing
module. This answer states that using this module on a cluster will only let it create processes within a single node. But what does that behavior look like?
Let's say I have a script called multi.py
which looks something like this:
import multiprocessing as mp

output = mp.Queue()

def square(num, output):
    """Example function: square num."""
    res = num**2
    output.put(res)

processes = [mp.Process(target=square, args=(x, output)) for x in range(100000)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]
print(results)
And I would submit this script to a cluster (for example, Sun Grid Engine):
#!/bin/bash
# this script is called run.sh
python multi.py
and submit it with qsub:
qsub -q short -lnodes=1:ppn=4 run.sh
What would happen? Will Python only spawn processes within the boundary specified in the qsub
command (i.e. on just 4 CPUs)? Or will it try to use every CPU on the node?
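For context, here is a sketch of how I could explicitly tie the worker count to the requested slots instead of spawning one process per task. This assumes the scheduler exports a slot-count environment variable for the job (Sun Grid Engine sets NSLOTS; the fallback of 4 mirrors the ppn=4 request above):

```python
import multiprocessing as mp
import os

def square(num):
    """Square a single number."""
    return num ** 2

if __name__ == "__main__":
    # NSLOTS is the slot count Sun Grid Engine exports for a job;
    # fall back to 4 (matching the ppn=4 request) if it is not set.
    n_workers = int(os.environ.get("NSLOTS", 4))

    # A pool of n_workers processes works through all 100000 tasks,
    # instead of starting 100000 separate processes at once.
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(square, range(100000))

    print(results[:5])  # [0, 1, 4, 9, 16]
```

Even with the pool capped this way, all workers still run on the single node the job lands on; the multiprocessing module cannot cross node boundaries.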