There are modules suited for multiprocessing on clusters, listed here. But I have a script that already uses the multiprocessing
module. This answer states that using this module on a cluster will only let it create processes within a single node. But what does that behavior look like?
Let's say I have a script called multi.py
which looks something like this:
import multiprocessing as mp

output = mp.Queue()

def square(num, output):
    """Example function: square num."""
    res = num**2
    output.put(res)

processes = [mp.Process(target=square, args=(x, output)) for x in range(100000)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]
print(results)
And I would submit this script to a cluster (for example, Sun Grid Engine):
#!/bin/bash
# this script is called run.sh
python multi.py
and submit it with qsub:
qsub -q short -lnodes=1:ppn=4 run.sh
What would happen? Will Python only spawn processes within the boundary specified in the qsub
command (i.e. on just 4 CPUs)? Or will it try to use every CPU on the node?
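For context, here is a sketch of how I could explicitly tie the worker count to the requested slots instead of spawning one process per task. This assumes the scheduler exports a slot-count environment variable for the job (Sun Grid Engine sets NSLOTS; the fallback of 4 mirrors the ppn=4 request above):

```python
import multiprocessing as mp
import os

def square(num):
    """Square a single number."""
    return num ** 2

if __name__ == "__main__":
    # NSLOTS is the slot count Sun Grid Engine exports for a job;
    # fall back to 4 (matching the ppn=4 request) if it is not set.
    n_workers = int(os.environ.get("NSLOTS", 4))

    # A pool of n_workers processes works through all 100000 tasks,
    # instead of starting 100000 separate processes at once.
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(square, range(100000))

    print(results[:5])  # [0, 1, 4, 9, 16]
```

Even with the pool capped this way, all workers still run on the single node the job lands on; the multiprocessing module cannot cross node boundaries.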