I have Python code (a minimization algorithm) that calls a function 'get_drift_time'. This function creates a large number of particles (~100,000), drifts each of them using a function 'drift_particles', and saves the resulting parameters for each particle into a list. I then feed this list back into my main minimization code, where I use it further. This iteration (calling 'get_drift_time' and building the list from 'drift_particles') happens many times, O(3).
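To make the structure concrete, get_drift_time schematically looks like this (a simplified serial sketch; the real drift_particles is more involved, but it takes a single (start_point, cube_field, length) tuple, exactly as in the parallel version below):

import numpy as np

def get_drift_time(start_points, cube_field, length):
    # drift every particle one after the other and collect its parameters
    results = [drift_particles((p, cube_field, length)) for p in start_points]
    result = np.asarray(results)
    # column 9 holds a per-particle flag; keep the particles where it is set
    return result[:, 9] == 1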
The drifting of each particle is independent of all the other particles. I found a way to parallelize this using the Pool class:
import itertools
from multiprocessing import Pool

import numpy as np

# previous code is here
pool = Pool(processes=20)
# start_points gives the initial position of each particle; cube_field and
# length define the drift environment and are the same for every particle
result_async = pool.map_async(
    drift_particles,
    itertools.izip(start_points,
                   itertools.repeat(cube_field),
                   itertools.repeat(length)))
result = np.asarray(result_async.get())
pool.close()
pool.join()
# column 9 holds a per-particle flag; keep the particles where it is set
tet_cut = result[:, 9] == 1
return tet_cut
The overall speed of my code is limited by how quickly I can get all the particles drifted. With pool.map_async, that in turn is limited by the number of cores on a single node. I am running this code on one node with 20 cores, but I have access to many more nodes and would like to take advantage of them. I tried running with 40 cores spread across different nodes, but that made the code significantly slower.
How can I parallelize this process so that it takes advantage of multiple nodes (each with several cores)?
Edit:
I looked into using jug, but I can't figure out how to apply it to this example. I can modify my code to include the @TaskGenerator decorator, but I'm not sure whether it's possible to run jug from within the code. For example, on the jug website I found the following command for running jug on LSF:
bsub -o output.txt -J "jug[1-100]" jug execute myscript.py
However, I don't want to start 100 jobs of myscript.py. Rather, from within myscript.py I want to periodically call jug to run multiple instances of map(drift_particles, itertools.izip(start_points, itertools.repeat(cube_field), itertools.repeat(length))). Is that possible?
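Schematically, what I imagine is something like the following (a rough sketch, assuming I understand jug's @TaskGenerator, barrier, and value correctly; drift_one is a hypothetical wrapper around my existing drift_particles):

import numpy as np
from jug import TaskGenerator, barrier, value

@TaskGenerator
def drift_one(start_point, cube_field, length):
    # wrap the existing per-particle drift so jug can schedule it as a task
    return drift_particles((start_point, cube_field, length))

# inside the minimization loop:
tasks = [drift_one(p, cube_field, length) for p in start_points]
barrier()                        # wait until every jug worker has finished
result = np.asarray(value(tasks))
tet_cut = result[:, 9] == 1

One task per particle (~100,000 tasks) would probably be far too fine-grained for jug's bookkeeping, so I would presumably have to batch start_points into chunks, but first I'd like to know whether this pattern is possible at all.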