
I am attempting to create a Python script that can drive multiple MPI simulations (F90 executables, though that detail doesn't matter). Each of these MPI simulations uses 2 processors. Let's say I want to have three of these MPI simulations running simultaneously. If I run these 3 simulations from the command line in 3 separate terminals, without Python, they each get their own 2 processors and run as though they are the only things that exist in the world.

My current implementation does not appear to be doing this: it is clear from tracking the MPI simulations that they are competing with one another for resources. Here is my current procedure:

import subprocess
import multiprocessing as mp

def execute(inputs, output):
    # Prepare the input files for this run
    do_stuff_with_inputs()
    # Launch the 2-processor MPI simulation and wait for it to finish
    subprocess.call('mpiexec -np 2 my_executable.x', shell=True)
    # Parse the simulation output and hand the results back to the parent
    results = post_process_stuff()
    output.put(results)


output = mp.Queue()
processes = []
for i in xrange(3):
    processes.append(mp.Process(target=execute, args=args))

for p in processes:
    p.start()

for p in processes:
    p.join()

results = [output.get() for p in processes]

What I would like to do is be more explicit about the procedure: somehow 'create' processor space in Python so that each executable call has its own dedicated set of processors.

Kschau
  • Look at using `multiprocessing.Pool`. – Tom Dalton Mar 15 '18 at 16:44
  • Also, the posted code doesn't define `args` and seems to just run 2 `execute` subprocesses. Also, what is your evidence that "It is clear ... that there is competition amongst the MPI simulations"? – Tom Dalton Mar 15 '18 at 16:47
  • Using mpi4py, you can spawn processes using `sub_comm = MPI.COMM_SELF.Spawn('slavef90', args=[], maxprocs=1)`. See my answer https://stackoverflow.com/questions/41699585/is-it-possible-to-send-data-from-a-fortran-program-to-python-using-mpi/41708949#41708949 You can even communicate to and from the spawned processes, but that requires altering the fortran code accordingly. – francis Mar 15 '18 at 18:07
  • @TomDalton I have detailed outputs from my_executable.x that determine how efficiently the simulation ran; these outputs show that using the method above is twice as slow as running 2 simulations at the same time, without python, in separate terminals from the command line. And yes, I did not include args; the code snippet was not intended to be able to produce anything, since one would need 'my_executable.x' as well. Also, the subprocesses are the whole point of the procedure, since I need the subprocesses to complete in order to get the results list at the end and continue to do more stuff – Kschau Mar 15 '18 at 18:12
  • How does `my_executable.x` process input and produce output? What do `do_stuff_with_inputs` and `post_process_stuff` do? Regarding the subprocesses, I was confused because you said "Lets say I want to have three of these MPI simulations running simultaneously." but your given code only seems to run 2. – Tom Dalton Mar 15 '18 at 21:30
  • If you invoke several `mpiexec` in parallel, then there is a risk several jobs end up pinned on the same resources (e.g. cores) and hence do time sharing, which is awful from a performance point of view. You'd rather revamp your MPI app so it first `MPI_Comm_split()` and then run independent computations in parallel so you only need a single `mpiexec` instance. A lesser evil would be to disable resource binding. With Open MPI, you can `mpiexec -bind-to none ...` – Gilles Gouaillardet Mar 16 '18 at 00:33
  • @francis replacing my `subprocess.call(mpiexec...)` with `sub_comm = MPI.COMM_SELF.Spawn('my_executable.x', args=[], maxprocs=2); sub_comm.Disconnect()` is absolutely utilizing more of my compute resources, and the simulations run at full speed, but hang upon exiting (see the sketch after these comments). I need to track down whether this is an mpi4py issue or a multiprocessing issue... there are too many parallelizing levels going on!! – Kschau Mar 16 '18 at 13:48
  • @TomDalton `do_stuff_with_inputs()` just creates a text file that `my_executable.x` uses as an input to the simulation, and `post_process_stuff()` parses the output of `my_executable.x` and records it. And yes, it seems my code snippet only runs two, but one of the goals is to be able to run an arbitrary number of these simulations independently (limited only by the cores available to me) – Kschau Mar 16 '18 at 13:51
  • @GillesGouaillardet while this F90 code is a research code that can't easily be modified, turning off CPU binding allows my original code snippet to run at full performance! Are there risks to this method other than overwhelming a machine? – Kschau Mar 16 '18 at 13:59
  • Strictly speaking, this is not full performance, since MPI tasks can migrate between cores/sockets. That being said, as long as you do not start more MPI tasks than cores, the Linux scheduler should handle that quite well and your performance should be close to optimal. It is up to you not to start more MPI tasks than available cores, and then the risks (if any) should be quite limited. – Gilles Gouaillardet Mar 16 '18 at 14:11
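
For reference, a minimal sketch of the Spawn-based call discussed in the comments above, assuming mpi4py is installed and my_executable.x is on the path; the note about the Fortran side calling MPI_Comm_disconnect is an assumption about what a clean shutdown would require, not something verified here:

from mpi4py import MPI

def run_simulation():
    # Spawn the executable on 2 processes, as children of this Python process
    sub_comm = MPI.COMM_SELF.Spawn('my_executable.x', args=[], maxprocs=2)
    # Disconnect from the children; for this to return cleanly the Fortran
    # code generally needs to call MPI_Comm_disconnect on the communicator
    # obtained from MPI_Comm_get_parent, otherwise the parent may hang on
    # exit (as reported in the comments)
    sub_comm.Disconnect()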

1 Answer


For my purposes, the suggestion from @GillesGouaillardet was sufficient: removing CPU binding from my subprocess call,

subprocess.call('mpiexec -bind-to none -np 2 my_executable.x', shell=True)

sped up performance to an acceptable level.
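
For completeness, a minimal sketch of the full driver with that change folded in. `do_stuff_with_inputs`, `post_process_stuff`, and `inputs` are the placeholders from the question, and `-bind-to none` is the Open MPI spelling of the option; other MPI implementations may name it differently:

import subprocess
import multiprocessing as mp

def execute(inputs, output):
    do_stuff_with_inputs()
    # '-bind-to none' keeps Open MPI from pinning each job to the same cores
    subprocess.call('mpiexec -bind-to none -np 2 my_executable.x', shell=True)
    output.put(post_process_stuff())

output = mp.Queue()
processes = [mp.Process(target=execute, args=(inputs, output)) for _ in xrange(3)]
for p in processes:
    p.start()
for p in processes:
    p.join()
results = [output.get() for _ in processes]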

Kschau