Why does the mpiexec command on a python file writing in parallel using h5py use more than the specified number of cores?

Question

I have the file below, called demo2.py. It is adapted from the question "parallel write to different groups with h5py" and from http://docs.h5py.org/en/stable/mpi.html.

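from mpi4py import MPI
import h5py

# Every MPI process runs this whole script; rank identifies the process.
rank = MPI.COMM_WORLD.rank  # index of this process, 0 .. size-1
size = MPI.COMM_WORLD.size  # total number of processes started by mpiexec

# Opening with the 'mpio' driver is collective: all processes open the same file together.
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)

# Creating datasets changes the file structure, so every process creates all of them.
dsets = []
for i in range(size):
    dsets.append(f.create_dataset('test{0}'.format(i), (1,), dtype='i'))

# Each process writes only to its own dataset.
dsets[rank][:] = rank

f.close()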

I run it from the command line with `mpiexec -n 2 python3 demo2.py`. When I check core usage with htop, I can clearly see that more than 2 cores are in use. I have three questions:

1. Why and how is this happening?
2. How can I strictly restrict the number of cores being used?
3. How is one part of the code executed only once while the rest is distributed? Or have I misunderstood this? Could you please explain the flow of the program, along with how resources are shared among the processes?

Thanks a lot, any help is greatly appreciated!

An MPI task can have several threads, and hence can (try to) use more cores than there are MPI tasks. You can generally bind each MPI task to a single core via the command line (if a task then starts other threads, they will end up time-sharing that core). — Gilles Gouaillardet, Jul 28 '20 at 10:38
Thanks for the comment @GillesGouaillardet. Can you please tell me which part of my code is creating these threads? In this example, does it mean that there is no advantage in using -n 2 over -n 1? `mpiexec -n 2 python3 demo2.py --bind-to hwthread` seems to do what I would like, but what exactly does it mean? — PSK, Jul 28 '20 at 12:35
If you want to restrict an MPI task to a single core, use `mpirun --bind-to core ...`. Note that recent Open MPI binds to core by default when running 2 MPI tasks. Open MPI spawns at least 2 helper threads per MPI task, but they should not consume many resources, except perhaps at startup and finalize. So, performance-wise, 2 tasks (one core each) have more potential than 1 task. — Gilles Gouaillardet, Jul 28 '20 at 14:51
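
A minimal sketch (not from the thread) of one way to check whether binding actually took effect: each process prints the set of CPUs the operating system allows it to run on. It assumes Linux, since `os.sched_getaffinity` is not available on every platform. Reading the rank from the `PMI_RANK` or `OMPI_COMM_WORLD_RANK` environment variables is also an assumption about what the launcher exports, and `allowed_cpus` and `launcher_rank` are hypothetical helper names.

```python
import os


def allowed_cpus():
    """Return the set of CPU ids this process may run on, or None if unknown."""
    # os.sched_getaffinity is only provided on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return None


def launcher_rank():
    """Best-effort rank lookup from launcher environment variables (assumed names)."""
    for name in ("PMI_RANK", "OMPI_COMM_WORLD_RANK"):
        value = os.environ.get(name)
        if value is not None:
            return int(value)
    return 0  # not started by a launcher this sketch recognises


if __name__ == "__main__":
    cpus = allowed_cpus()
    shown = sorted(cpus) if cpus else "unknown"
    print("rank {0} (pid {1}) may run on CPUs: {2}".format(
        launcher_rank(), os.getpid(), shown))
```

If each rank reports a single CPU id, the tasks are bound and any extra threads time-share that core. If each rank reports every CPU, nothing is bound. Launcher options have to appear before `python3` on the command line, because anything placed after the script name is passed to the script rather than to the launcher. The exact spelling of the binding option depends on the MPI implementation, so check the launcher's help output.
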
I am using MPICH, not Open MPI. This answers two of my questions, but I still don't clearly understand the flow of the program. Specifically, is the whole script run `-n` times? If yes, how is the same file `f` created only once and shared across all the runs of this program? Could you please write a little about the flow and resource sharing in this particular example, including the spawning of helper threads, and post it together with your previous two comments as an answer so I can accept it? Some resources to understand this better would also be very helpful. Thanks! — PSK, Jul 29 '20 at 01:45
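
On the flow question, a rough sketch (not from the thread) of the usual SPMD picture may help: `mpiexec -n 2` starts two independent Python processes, and each one runs the whole script from top to bottom. Nothing is executed "only once"; the processes differ only in the value of `rank`. In demo2.py, opening the file and creating the datasets are collective steps that every process performs with the same arguments, and only the final write depends on `rank`. The helpers `dataset_names` and `owned_dataset` are hypothetical names used for illustration, and the `ImportError` fallback exists only so the sketch still runs as a single rank where mpi4py is not installed.

```python
try:
    from mpi4py import MPI  # same import as demo2.py
    rank = MPI.COMM_WORLD.rank
    size = MPI.COMM_WORLD.size
except ImportError:
    # Assumption for illustration: without mpi4py, behave like a single rank.
    rank, size = 0, 1


def dataset_names(size):
    """Names created by every rank: the loop runs over size, not rank."""
    return ["test{0}".format(i) for i in range(size)]


def owned_dataset(rank, size):
    """The single dataset a given rank writes to (mirrors dsets[rank][:] = rank)."""
    return dataset_names(size)[rank]


if __name__ == "__main__":
    # Every process prints its own line, showing the whole script ran once per rank.
    print("rank {0} of {1}: creates {2}, writes only {3}".format(
        rank, size, dataset_names(size), owned_dataset(rank, size)))
```

Run with `mpiexec -n 2`, this should print one line per process: both list the same dataset names, but each names a different dataset as the one it writes to.
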
