
I'm trying to use parallel h5py to create an independent group for each process and fill each group with some data. What happens is that only one group gets created and filled with data. This is the program:

from mpi4py import MPI
import h5py

rank = MPI.COMM_WORLD.Get_rank()
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)

data = range(1000)

# Each process tries to create only its own dataset, named after its rank
dset = f.create_dataset(str(rank), data=data)

f.close()

Any thoughts on what is going wrong here?

Thanks a lot!

Shazly
  • Is the `rank` different for each process? From a `mp` tutorial it looks like `rank` distinguishes between sender and receiver processes, or something like that. What's the name of the `dataset` in the file? – hpaulj Jul 05 '18 at 16:48
  • What do you mean by `group`? If you mean an HDF5 Group, they must be created and allocated by a single task, and then you can populate them with data in parallel. – Gilles Gouaillardet Jul 05 '18 at 23:57
  • @hpaulj Yes, `rank` is an identifier for each process. For example, `rank 0` will create HDF5 group `0` and put its data in it, and the same goes for the rest of the processes. – Shazly Jul 06 '18 at 09:37
  • All operations on the file structure must be performed collectively. You must create the datasets everywhere. Then, you can fill them separately. – Pierre de Buyl Jul 11 '18 at 13:25
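The rule in the last comment can be illustrated without MPI at all: structural (metadata) operations are collective, so every rank must issue the same `create_dataset` calls in the same order. Here is a minimal pure-Python sketch of that idea, where `make_dataset_names` is a hypothetical helper (not part of h5py) and the ranks are simulated as a loop:

```python
# Sketch: every rank must plan the SAME set of dataset names.
# In real parallel h5py, each rank then calls f.create_dataset(name)
# for every name in the list, not just for its own rank.

def make_dataset_names(size):
    # Hypothetical helper: the full list of datasets, one per rank.
    return [str(r) for r in range(size)]

size = 4  # pretend communicator size
plans = [make_dataset_names(size) for rank in range(size)]

# Collective requirement: all ranks produce identical creation plans.
assert all(plan == plans[0] for plan in plans)
print(plans[0])  # ['0', '1', '2', '3']
```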

1 Answer


OK, so as mentioned in the comments, I had to create the datasets on every process and then fill them. The following code writes data in parallel, creating as many datasets as the size of the communicator:

from mpi4py import MPI
import h5py
import random

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

data = [random.randint(1, 100) for x in range(4)]

f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=comm)

# Dataset creation is a collective (file-structure) operation:
# every rank must create every dataset, not just its own.
dset = []
for i in range(size):
    dset.append(f.create_dataset('test{0}'.format(i), (len(data),), dtype='i'))

# Writing the data is independent: each rank fills only its own dataset.
dset[rank][:] = data
f.close()
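The two-phase pattern in this answer — create all datasets collectively, then write only your own — can be sketched without MPI by simulating the ranks as a loop over one shared dict of buffers (`size` and the `test{i}` names mirror the answer; the dict is a stand-in for the HDF5 file, not real h5py):

```python
import random

size = 4      # simulated communicator size
data_len = 4  # matches the answer's 4-element data list

# Phase 1 (collective): every rank creates ALL datasets,
# simulated here as one shared dict of zeroed buffers.
dsets = {'test{0}'.format(i): [0] * data_len for i in range(size)}

# Phase 2 (independent): each rank fills only its own dataset.
for rank in range(size):
    data = [random.randint(1, 100) for _ in range(data_len)]
    dsets['test{0}'.format(rank)][:] = data

assert len(dsets) == size
assert all(len(v) == data_len for v in dsets.values())
```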