
I'm trying to use parallel h5py to create an independent group for each process and fill each group with some data. What happens is that only one group gets created and filled with data. This is the program:

from mpi4py import MPI
import h5py

rank = MPI.COMM_WORLD.Get_rank()
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)

data = range(1000)

# Each process tries to create only its own dataset, named after its rank
dset = f.create_dataset(str(rank), data=data)

f.close()

Any thoughts on what is going wrong here?

Thanks a lot!

Shazly
  • Is the `rank` different for each process? From a `mp` tutorial it looks like `rank` distinguishes between sender and receiver processes, or something like that. What's the name of the `dataset` in the file? – hpaulj Jul 05 '18 at 16:48
  • What do you mean by `group`? If you mean an HDF5 Group, they must be created and allocated by a single task, and then you can populate them with data in parallel. – Gilles Gouaillardet Jul 05 '18 at 23:57
  • @hpaulj Yes, `rank` is an identifier for each process. For example, `rank 0` will create HDF5 group `0` and put its data in it, and the same goes for the rest of the processes. – Shazly Jul 06 '18 at 09:37
  • All operations on the file structure must be performed collectively. You must create the datasets everywhere. Then, you can fill them separately. – Pierre de Buyl Jul 11 '18 at 13:25
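The rule in the last comment can be illustrated without MPI at all: structural (metadata) operations are collective, so every rank must issue the same `create_dataset` calls in the same order. Here is a minimal pure-Python sketch of that idea, where `make_dataset_names` is a hypothetical helper (not part of h5py) and the ranks are simulated as a loop:

```python
# Sketch: every rank must plan the SAME set of dataset names.
# In real parallel h5py, each rank then calls f.create_dataset(name)
# for every name in the list, not just for its own rank.

def make_dataset_names(size):
    # Hypothetical helper: the full list of datasets, one per rank.
    return [str(r) for r in range(size)]

size = 4  # pretend communicator size
plans = [make_dataset_names(size) for rank in range(size)]

# Collective requirement: all ranks produce identical creation plans.
assert all(plan == plans[0] for plan in plans)
print(plans[0])  # ['0', '1', '2', '3']
```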

1 Answer


OK, so as mentioned in the comments, I had to create the datasets on every process and then fill them. The following code writes data in parallel, creating as many datasets as the size of the communicator:

from mpi4py import MPI
import h5py
import random

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

data = [random.randint(1, 100) for x in range(4)]

f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=comm)

# Dataset creation is a collective (file-structure) operation:
# every rank must create every dataset, not just its own.
dset = []
for i in range(size):
    dset.append(f.create_dataset('test{0}'.format(i), (len(data),), dtype='i'))

# Writing the data is independent: each rank fills only its own dataset.
dset[rank][:] = data
f.close()
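The two-phase pattern in this answer — create all datasets collectively, then write only your own — can be sketched without MPI by simulating the ranks as a loop over one shared dict of buffers (`size` and the `test{i}` names mirror the answer; the dict is a stand-in for the HDF5 file, not real h5py):

```python
import random

size = 4      # simulated communicator size
data_len = 4  # matches the answer's 4-element data list

# Phase 1 (collective): every rank creates ALL datasets,
# simulated here as one shared dict of zeroed buffers.
dsets = {'test{0}'.format(i): [0] * data_len for i in range(size)}

# Phase 2 (independent): each rank fills only its own dataset.
for rank in range(size):
    data = [random.randint(1, 100) for _ in range(data_len)]
    dsets['test{0}'.format(rank)][:] = data

assert len(dsets) == size
assert all(len(v) == data_len for v in dsets.values())
```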