I'm fixing a Python script that uses h5py. It contains code like this:

hdf = h5py.File(hdf5_filename, 'a')
... 
g = hdf.create_group('foo')
g.create_dataset('bar', ...whatever...)

Sometimes this runs on a file that already has a group named 'foo', in which case I see "ValueError: Unable to create group (Name already exists)".

One way to fix this is to replace the single create_group line with four lines, like this:

if 'foo' in hdf.keys():
    g = hdf['foo']
else:
    g = hdf.create_group('foo')

g.create_dataset(...etc...)

Is there a neater way to do this, maybe in only one line? I'm thinking of how, with files in the standard C library, 'a' mode will either append to an existing file or create the file if it's not already there.

Same goes for datasets - I have

create_dataset('bar', ...) 

but should check first:

if 'bar' in g.keys():
    d = g['bar']
else:
    d = g.create_dataset('bar', ...)

My wish: to find h5py has methods named create_or_use_group() and create_or_use_dataset(). What actually exists?

DarenW
  • Possible duplicate http://stackoverflow.com/questions/11753418/check-if-node-exists-in-h5py – ctrl-alt-delete Dec 02 '15 at 22:24
  • No, that question just asks about testing if a node exists. I want to have it created, or use it, without writing out an 'if' statement, ideally in one line. – DarenW Dec 03 '15 at 22:19

1 Answer

Yes: require_group and require_dataset:

import h5py
import numpy as np

# Create a file containing a single dataset 'a'.
with h5py.File("/tmp/so_hdf5/test2.h5", 'w') as f:
    a = f.create_dataset('a', data=np.random.random((10, 10)))

# Reopen it: 'a' already exists and is returned; 'd' and 'foo' are created.
with h5py.File("/tmp/so_hdf5/test2.h5", 'r+') as f:
    a = f.require_dataset('a', shape=(10, 10), dtype='float64')
    d = f.require_dataset('d', shape=(20, 20), dtype='float64')
    g = f.require_group('foo')
    print(a)
    print(d)
    print(g)

Note that require_dataset needs the shape and dtype of the dataset: if a dataset with that name already exists but its shape or dtype is incompatible with what you pass, require_dataset raises a TypeError. In that case, you could do something like:

try:
    a = f.require_dataset('a', shape=(10, 10), dtype='float64')
except TypeError:
    a = f['a']

If you don't already know the shape and dtype, I don't think there's much advantage to require_dataset over using try ... except ...
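
For the case where the shape and dtype are not known in advance, the membership check from the question can be wrapped in a small helper. This is only a sketch: get_or_create_dataset is a hypothetical name, not part of h5py, and it assumes the object passed in behaves like an h5py Group (it supports `name in group`, indexing by name, and create_dataset).

```python
def get_or_create_dataset(group, name, **create_kwargs):
    """Return group[name] if it already exists, otherwise create it.

    Hypothetical helper, not an h5py API. `group` is assumed to be an
    h5py File or Group (or anything with the same interface), and
    `create_kwargs` are passed to create_dataset only when the dataset
    has to be created.
    """
    if name in group:
        # Existing object: return it as-is, without checking shape or dtype.
        return group[name]
    return group.create_dataset(name, **create_kwargs)
```

With an open file f, a call such as d = get_or_create_dataset(f, 'bar', shape=(20, 20), dtype='float64') would return an existing 'bar' untouched and create it otherwise. Unlike require_dataset, this does not verify that an existing object has the expected shape or dtype, so that check is left to the caller.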

Yossarian