20

I use the Python package h5py (version 2.5.0) to access my hdf5 files.

I want to traverse the content of a file and do something with every dataset.

Using the visit method:

import h5py

def print_it(name):
    dset = f[name]
    print(dset)
    print(type(dset))


with h5py.File('test.hdf5', 'r') as f:
    f.visit(print_it)

For a test file I obtain:

<HDF5 group "/x" (1 members)>
<class 'h5py._hl.group.Group'>
<HDF5 dataset "y": shape (100, 100, 100), type "<f8">
<class 'h5py._hl.dataset.Dataset'>

This tells me that the file contains a group and a dataset. However, there is no obvious way to differentiate between datasets and groups other than calling type(). Unfortunately, the h5py documentation does not say anything about this topic. It always assumes that you know beforehand which objects are groups and which are datasets, for example because you created them yourself.

I would like to have something like:

f = h5py.File(..)
for key in f.keys():
    x = f[key]
    print(x.is_group(), x.is_dataset()) # does not exist

How can I differentiate between groups and datasets when reading an unknown hdf5 file in Python with h5py? How can I get a list of all datasets, of all groups, of all links?

NoDataDumpNoContribution

5 Answers

17

Unfortunately, there is no built-in way in the h5py API to check this, but you can simply check the type of the item with is_dataset = isinstance(item, h5py.Dataset).

To list all the contents of the file (except the file's attributes), you can use Group.visititems with a callable that takes the name and the instance of an item.
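A minimal sketch of the visititems approach, assuming the file or group is already open. The helper name `collect_by_type` is made up for illustration; only `visititems` and the `isinstance` checks come from h5py.

```python
import h5py


def collect_by_type(root):
    """Return (group_names, dataset_names) for everything below `root`.

    `root` is an open h5py.File or h5py.Group. The function name is
    hypothetical and not part of h5py.
    """
    groups, datasets = [], []

    def visitor(name, obj):
        # visititems passes the path relative to `root` and the object itself
        if isinstance(obj, h5py.Dataset):
            datasets.append(name)
        elif isinstance(obj, h5py.Group):
            groups.append(name)
        # returning None tells h5py to continue with the next object

    root.visititems(visitor)
    return groups, datasets
```

It could be called as `groups, datasets = collect_by_type(f)` inside a `with h5py.File('test.hdf5', 'r') as f:` block. The returned names are relative to `root`, so they can be passed back to `root[name]`.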

Gall
11

While the answers by Gall and James Smith point to the general solution, traversing the hierarchical HDF5 structure and filtering out all datasets still had to be done. I did it with yield from, which is available in Python 3.3+ and works quite nicely, and present it here.

import h5py

def h5py_dataset_iterator(g, prefix=''):
    for key, item in g.items():
        path = '{}/{}'.format(prefix, key)
        if isinstance(item, h5py.Dataset): # test for dataset
            yield (path, item)
        elif isinstance(item, h5py.Group): # test for group (go down)
            yield from h5py_dataset_iterator(item, path)

with h5py.File('test.hdf5', 'r') as f:
    for (path, dset) in h5py_dataset_iterator(f):
        print(path, dset)
NoDataDumpNoContribution
3

For example, if you want to print the structure of an HDF5 file, you can use the following code:

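import h5py

def h5printR(item, leading=''):
    # Recursively print the members of a file or group, indented by depth
    for key in item:
        if isinstance(item[key], h5py.Dataset):
            # Datasets are leaves: print the name and shape
            print(leading + key + ': ' + str(item[key].shape))
        else:
            # Anything else is treated as a group: print the name and descend
            print(leading + key)
            h5printR(item[key], leading + '  ')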

# Print the structure of an `.h5` file
def h5print(filename):
    with h5py.File(filename, 'r') as h:
        print(filename)
        h5printR(h, '  ')

Example

>>> h5print('/path/to/file.h5')

/path/to/file.h5
  test
    repeats
      cell01: (2, 300)
      cell02: (2, 300)
      cell03: (2, 300)
      cell04: (2, 300)
      cell05: (2, 300)
    response
      firing_rate_10ms: (28, 30011)
    stimulus: (300, 50, 50)
    time: (300,)
Yas
1

Because h5py groups behave like Python dictionaries, you can use the values() method to access the items themselves. So you may be able to filter them with a list comprehension:

datasets = [item for item in f["Data"].values() if isinstance(item, h5py.Dataset)]

Doing this recursively should be simple enough.
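One way the recursive version might look, as a sketch. The function name `list_datasets` is hypothetical, and the code assumes the hierarchy contains no cycles (a group hard-linked into one of its own descendants would make it recurse forever).

```python
import h5py


def list_datasets(group):
    """Recursively collect every h5py.Dataset below `group`.

    `group` is an open h5py.File or h5py.Group. Assumes the group
    hierarchy has no cycles.
    """
    found = []
    for item in group.values():
        if isinstance(item, h5py.Dataset):
            found.append(item)
        elif isinstance(item, h5py.Group):
            # descend into subgroups and merge their datasets
            found.extend(list_datasets(item))
    return found
```

For the example above, `list_datasets(f["Data"])` would return the datasets at every depth below "Data" rather than only its direct children.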

1

I prefer this solution. It collects the names of all objects in the HDF5 file "h5file", then sorts them by class. This is similar to what has been mentioned before, but more succinct:

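import h5py

all_h5_objs = []  # collects the names of all objects in the file
with h5py.File(h5file, 'r') as fh5:
    fh5.visit(all_h5_objs.append)  # append returns None, so visit keeps going
    all_groups   = [ obj for obj in all_h5_objs if isinstance(fh5[obj],h5py.Group) ]
    all_datasets = [ obj for obj in all_h5_objs if isinstance(fh5[obj],h5py.Dataset) ]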
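The question also asks about links, which an isinstance check on the opened object cannot reveal. A possible sketch, assuming `Group.get` with `getlink=True` is available in your h5py version; the helper name `classify_links` and the string labels are made up for illustration, and only the direct members of one group are inspected.

```python
import h5py


def classify_links(group):
    """Map each direct member name of `group` to the kind of link it is.

    The function name and the labels are hypothetical. get(...,
    getlink=True) returns a link object instead of opening the target.
    """
    kinds = {}
    for name in group:
        link = group.get(name, getlink=True)
        if isinstance(link, h5py.SoftLink):
            kinds[name] = 'soft'
        elif isinstance(link, h5py.ExternalLink):
            kinds[name] = 'external'
        elif isinstance(link, h5py.HardLink):
            kinds[name] = 'hard'
        else:
            kinds[name] = 'unknown'
    return kinds
```

Groups and datasets created in the usual way show up as hard links, so this complements the group and dataset lists rather than replacing them.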
Scott N