If you know you will always have this data schema, you can work with the keys (as shown in the previous answer). That implies only Groups at the root level, and Datasets are the only objects under each Group. The "visitor" functions are very handy when you don't know the exact contents of the file.
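For a fixed schema like that, key-based access is straightforward. A minimal sketch of the idea (the file and object names here are hypothetical stand-ins, not from the previous answer):

```python
import h5py
import numpy as np

# Build a tiny file matching the assumed schema:
# Groups at the root, Datasets under each Group.
with h5py.File('keys_demo.h5', 'w') as f:
    for g in ('group_0', 'group_1'):
        f.create_group(g).create_dataset('data', data=np.zeros((2, 2)))

# With a known schema, nested .keys() loops reach every Dataset
with h5py.File('keys_demo.h5', 'r') as f:
    for gname in f.keys():
        for dname in f[gname].keys():
            print(f[gname][dname].name)  # prints the full pathname, e.g. /group_0/data
```

This breaks as soon as the layout changes (say, a Dataset at the root, or nested Groups), which is exactly when the visitor functions earn their keep.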
There are 2 visitor functions: visit() and visititems(). Each recursively traverses the object tree, calling the visitor function for each object. The only difference is the callable's signature: the callable for visit() receives 1 value, name, while the callable for visititems() receives 2 values: name and node (an h5py object). The name is just that, the object's name, NOT its full pathname. I prefer visititems() for 2 reasons: 1) having the node object lets you test the object's type (as shown below), and 2) determining the full pathname otherwise requires that you already know the path or retrieve it from the object's name attribute.
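To see what visit() gives you, here is a quick sketch (the file and object names are made up for illustration). Since the callable receives only name, a bound method like list.append works as the visitor; it returns None, so traversal continues:

```python
import h5py
import numpy as np

# Create a small file: one group holding one dataset (hypothetical names)
with h5py.File('visit_demo.h5', 'w') as f:
    f.create_group('g1').create_dataset('d1', data=np.arange(5))

names = []
with h5py.File('visit_demo.h5', 'r') as f:
    # visit() passes only the name, relative to the object it is called on
    f.visit(names.append)
print(names)  # ['g1', 'g1/d1']
```

Note the names have no leading '/': they are relative, not full pathnames.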
The example below creates a simple HDF5 file with a few groups and datasets, then closes it. It then reopens the file in read mode and uses visititems() to traverse the object tree. (Note: the visitor function can have any name and can be called on any object; traversal recurses from that point in the file structure.) Also, you don't need f.close() when you use the with / as: construct.
import h5py
import numpy as np

def visit_func(name, node):
    print('Full object pathname is:', node.name)
    if isinstance(node, h5py.Group):
        print('Object:', name, 'is a Group\n')
    elif isinstance(node, h5py.Dataset):
        print('Object:', name, 'is a Dataset\n')
    else:
        print('Object:', name, 'is an unknown type\n')

arr = np.arange(100).reshape(10, 10)

with h5py.File('SO_63315196.h5', 'w') as h5w:
    for cnt in range(3):
        grp = h5w.create_group('group_' + str(cnt))
        grp.create_dataset('data_' + str(cnt), data=arr)

with h5py.File('SO_63315196.h5', 'r') as h5r:
    h5r.visititems(visit_func)
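To illustrate the point about traversing from any object: calling visititems() on a group yields names relative to that group, while node.name is still the full pathname. This sketch recreates the same file so it runs on its own (the lambda visitor here is just for brevity):

```python
import h5py
import numpy as np

# Recreate the file from the example above so this snippet is standalone
arr = np.arange(100).reshape(10, 10)
with h5py.File('SO_63315196.h5', 'w') as h5w:
    for cnt in range(3):
        grp = h5w.create_group('group_' + str(cnt))
        grp.create_dataset('data_' + str(cnt), data=arr)

pairs = []
with h5py.File('SO_63315196.h5', 'r') as h5r:
    # Visiting from group_0: 'name' is relative to group_0,
    # while node.name is the full pathname from the root
    h5r['group_0'].visititems(lambda name, node: pairs.append((name, node.name)))
print(pairs)  # [('data_0', '/group_0/data_0')]
```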