
I have results from a model simulation stored in an HDF5 file (.hdf).

I know how to open the file and peruse the data using the h5py module.

The problem is, there are so many nested keys and datasets that it's a serious pain to find all of them and determine which ones actually contain data.

This is what I am currently dealing with:

import h5py
f = h5py.File('results.hdf', 'r')  # to read the file (read-only mode)

k1 = f.keys() #shows the keys in the first level

k1
<KeysViewHDF5 ['Event Conditions', 'Geometry', 'Plan Data', 'Results']>

Now, to see all the data that is stored, I can do something like:

for k1 in f:
    for k2 in f[k1].keys():
        for k3 in f[k1][k2].keys():
            print(f[k1][k2][k3])  

<HDF5 group "/Event Conditions/Unsteady/Boundary Conditions" (2 members)>
<HDF5 group "/Event Conditions/Unsteady/Initial Conditions" (0 members)>
<HDF5 dataset "Attributes": shape (350,), type "|V45">
<HDF5 dataset "Polyline Info": shape (350, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (350, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (3598, 2), type "<f8">
<HDF5 dataset "Attributes": shape (3,), type "|V37">
<HDF5 dataset "Polygon Info": shape (3, 4), type "<i4">
<HDF5 dataset "Polygon Parts": shape (3, 2), type "<i4">
<HDF5 dataset "Polygon Points": shape (344, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V64">
<HDF5 dataset "Cell Info": shape (1, 2), type "<i4">
<HDF5 dataset "Cell Points": shape (586635, 2), type "<f8">
<HDF5 group "/Geometry/2D Flow Areas/Delta" (0 members)>
<HDF5 group "/Geometry/2D Flow Areas/Perimeter 1" (25 members)>
<HDF5 dataset "Polygon Info": shape (1, 4), type "<i4">
<HDF5 dataset "Polygon Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Polygon Points": shape (610, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V60">
<HDF5 dataset "External Faces": shape (177,), type "|V24">
<HDF5 dataset "Polyline Info": shape (1, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (5, 2), type "<f8">
<HDF5 dataset "TIN Info": shape (347, 4), type "<i4">
<HDF5 dataset "TIN Points": shape (13591, 4), type "<f8">
<HDF5 dataset "TIN Triangles": shape (20008, 3), type "<i4">
<HDF5 dataset "XSIDs": shape (347, 2), type "<i4">
<HDF5 dataset "Attributes": shape (348,), type "|V676">
<HDF5 group "/Geometry/Cross Sections/Flow Distribution" (5 members)>
<HDF5 dataset "Manning's n Info": shape (348, 2), type "<i4">
<HDF5 dataset "Manning's n Values": shape (1044, 2), type "<f4">
<HDF5 dataset "Polyline Info": shape (348, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (348, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (696, 2), type "<f8">
<HDF5 dataset "Station Elevation Info": shape (348, 2), type "<i4">
<HDF5 dataset "Station Elevation Values": shape (151973, 2), type "<f4">
<HDF5 dataset "Attributes": shape (41,), type "|V32">
<HDF5 dataset "Calibration Table": shape (2,), type "|V200">
<HDF5 dataset "Polygon Info": shape (41, 4), type "<i4">
<HDF5 dataset "Polygon Parts": shape (41, 2), type "<i4">
<HDF5 dataset "Polygon Points": shape (45442, 2), type "<f8">
<HDF5 dataset "Polyline Info": shape (2, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (2, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (1768, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V96">
<HDF5 dataset "Polyline Info": shape (1, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (2042, 2), type "<f8">
<HDF5 dataset "Polyline Info": shape (2, 4), type "<i4">
<HDF5 dataset "Polyline Parts": shape (2, 2), type "<i4">
<HDF5 dataset "Polyline Points": shape (1152, 2), type "<f8">
<HDF5 dataset "Attributes": shape (1,), type "|V253">
<HDF5 dataset "Centerline Info": shape (1, 4), type "<i4">
<HDF5 dataset "Centerline Parts": shape (1, 2), type "<i4">
<HDF5 dataset "Centerline Points": shape (48, 2), type "<f8">
<HDF5 dataset "Profiles": shape (500,), type "|V28">
<HDF5 dataset "Compute Messages (rtf)": shape (1,), type "|S293107">
<HDF5 dataset "Compute Messages (text)": shape (1,), type "|S215682">
<HDF5 dataset "Compute Processes": shape (6,), type "|V332">
<HDF5 group "/Results/Unsteady/Geometry Info" (3 members)>
<HDF5 group "/Results/Unsteady/Output" (1 members)>
<HDF5 group "/Results/Unsteady/Summary" (0 members)>

But if I keep nesting loops like this, it quickly gets ridiculous (there's clearly a cleaner way), and it also starts to crash, because some paths end in a dataset after only a few levels and a dataset has no `.keys()`.
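One way around the crash is to recurse instead of hard-coding the depth, descending only into `h5py.Group` objects so `.keys()` is never called on a dataset. This is a minimal sketch assuming Python 3 and h5py; `walk` is a hypothetical helper name, not part of h5py.

```python
import h5py


def walk(group, prefix=""):
    """Hypothetical helper: yield (path, obj) for every member below `group`.

    Only h5py.Group objects are descended into, so .keys() is never
    called on a Dataset (the cause of the fixed-depth loop crashing).
    """
    for key in group.keys():
        obj = group[key]
        path = f"{prefix}/{key}"
        yield path, obj
        if isinstance(obj, h5py.Group):
            # Groups can contain more members; datasets end the path here
            yield from walk(obj, path)
```

For example, `with h5py.File('results.hdf', 'r') as f:` followed by `for path, obj in walk(f): print(path, obj)` would print every group and dataset, however deep it sits.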

I want to know all possible keys/paths to data in the hdf file, and whether they contain data (some do not).

Possibly some kind of loop with try/except in it to handle the end of a path?
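The try/except idea works as well: try `.keys()` on each node and treat `AttributeError` as "this is a dataset, stop here". A minimal sketch, assuming h5py datasets do not define `.keys()`; `list_paths` is a hypothetical name.

```python
import h5py


def list_paths(node, prefix=""):
    """Hypothetical helper: return every path below `node` using try/except.

    Groups have .keys(); datasets do not, so AttributeError marks the
    end of a path instead of crashing the traversal.
    """
    paths = []
    try:
        keys = list(node.keys())
    except AttributeError:
        # `node` is a dataset: nothing further down this path
        return paths
    for key in keys:
        path = f"{prefix}/{key}"
        paths.append(path)
        paths.extend(list_paths(node[key], path))
    return paths
```

The trade-off is that a broad `except AttributeError` could also hide an unrelated bug, which is why an explicit `isinstance(obj, h5py.Group)` check is often preferred.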

If anyone knows how, please help!

Thanks.

Derek Eden
  • http://docs.h5py.org/en/stable/high/group.html#reference - check the `visit` method. – hpaulj Aug 29 '19 at 19:04
  • Your iteration has to be smart enough to avoid those crashes; that's basic Python programming. Either you test whether the next level has `keys` or you wrap the step in a `try/except` clause. – hpaulj Aug 29 '19 at 19:20
  • From the operating system shell, you might be able to call a function like `h5dump`. It has a lot of options. – hpaulj Aug 29 '19 at 21:31
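Following the `visit` suggestion in the first comment, the shortest way to get every name is to hand `Group.visit` a callable that records it. This sketch relies on h5py continuing the traversal while the callable returns `None`, which `list.append` always does; `all_names` is a hypothetical helper name.

```python
import h5py


def all_names(h5obj):
    """Hypothetical helper: collect the relative path of every group and dataset.

    Group.visit calls the callable once per object; list.append returns
    None, and a None return value tells h5py to keep visiting.
    """
    names = []
    h5obj.visit(names.append)
    return names
```

The names come back relative to the object visited (e.g. `Geometry/Cross Sections/Attributes` rather than `/Geometry/...`), and `f[name]` opens any of them.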

1 Answer


Adapted from the example in the h5py docs (http://docs.h5py.org/en/latest/high/group.html#Group.visit):

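import h5py

def print_attrs(name, obj):
    # Called once for each group and dataset; `name` is its path relative to f
    print(name)
    for key, val in obj.attrs.items():
        print("    %s: %s" % (key, val))

with h5py.File('foo.hdf5', 'r') as f:  # open read-only; closed automatically
    f.visititems(print_attrs)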

It uses a callback pattern: you pass a callable, and h5py calls it with each object's name (its path relative to the group you started from) and the object itself, a group or a dataset. Inside your callable you can inspect the object and decide what to do with it.
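Building on the same callback idea, here is a sketch aimed at the original question: sort every object into "dataset with data" or "empty" (zero-size datasets and groups with no members). It assumes that a dataset's `size` is 0 for zero-length datasets and may be `None` for HDF5 null-dataspace datasets, so both are treated as empty; `summarize` is a hypothetical name.

```python
import h5py


def summarize(h5obj):
    """Hypothetical helper: split objects into datasets with data and empty items.

    visititems passes (name, obj) to the callback; returning None lets the
    traversal continue (any other return value stops it early).
    """
    with_data = {}  # dataset path -> (shape, dtype as string)
    empty = []      # zero-size datasets and groups with no members

    def callback(name, obj):
        if isinstance(obj, h5py.Dataset):
            if obj.size:  # 0 or None both count as "no data"
                with_data[name] = (obj.shape, str(obj.dtype))
            else:
                empty.append(name)
        elif isinstance(obj, h5py.Group) and len(obj) == 0:
            empty.append(name)
        return None  # keep visiting

    h5obj.visititems(callback)
    return with_data, empty
```

Calling `with_data, empty = summarize(f)` on the open file gives a dict of paths that hold data plus a list of the empty ones, such as the `Initial Conditions` group shown with 0 members in the question's output.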

Daniel Farrell
  • I get: AttributeError: 'AttributeManager' object has no attribute 'iteritems' ... any idea why, or how I can fix that? – Derek Eden Aug 29 '19 at 21:13
  • I am able to list all the paths if the function ends at `print(name)`. – Derek Eden Aug 29 '19 at 21:16
  • It seems the h5py API has changed. – Daniel Farrell Aug 29 '19 at 21:45
  • I have an example of using `visititems` at this SO Answer: (https://stackoverflow.com/a/57067674/10462884) – kcw78 Aug 29 '19 at 21:57
  • Thanks, this works but crashes about halfway through my file with: TypeError: No NumPy equivalent for TypeBitfieldID exists. Maybe there is just something weird with the data in that path? Not sure. In any case, I can use your example with the f.visit method to get a list of all the names and reduce it down to only the datasets with data. – Derek Eden Aug 30 '19 at 12:49
  • Derek Eden, does your comment ("_this works but crashes halfway thru_") refer to my linked example? If so, that's not supposed to happen. Although simple, the `if/elif/else` logic is designed to capture any data type (worst case printing `unknown type` when it doesn't match any of the logical tests). The `node.dtype == 'object'` test is specifically designed for Matlab-generated object datasets. Maybe there is a different data type that trips up the logic. `TypeBitfieldID` triggers a faint memory. It seems that's an HDF5 data structure that doesn't map cleanly to numpy (thus the error). – kcw78 Aug 30 '19 at 14:50
  • I should have searched SO before I answered. Turns out h5py does not support bitfields. Reference this answer from 2015: (https://stackoverflow.com/a/31430208/10462884) – kcw78 Aug 30 '19 at 14:53
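Combining the plan in these comments (collect names with `f.visit`, then keep only datasets with data) with the bitfield problem, a sketch can catch the `TypeError` per dataset so one unsupported type does not abort the whole scan. The assumption, labelled in the code too, is that the error is raised when h5py converts the stored HDF5 type to a NumPy dtype (i.e. on `obj.dtype`) in the h5py version in use; `datasets_with_data` is a hypothetical name.

```python
import h5py


def datasets_with_data(h5obj):
    """Hypothetical helper: return ({path: shape}, [unsupported paths]).

    Assumption: "No NumPy equivalent for TypeBitfieldID" is raised when
    h5py maps the stored HDF5 type to a NumPy dtype, so that lookup is
    wrapped in try/except and such paths are reported, not fatal.
    """
    found, unsupported = {}, []
    names = []
    h5obj.visit(names.append)  # collect every relative path first
    for name in names:
        obj = h5obj[name]
        if not isinstance(obj, h5py.Dataset):
            continue  # groups hold no data themselves
        try:
            _ = obj.dtype  # may raise TypeError for types h5py cannot map
        except TypeError:
            unsupported.append(name)
            continue
        if obj.size:  # 0 or None means the dataset is empty
            found[name] = obj.shape
    return found, unsupported
```

Paths listed in `unsupported` can then be inspected with an external tool such as `h5dump`, as suggested in the comments under the question.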