
I am trying to write regular Python objects (with several key/value pairs) to an HDF5 file. I am using h5py 2.7.0 with Python 3.5.2.3.

Right now, I am trying to write one object in its entirety to a dataset:

#...read dataset, store one data object in 'obj'
#obj could be something like: {'value1': 0.09, 'state': {'angle_rad': 0.034903, 'value2': 0.83322}, 'value3': 0.3}
dataset = h5File.create_dataset('grp2/ds3', data=obj)

This produces an error because the underlying dtype cannot be converted to a native HDF5 equivalent:

  File "\python-3.5.2.amd64\lib\site-packages\h5py\_hl\group.py", line 106, in create_dataset
    dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
  File "\python-3.5.2.amd64\lib\site-packages\h5py\_hl\dataset.py", line 100, in make_new_dset
    tid = h5t.py_create(dtype, logical=1)
  File "h5py\h5t.pyx", line 1543, in h5py.h5t.py_create (D:\Build\h5py\h5py-hdf5110-git\h5py\h5t.c:18116)
  File "h5py\h5t.pyx", line 1565, in h5py.h5t.py_create (D:\Build\h5py\h5py-hdf5110-git\h5py\h5t.c:17936)
  File "h5py\h5t.pyx", line 1620, in h5py.h5t.py_create (D:\Build\h5py\h5py-hdf5110-git\h5py\h5t.c:17837)
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

Is it possible to write the object to an HDF5 file in a "dynamic" way?

j9dy
  • No, you can't write generic Python objects to an `hdf5` file (dictionaries, lists, classes, etc.). The file format is designed to save numeric data, roughly the equivalent of `numpy` arrays, and a few other things like strings. – hpaulj May 18 '17 at 17:57
  • @hpaulj My objects contain fields of type int, double, float, and just a few strings. What I was asking is, if there is like an "automated" way to create a hdf5 compound type out of the object structure I have. So that I can reuse that compound type in a loop and just pass my objects to it. I know that I can have object-like structures in HDF5 by using compound types. But how can I create them with h5py? – j9dy May 19 '17 at 07:45
  • So you are asking about saving structured arrays? Compound dtype or object dtype? – hpaulj May 19 '17 at 08:28
  • @hpaulj Compound datatypes. I am not that familiar with h5py and python since I used HDF5s C++ API before. See https://support.hdfgroup.org/HDF5/Tutor/compound.html and here https://groups.google.com/forum/#!searchin/h5py/compound|sort:relevance/h5py/ZeHD4AGU7Ms/GFNXcWcUWL0J - The second link shows how this could be done in h5py. The code following after `if useCompoundType` is what I stumbled across just a few minutes ago. This looks promising. I am looking for a way to do his `datatype=[('timeStamp', 'float32'), ('value','float32')]` initialization in a dynamic manner. Any ideas? – j9dy May 19 '17 at 08:38
  • Do you have sample files that you could try reading with h5py? – hpaulj May 19 '17 at 08:41
  • @hpaulj unfortunately not. I am building these sort of hdf5 files (with compound types) for the first time. It should be possible to dynamically infer the type of my values and construct a list out of that though? I am thinking about something like: `1) loop through key/value of each object to infer its datatype. 2) select key (string) as the name 3) use type() to get the type of the value 4) have map that maps from the real datatype to a hdf5 datatype ('float32', etc)`. By that, I could dynamically infer the datatypes of my objects... What do you think? – j9dy May 19 '17 at 09:05
  • I haven't tried it myself, but maybe hickle could be useful in your case. https://github.com/telegraphic/hickle – max9111 May 19 '17 at 16:36
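The dynamic type-inference plan described in the comments above could be sketched like this. The names `infer_dtype` and `TYPE_MAP` are invented for illustration, and the `'O'` placeholder for strings is only a stand-in; to actually write string fields, h5py 2.x needs them declared via `h5py.special_dtype(vlen=str)`:

```python
import numpy as np

# Sketch of the plan from the comments: walk one sample object and build a
# compound-dtype field list from it dynamically. infer_dtype and TYPE_MAP
# are made-up names; 'O' marks string fields, which h5py 2.x would need
# declared via h5py.special_dtype(vlen=str) before writing.
TYPE_MAP = {int: 'i8', float: 'f8', bool: '?', str: 'O'}

def infer_dtype(adict):
    fields = []
    for key, value in adict.items():
        if isinstance(value, dict):
            fields.append((key, infer_dtype(value)))  # nested compound type
        else:
            fields.append((key, TYPE_MAP[type(value)]))
    return fields

obj = {'value1': 0.09, 'state': {'angle_rad': 0.034903, 'value2': 0.83322},
       'value3': 0.3}
dt = np.dtype(infer_dtype(obj))
```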

2 Answers


If the object you want to save is a nested dictionary with numeric values, then it can be recreated with the group/dataset structure of an H5 file.

A simple recursive function would be:

import numpy as np
import h5py

def write_layer(gp, adict):
    # nested dicts become groups; everything else becomes a dataset
    for k, v in adict.items():
        if isinstance(v, dict):
            gp1 = gp.create_group(k)
            write_layer(gp1, v)
        else:
            gp.create_dataset(k, data=np.atleast_1d(v))

In [205]: dd = {'value1': 0.09, 'state': {'angle_rad': 0.034903, 'value2': 0.83322}, 'value3': 0.3}

In [206]: f = h5py.File('test.h5', 'w')
In [207]: write_layer(f, dd)

In [208]: list(f.keys())
Out[208]: ['state', 'value1', 'value3']
In [209]: f['value1'][:]
Out[209]: array([ 0.09])
In [210]: f['state']['value2'][:]
Out[210]: array([ 0.83322])

You might want to refine it and save scalars as attributes rather than full datasets.

def write_layer1(gp, adict):
    for k, v in adict.items():
        if isinstance(v, dict):
            gp1 = gp.create_group(k)
            write_layer1(gp1, v)
        else:
            if isinstance(v, (np.ndarray, list)):
                gp.create_dataset(k, data=np.atleast_1d(v))
            else:
                gp.attrs.create(k, v)

In [215]: list(f.keys())
Out[215]: ['state']
In [218]: list(f.attrs.items())
Out[218]: [('value3', 0.29999999999999999), ('value1', 0.089999999999999997)]
In [219]: f['state']
Out[219]: <HDF5 group "/state" (0 members)>
In [220]: list(f['state'].attrs.items())
Out[220]: [('value2', 0.83321999999999996), ('angle_rad', 0.034903000000000003)]

Retrieving the mix of datasets and attributes is more complicated, though you could write code to hide that.
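That read-back code could be hidden behind a similar recursive function. A sketch (`read_layer` is a hypothetical name, not part of h5py) that rebuilds the nested dictionary from the mix of attributes, groups, and datasets:

```python
import h5py

# Sketch of the reverse of write_layer1: rebuild a nested dict from the
# mix of attributes (scalars) and groups/datasets. read_layer is a
# made-up name for illustration.
def read_layer(gp):
    out = dict(gp.attrs)              # scalars were saved as attributes
    for k, v in gp.items():
        if isinstance(v, h5py.Group):
            out[k] = read_layer(v)    # recurse into subgroups
        else:
            out[k] = v[:]             # datasets come back as numpy arrays
    return out
```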


Here's a structured array approach (with a compound dtype).

Define a dtype that matches your dictionary structure. Nesting like this is possible, but can be awkward if too deep:

In [226]: dt=[('state',[('angle_rad','f'),('value2','f')]),
              ('value1','f'),
              ('value3','f')]
In [227]: dt = np.dtype(dt)

Make a blank array of this type with several records, and fill in one record with data from your dictionary. Note that the nesting of the tuples has to match the dtype nesting. More generally, structured data is provided as a list of such tuples.

In [228]: arr = np.ones((3,), dtype=dt)
In [229]: arr[0]=((.034903, 0.83322), 0.09, 0.3)
In [230]: arr
Out[230]: 
array([(( 0.034903,  0.83322001),  0.09,  0.30000001),
       (( 1.      ,  1.        ),  1.  ,  1.        ),
       (( 1.      ,  1.        ),  1.  ,  1.        )], 
      dtype=[('state', [('angle_rad', '<f4'), ('value2', '<f4')]), ('value1', '<f4'), ('value3', '<f4')])

Writing the array to the h5 file is straightforward:

In [231]: f = h5py.File('test1.h5', 'w')
In [232]: g = f.create_dataset('data', data=arr)
In [233]: g.dtype
Out[233]: dtype([('state', [('angle_rad', '<f4'), ('value2', '<f4')]), ('value1', '<f4'), ('value3', '<f4')])
In [234]: g[:]
Out[234]: 
array([(( 0.034903,  0.83322001),  0.09,  0.30000001),
       (( 1.      ,  1.        ),  1.  ,  1.        ),
       (( 1.      ,  1.        ),  1.  ,  1.        )], 
      dtype=[('state', [('angle_rad', '<f4'), ('value2', '<f4')]), ('value1', '<f4'), ('value3', '<f4')])

In theory we could write functions like `write_layer` that work through your dictionary and construct the relevant dtype and records.
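A minimal sketch of that idea: one pass builds the compound dtype, another turns the dictionary into the matching nested tuple. The function names are invented here, and it assumes all leaf values are floats (stored as `'f8'`):

```python
import numpy as np

# dict_dtype and dict_record are hypothetical helper names: the first
# builds a (possibly nested) compound dtype from a sample dict, the
# second converts the dict into the matching nested record tuple.
def dict_dtype(adict):
    return np.dtype([(k, dict_dtype(v) if isinstance(v, dict) else 'f8')
                     for k, v in adict.items()])

def dict_record(adict):
    return tuple(dict_record(v) if isinstance(v, dict) else v
                 for v in adict.values())

dd = {'value1': 0.09, 'state': {'angle_rad': 0.034903, 'value2': 0.83322},
      'value3': 0.3}
arr = np.array([dict_record(dd)], dtype=dict_dtype(dd))
# arr can then be written as before: f.create_dataset('data', data=arr)
```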

hpaulj
  • I have managed to build my dtypes recursively based on your answer and my own comment in the original question. I have a follow up question though: How do I write the data into my hdf5 file? The datatypes I created might not match the structure of my obj. I think I need to transfer them into the order, in which the datatype is declared? Is this correct or is there a better way, i.e. the data will be (automatically) accessed by the name of the dtype array? For example, when I have Field A, B, C, D inside my original object but the dtype has the order of B,A,C,D... – j9dy May 23 '17 at 10:22
  • More detailed example: print(value.items()) = `dict_items([('streamName', 'vehicle_gas_pedal'), ('value', 0.0)]) [('streamName', 'O'), ('value', ' – j9dy May 23 '17 at 12:04
  • Sorry, I forgot to say that the type of `streamName` has been set like this: (pseudocode) `if(value_type == string) then obj_type[key] = h5py.special_dtype(vlen=str)` as proposed in the docs (http://docs.h5py.org/en/latest/strings.html) – j9dy May 23 '17 at 12:10
  • Start a new question. It's hard to display code in comments. – hpaulj May 23 '17 at 13:25
  • In Py3 `adict.items()` is a `dict_items` object; You can't make an array directly from it. I used `items()` in a Py3 appropriate way. Use `list(adict.items())` if you need a list of key/value tuples. – hpaulj May 24 '17 at 03:26

I know that your problem has already been solved, but I came across a similar problem today and wanted to share my solution. Related: Print all properties of a Python Class

Maybe it will help someone. I wrote two little loops for saving/reading an (almost) arbitrary class object to/from an .hdf5 file:

import h5py

class testclass:
    def __init__(self, name = '', color = ''):
        self.name = name
        self.color = color

testobj = testclass('Chair', 'Red')

with h5py.File('test.hdf5', 'w') as f:
    for item in vars(testobj).items():
        f.create_dataset(item[0], data = item[1])

And then in the script where I want to load the file:

import h5py

class testclass:
    def __init__(self, name = '', color = ''):
        self.name = name
        self.color = color

testobj = testclass()

with h5py.File('test.hdf5', 'r') as f:
    for key in f.keys():
        setattr(testobj, key, f[key].value)

Works like a charm. The only restriction is that your class properties have to be compatible with h5py.
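One caveat for newer versions: `Dataset.value` was removed in h5py 3.x, so the loading loop needs `f[key][()]` instead, and string data comes back as `bytes`. A self-contained sketch (the write side is repeated from the answer so the snippet runs on its own):

```python
import h5py

class testclass:
    def __init__(self, name='', color=''):
        self.name = name
        self.color = color

# write, as in the answer above
with h5py.File('test.hdf5', 'w') as f:
    for k, v in vars(testclass('Chair', 'Red')).items():
        f.create_dataset(k, data=v)

# read back: f[key][()] replaces the `.value` attribute removed in h5py 3.x
testobj = testclass()
with h5py.File('test.hdf5', 'r') as f:
    for key in f.keys():
        val = f[key][()]
        if isinstance(val, bytes):    # h5py 3 returns str data as bytes
            val = val.decode()
        setattr(testobj, key, val)
```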

Forrest Thumb