
I'm trying to save measurement attributes in an HDF5 file. Many of the files I work with are formatted so that a single attribute entry appears to hold a group of values with different datatypes.

For example, for my file, the command

import h5py
f = h5py.File('test.data','r+')
f['Measurement/Surface'].attrs['X Converter']

produces

array([(b'LateralCat', b'Pixels', array([0.        , 2.00097752, 0.        , 0.        ]))],
      dtype=[('Category', 'O'), ('BaseUnit', 'O'), ('Parameters', 'O')])

Here, the first two entries are strings, and the third is an array. Now if I try to save the values to a different file:

import numpy as np
f1 = h5py.File('test_output.data','r+')
f1['Measurement/Surface'].attrs.create('X Converter',[(b'LateralCat', b'Pixels', np.array([0.        , 2.00097752, 0.        , 0.        ]))])

I get this error:

Traceback (most recent call last):
  File "<pyshell#94>", line 1, in <module>
    f1['Measurement/Surface'].attrs.create('X Converter',[(b'LateralCat', b'Pixels', np.array([0. , 2.00097752, 0. , 0. ]))])
  File "C:\WinPython\WinPython-64bit-3.6.3.0Zero\python-3.6.3.amd64\lib\site-packages\h5py_hl\attrs.py", line 171, in create
    htype = h5t.py_create(original_dtype, logical=True)
  File "h5py\h5t.pyx", line 1611, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1633, in h5py.h5t.py_create
  File "h5py\h5t.pyx", line 1688, in h5py.h5t.py_create
TypeError: Object dtype dtype('O') has no native HDF5 equivalent

What am I missing?

  • Have you considered using pickle files instead? – C. Cooney Dec 28 '20 at 16:02
  • is it related to [this](https://stackoverflow.com/questions/53358689/object-dtype-dtypeo-has-no-native-hdf5-equivalent)? – YevKad Dec 28 '20 at 16:04
  • @yky It's a similar issue, but I can't use that solution because two of my fields are not numeric. – Tar9etPractice Dec 28 '20 at 16:07
  • @C.Cooney I'd prefer to stick with HDF5 files for compatibility with other software that I'm using on this project. – Tar9etPractice Dec 28 '20 at 16:09
  • As @hpaulj noted, you have several object datatypes. Per the h5py documentation, HDF5 has a special type for object and region references. Since there is no equivalent NumPy type, they are represented with the “object” dtype (kind ‘O’). Details here: [h5py Object and Region References](https://docs.h5py.org/en/stable/refs.html). I posted an example on SO (and there are others); search for "HDF5 object reference". Was this file created by Matlab? (It uses object references, so there are several SO questions about this.) – kcw78 Dec 28 '20 at 17:59

1 Answer


You aren't saving the same thing. The dtype of the original is significant.

In [101]: [(b'LateralCat', b'Pixels', np.array([0.        , 2.00097752, 0.        ,
     ...:  0.        ]))]
Out[101]: 
[(b'LateralCat',
  b'Pixels',
  array([0.        , 2.00097752, 0.        , 0.        ]))]
In [102]: np.array(_)
<ipython-input-102-7a2cd91c32ca>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.array(_)
Out[102]: 
array([[b'LateralCat', b'Pixels',
        array([0.        , 2.00097752, 0.        , 0.        ])]],
      dtype=object)

In [104]: np.array([(b'LateralCat', b'Pixels', np.array([0.        , 2.00097752, 0.
     ...:         , 0.        ]))],
     ...:       dtype=[('Category', 'O'), ('BaseUnit', 'O'), ('Parameters', 'O')])
Out[104]: 
array([(b'LateralCat', b'Pixels', array([0.        , 2.00097752, 0.        , 0.        ]))],
      dtype=[('Category', 'O'), ('BaseUnit', 'O'), ('Parameters', 'O')])

In [105]: x = _
In [106]: x.dtype
Out[106]: dtype([('Category', 'O'), ('BaseUnit', 'O'), ('Parameters', 'O')])

In [108]: x['Category']
Out[108]: array([b'LateralCat'], dtype=object)
In [109]: x['BaseUnit']
Out[109]: array([b'Pixels'], dtype=object)
In [110]: x['Parameters']
Out[110]: 
array([array([0.        , 2.00097752, 0.        , 0.        ])],
      dtype=object)

Specifying the dtype reproduces the original array, but that doesn't quite solve the problem, since the dtype still contains object dtype fields, and h5py rejects them:

In [111]: import h5py
In [112]: f=h5py.File('test.h5','w')
In [113]: 
In [113]: g = f.create_group('test')
In [114]: g.attrs.create('converter',x)
Traceback (most recent call last):
...
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
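
To check ahead of time which fields would trigger this error, the structured dtype can be inspected field by field. This is a minimal sketch that needs only numpy; `object_fields` is a hypothetical helper name introduced here for illustration, not part of numpy or h5py.

```python
import numpy as np

# Same structure as the attribute read from the original file.
x = np.array(
    [(b'LateralCat', b'Pixels', np.array([0.0, 2.00097752, 0.0, 0.0]))],
    dtype=[('Category', 'O'), ('BaseUnit', 'O'), ('Parameters', 'O')],
)

def object_fields(arr):
    """Return the names of all fields whose dtype is object (kind 'O')."""
    return [name for name in arr.dtype.names if arr.dtype[name].kind == 'O']

print(object_fields(x))  # ['Category', 'BaseUnit', 'Parameters']
```

Every field listed needs a concrete replacement type (for example a fixed-width bytes type or a fixed-shape numeric subarray) before h5py can map the dtype to an HDF5 type.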

As noted in the comments, the numpy object dtype is problematic when writing with h5py. Do you know how the original file was created? There may be some format or structure there that h5py is rendering as a compound dtype with object fields, but which isn't directly writable. I'd have to dig more into the docs (and maybe the original file) to learn more.

https://docs.h5py.org/en/stable/special.html

I can write that data as a more conventional structured array:

In [120]: y=np.array([(b'LateralCat', b'Pixels', np.array([0.        , 2.00097752,
     ...: 0.        , 0.        ]))],
     ...:       dtype=[('Category', 'S20'), ('BaseUnit', 'S20'), ('Parameters', 'float',4)])
In [121]: y
Out[121]: 
array([(b'LateralCat', b'Pixels', [0.        , 2.00097752, 0.        , 0.        ])],
      dtype=[('Category', 'S20'), ('BaseUnit', 'S20'), ('Parameters', '<f8', (4,))])

In [122]: g.attrs.create('converter',y)
In [125]: g.attrs['converter']
Out[125]: 
array([(b'LateralCat', b'Pixels', [0.        , 2.00097752, 0.        , 0.        ])],
      dtype=[('Category', 'S20'), ('BaseUnit', 'S20'), ('Parameters', '<f8', (4,))])
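
The same idea can be written as a standalone script that derives the fixed-size dtype from the data instead of hard-coding `S20` and `4`. This is only a sketch under stated assumptions: `to_fixed_dtype` is a hypothetical helper, the file name `demo.h5` is arbitrary, the array is one-dimensional, and every record is assumed to hold bytes or same-shaped numeric arrays in each field. The in-memory `core` driver is used so nothing is written to disk.

```python
import h5py
import numpy as np

# Attribute as read from the original file: every field has object dtype.
src = np.array(
    [(b'LateralCat', b'Pixels', np.array([0.0, 2.00097752, 0.0, 0.0]))],
    dtype=[('Category', 'O'), ('BaseUnit', 'O'), ('Parameters', 'O')],
)

def to_fixed_dtype(arr):
    """Copy a 1-D object-field structured array into a fixed-size dtype."""
    fields = []
    for name in arr.dtype.names:
        values = arr[name]
        first = values[0]
        if isinstance(first, bytes):
            # Fixed-width bytes field, wide enough for the longest entry.
            width = max(1, max(len(v) for v in values))
            fields.append((name, f'S{width}'))
        else:
            # Numeric subarray field; assumes all records share this shape.
            first = np.asarray(first)
            fields.append((name, first.dtype, first.shape))
    out = np.empty(arr.shape, dtype=fields)
    for name in arr.dtype.names:
        for i, value in enumerate(arr[name]):
            out[name][i] = value
    return out

fixed = to_fixed_dtype(src)

# Write to an in-memory HDF5 file and read the attribute back.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    g = f.create_group('Measurement/Surface')
    g.attrs.create('X Converter', fixed)
    back = g.attrs['X Converter']

print(back.dtype)
print(back)
```

Note that the result is a fixed-size compound type, which is not necessarily the same on-disk layout as the original attribute, so other software that reads the file may or may not accept it.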
hpaulj
  • The file was originally created using Zygo MetroPro, which is proprietary software. So, unfortunately, I don't have detailed information on how they constructed the file. – Tar9etPractice Dec 30 '20 at 16:51