I'm trying to set up and write to an HDF5 dataset using h5py (Python 3) that contains a one dimensional array of compound objects. Each compound object is made up of three variable length string properties.
with h5py.File("myfile.hdf5", "a") as file:
dt = np.dtype([
("label", h5py.string_dtype(encoding='utf-8')),
("name", h5py.string_dtype(encoding='utf-8')),
("id", h5py.string_dtype(encoding='utf-8'))])
dset = file.require_dataset("initial_data", (50000,), dtype=dt)
dset[0, "label"] = "foo"
When I run the example above, the last line of code causes h5py (or more accurately numpy) to throw an error saying:
"Cannot change data-type for object array."
Do I understand correctly that the type for "foo"
is not h5py.string_dtype(encoding='utf-8')
?
How come? And how can I fix this?
UPDATE 1:
Stepping into the stacktrace, I can see that the error is thrown from an internal numpy function called _view_is_safe(oldtype, newtype)
. In my case oldtype
is dtype('O')
but newtype
is dtype([('label', 'O')])
which causes the error to be thrown.
UPDATE 2: My question has been answered successfully below but for completeness I'm linking to a GH issue that might be related: https://github.com/h5py/h5py/issues/1921