10

Well, it seems that a couple of similar questions have been asked here on Stack Overflow, but none of them seems to have been answered correctly or properly, nor did they give exact examples.

I have a problem saving an array or list to HDF5.

I have several files, each containing a list of dimensions (n, 35), where n may differ from file to file. Each of them can be saved to HDF5 with the code below.

hdf = hf.create_dataset(fname, data=d)

However, if I try to merge them into a single 3D dataset, the following error occurs.

Object dtype dtype('O') has no native HDF5 equivalent

I have no idea why it turns into an object dtype, since all I have done is this:

all_data = list()
for fname in file_list:
    d = np.load(fname)
    all_data.append(d)
hdf = hf.create_dataset('all_data', data=all_data)

How can I save such data? I ran a couple of tests, and it seems that all_data becomes dtype 'object' when I convert it with

all_data = np.array(all_data)

This looks like the same problem I hit when saving to HDF5.

Again, how can I save such data in HDF5?

kcw78
Isaac Sim
  • Since the `d` arrays vary in shape, `numpy` can't make a 3d array from them. It has to make a 1d object dtype array instead. `h5py` can't save that (it only saves arrays, not lists or other python objects). You'll have to settle for the original format, one array per `dataset`. – hpaulj Nov 18 '18 at 07:37
  • Similar issue in your neighboring SO question: https://stackoverflow.com/questions/53358695/how-to-create-a-2d-numpy-ndarray-using-two-list-comprehensions – hpaulj Nov 18 '18 at 07:42
  • See also https://stackoverflow.com/a/46422242/3327666 (some details on why you can store only simple arrays in a HDF5 file). – Pierre de Buyl Nov 18 '18 at 13:12
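
The "one array per dataset" suggestion from the comments could look roughly like the sketch below. The `arrays` dictionary, its keys, and the group name are made-up stand-ins for the files loaded in the question, and the in-memory `driver="core"` file is used only so that the example writes nothing to disk.

```python
import h5py
import numpy as np

# Hypothetical stand-ins for the per-file arrays: same width (35), different n.
arrays = {
    "file_a": np.random.rand(4, 35),
    "file_b": np.random.rand(7, 35),
}

# In-memory HDF5 file (assumption: nothing needs to persist for this demo).
with h5py.File("ragged_demo.h5", "w", driver="core", backing_store=False) as hf:
    grp = hf.create_group("all_data")
    for name, d in arrays.items():
        # Each array keeps its own shape, so no object dtype is involved.
        grp.create_dataset(name, data=d)
    # Read the shapes back to confirm what was stored.
    shapes = {name: grp[name].shape for name in grp}

print(shapes)
```

Grouping the datasets keeps them together in one file while leaving each array's own n intact.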

3 Answers

9

I was running into a similar issue with h5py, and changing the type of the NumPy array using array.astype worked for me (I believe this changes the type from dtype('O') to the data type you specify). Please see the code snippet below:

import numpy as np

# X is an existing object-dtype array whose elements are all numeric
print(repr(X.dtype))
# --> dtype('O')

print(repr(X.astype(np.float64).dtype))
# --> dtype('float64')

When I ran h5.create_dataset with this data type conversion, I was able to successfully create a h5 dataset. Hope this helps!

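ONE ADDITIONAL UPDATE: I believe the NumPy object dtype 'O' is created when the elements cannot be represented by a single native dtype, e.g. sequences of different lengths or a mix of numbers and arbitrary Python objects. Mixed numeric types alone (e.g. np.int8 and np.float32) are promoted to a common numeric dtype instead.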
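
As a rough check of when `astype` can help, the sketch below compares three cases. All array contents are invented for illustration. The cast works when an object array holds plain numbers, but not when its elements are themselves arrays of different shapes, which is the situation in the question.

```python
import numpy as np

# Mixed numeric scalar types are promoted to a common dtype, not to object.
mixed = np.array([np.int8(1), np.float32(2.0)])

# An object array whose elements are plain numbers: the cast succeeds.
scalars = np.array([1, 2.5, 3], dtype=object)
converted = scalars.astype(np.float64)

# An object array holding arrays with different n (as in the question).
ragged = np.empty(2, dtype=object)
ragged[0] = np.zeros((4, 35))
ragged[1] = np.zeros((7, 35))
try:
    ragged.astype(np.float64)
    cast_ok = True
except (ValueError, TypeError):
    # NumPy cannot fit a whole array into a single float64 slot.
    cast_ok = False

print(mixed.dtype, converted.dtype, cast_ok)
```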

Ryan Sander
1

dtype('O') stands for object. In my case I had a list of lists of different lengths and got the same error. If you convert it to a NumPy array, NumPy warns "Creating an ndarray from ragged nested sequences". HDF5 files can't handle this type of data; for more info see this post.
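
If a single 3D dataset is really wanted, one possible workaround (not something proposed in this thread) is to pad every array to the largest n and keep the true lengths alongside. A minimal sketch, with invented example arrays and NaN chosen arbitrarily as the fill value:

```python
import numpy as np

# Hypothetical stand-ins for the loaded arrays: same width (35), different n.
arrays = [np.ones((4, 35)), np.ones((7, 35))]

# Remember each array's real number of rows so the padding can be removed later.
lengths = np.array([a.shape[0] for a in arrays])

# Rectangular container filled with NaN, then copy each array into its slot.
padded = np.full((len(arrays), lengths.max(), 35), np.nan)
for i, a in enumerate(arrays):
    padded[i, : a.shape[0]] = a

# Recover the first array by slicing off the padding.
restored = padded[0, : lengths[0]]
print(padded.shape, padded.dtype, restored.shape)
```

Both `padded` and `lengths` are ordinary numeric arrays, so each can be passed as `data=` to `create_dataset`. The cost is wasted space when the values of n differ widely.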

Phillip Maire
0

I get this error when I use:

with h5py.File(peakfilename, 'w') as pfile:  # saves the data
    pfile['peakY'] = np.array(X)
    pfile['peakX'] = np.array(Y)

However, when I specified a dtype while creating the arrays, the problem went away. I guess h5py is not able to create datasets from undefined data types.

with h5py.File(peakfilename, 'w') as pfile:  # saves the data
    pfile['peakY'] = np.array(X, dtype=np.float32)
    pfile['peakX'] = np.array(Y, dtype=np.float32)
banikr
  • It depends on the object type of `X` and `Y` and the array's dtype created by `np.array()`. What do you get when you enter `type(X)` and `type(np.array(X))`? (and the same for `Y`) – kcw78 Oct 31 '22 at 21:43
  • X and Y both are python lists. I don't have the data but I will try to regenerate the lists. Also the list elements could be of different lengths. – banikr Nov 01 '22 at 23:30
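
Whether an explicit `dtype` helps depends on the shape of the lists, as the comments hint. The sketch below uses two invented lists to show the difference: with equal-length sublists the conversion gives a regular float32 array, while with unequal lengths NumPy refuses to build the array at all.

```python
import numpy as np

# Hypothetical example lists, not the data from the answer above.
equal = [[1, 2, 3], [4, 5, 6]]
ragged = [[1, 2, 3], [4, 5]]

# Equal-length sublists: a regular 2D float32 array that h5py can store.
ok = np.array(equal, dtype=np.float32)

# Unequal lengths: a numeric dtype cannot describe the data, so NumPy raises.
try:
    np.array(ragged, dtype=np.float32)
    ragged_ok = True
except ValueError:
    ragged_ok = False

print(ok.shape, ok.dtype, ragged_ok)
```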