H5py store list of list of strings

Question

Is it possible in h5py to create a dataset that consists of lists of strings? I tried to create a nested variable-length datatype, but this results in a segmentation fault in my Python interpreter.

import h5py

def create_dataset(h5py_file):
    data = [['I', 'am', 'a', 'sentence'], ['another', 'sentence']]
    string_dt = h5py.special_dtype(vlen=str)
    # Nested variable-length dtype; creating the dataset with it crashes the interpreter.
    nested_dt = h5py.special_dtype(vlen=string_dt)
    h5py_file.create_dataset("sentences", data=data, dtype=nested_dt)

score 8 · Answer 1 · answered Feb 01 '18 at 12:52

If you don't intend to edit the HDF5 file later (for example, to store longer strings), you can also simply use a fixed-length byte-string array:

h5py_file.create_dataset("sentences", data=np.array(data, dtype='S'))
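Note that the example sentences have different lengths, and recent NumPy versions generally refuse to build a fixed-width array from ragged lists. A minimal sketch of one way around that, assuming it is acceptable to pad short rows with empty strings (`pad_rows` is a hypothetical helper, not part of NumPy or h5py):

```python
import numpy as np

def pad_rows(rows, fill=""):
    """Hypothetical helper: pad every row with `fill` so all rows have equal length."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]

data = [['I', 'am', 'a', 'sentence'], ['another', 'sentence']]

# The padded list is rectangular, so NumPy can pick one fixed byte width for it.
padded = np.array(pad_rows(data), dtype='S')
```

The resulting `padded` array could then be passed as `data=` to `create_dataset` as in the line above. The padding cells come back as empty byte strings, so a reader would have to drop them again.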

answered Feb 01 '18 at 12:52

jan-glx


This will also lead to problems if your data contains non-ASCII characters. Read more about storing strings in HDF5 here: http://docs.h5py.org/en/stable/strings.html – jan-glx Jun 09 '20 at 13:15
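To illustrate the non-ASCII point: one possible workaround, sketched here under the assumption that whoever reads the file knows the bytes are UTF-8, is to encode the strings explicitly before building the byte-string array. The example words are made up for illustration.

```python
import numpy as np

words = ["café", "naïve"]  # illustrative non-ASCII input

# Encode to UTF-8 first, so NumPy only ever sees bytes and never needs an ASCII codec.
encoded = np.array([w.encode("utf-8") for w in words])

# The reader has to decode explicitly to get the text back.
decoded = [b.decode("utf-8") for b in encoded.tolist()]
```

The array width is measured in bytes here, not characters, so multi-byte characters take more room than their visible length suggests.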

score 2 · Answer 2 · edited May 23 '17 at 12:16

You should be able to get the functionality you want if you define your data as a numpy array of dtype=object as suggested in this post, rather than a list of lists.

import h5py
import numpy as np

def create_dataset(h5py_file):
    data = np.array([['I', 'am', 'a', 'sentence'], ['another', 'sentence']], dtype=object)
    string_dt = h5py.special_dtype(vlen=str)
    h5py_file.create_dataset("sentences", data=data, dtype=string_dt)

edited May 23 '17 at 12:16

Community


answered Jul 19 '16 at 18:07

Heather QC


`TypeError: Object dtype dtype('O') has no native HDF5 equivalent` - your reference has nothing to do with HDF files – Ed S. May 05 '21 at 18:43
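Given the errors reported above, an alternative that avoids nested variable-length types altogether is to store all words in one flat variable-length string dataset, plus a second dataset with the length of each sentence. This is only a sketch, not something proposed in the answers: `write_sentences`, `read_sentences` and the dataset names `tokens` and `lengths` are hypothetical, and it assumes h5py 2.10 or newer for `h5py.string_dtype`.

```python
import h5py
import numpy as np

def write_sentences(parent, name, sentences):
    """Store ragged lists of strings as a flat token array plus per-sentence lengths."""
    tokens = [tok for sent in sentences for tok in sent]
    lengths = [len(sent) for sent in sentences]
    group = parent.create_group(name)
    group.create_dataset("tokens", data=np.array(tokens, dtype=object),
                         dtype=h5py.string_dtype(encoding="utf-8"))
    group.create_dataset("lengths", data=np.array(lengths, dtype=np.int64))

def read_sentences(parent, name):
    """Rebuild the list of lists from the flat tokens and the lengths."""
    group = parent[name]
    # Depending on the h5py version, strings come back as bytes or str; handle both.
    tokens = [t.decode("utf-8") if isinstance(t, bytes) else t
              for t in group["tokens"][:]]
    sentences, start = [], 0
    for n in group["lengths"][:]:
        n = int(n)
        sentences.append(tokens[start:start + n])
        start += n
    return sentences

data = [['I', 'am', 'a', 'sentence'], ['another', 'naïve', 'sentence']]

# In-memory file (core driver without backing store), so nothing is written to disk.
with h5py.File("sentences_demo.h5", "w", driver="core", backing_store=False) as f:
    write_sentences(f, "sentences", data)
    restored = read_sentences(f, "sentences")
```

The trade-off is that a single sentence can no longer be read with one index; the lengths have to be turned into offsets first.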
