With your dtype I can create an array:
In [37]: np.array([_],dtype=SPECIAL_TYPE)
Out[37]:
array([ (array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]], dtype=uint8), 1, 'a', 1, 1, list([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]), 1)],
dtype=[('arr', 'O'), ('int1', 'u1'), ('str', 'O'), ('int2', 'u1'), ('int3', 'u1'), ('list', 'O'), ('int4', 'u1')])
But trying to create a dataset with it, even 1d, dumps me out of the interpreter:
In [38]: f=h5py.File('vlentest.h5','w')
In [39]: db = f.create_dataset('db',(10,), dtype=SPECIAL_TYPE)
In [40]: db[:]
Segmentation fault (core dumped)
There are two issues: does vlen work in a 2d array, and does it work in a compound dtype? You are pushing the bounds with multiple vlen fields in a compound dtype in a 2d array.
Have you seen documentation or examples using vlen in a compound dtype?
Notice how h5py implements the vlen in numpy: it defines those fields as 'O' object dtype. That stores a pointer in the array, not the variable length object itself. Normally object dtype arrays cannot be saved with h5py. But these fields must have some added annotation that h5py uses to translate the pointer into the kind of structure that HDF5 accepts.
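The annotation can be inspected directly. A minimal sketch, assuming a reasonably recent h5py in which `special_dtype` attaches the base type to the dtype and `check_dtype` reads it back; the field names are only illustrative:

```python
import numpy as np
import h5py

# A vlen string type: to numpy it is just an object dtype.
vlen_str = h5py.special_dtype(vlen=str)
print(vlen_str.kind)       # 'O'
print(vlen_str.metadata)   # the extra annotation h5py attaches

# h5py recovers the base type from that annotation.
base = h5py.check_dtype(vlen=vlen_str)

# A plain object dtype carries no annotation, so there is nothing to recover.
plain = h5py.check_dtype(vlen=np.dtype('O'))

# The annotation is kept per field inside a compound dtype.
compound = np.dtype([('str', vlen_str), ('int4', '<i4')])
field_base = h5py.check_dtype(vlen=compound['str'])

print(base, plain, field_base)
```

If `check_dtype` returns `None` for a field, h5py has no way of knowing what the pointer refers to, which is consistent with ordinary object arrays being rejected.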
Storing string datasets in hdf5 with unicode explores how a vlen str is stored.
See also: Storing multidimensional variable length array with h5py
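Experimenting, starting with something small (judging by the output dtypes below, dt1 is a compound dtype with a single vlen str field, and dt2 adds an int4 field):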
In [14]: f = h5py.File('temp.h5')
In [15]: db1 = f.create_dataset('db1',(5,), dtype=dt1)
In [16]: db2 = f.create_dataset('db2',(5,), dtype=dt2)
In [17]: db1[:]
Out[17]:
array([('',), ('',), ('',), ('',), ('',)],
dtype=[('str', 'O')])
In [18]: db2[:]
Out[18]:
array([('', 0), ('', 0), ('', 0), ('', 0), ('', 0)],
dtype=[('str', 'O'), ('int4', '<i4')])
Setting some db1 values:
In [24]: db1[0]=('a',)
In [25]: db1[1]=('ab',)
In [26]: db1[:]
Out[26]:
array([('a',), ('ab',), ('',), ('',), ('',)],
dtype=[('str', 'O')])
db2 works the same way:
In [30]: db2[0]=('abc',10)
In [31]: db2[1]=('abcde',6)
In [32]: db2[:]
Out[32]:
array([('abc', 10), ('abcde', 6), ('', 0), ('', 0), ('', 0)],
dtype=[('str', 'O'), ('int4', '<i4')])
Two vlen strings also work:
In [34]: dt3 = np.dtype([("str1", h5py.special_dtype(vlen=str)),("str2", h5py.special_dtype(vlen=str))])
In [35]: db3 = f.create_dataset('db3',(3,), dtype=dt3)
In [36]: db3[:]
Out[36]:
array([('', ''), ('', ''), ('', '')],
dtype=[('str1', 'O'), ('str2', 'O')])
In [37]: db3[0] = ('abc','defg')
In [38]: db3[1] = ('abcd','de')
In [39]: db3[:]
Out[39]:
array([('abc', 'defg'), ('abcd', 'de'), ('', '')],
dtype=[('str1', 'O'), ('str2', 'O')])
and so does a string vlen combined with an array vlen:
In [41]: dt4 = np.dtype([("str1", h5py.special_dtype(vlen=str)),("list", h5py.special_dtype(vlen=np.int))])
In [42]: dt4
Out[42]: dtype([('str1', 'O'), ('list', 'O')])
In [43]: db4 = f.create_dataset('db4',(3,), dtype=dt4)
In [47]: db4[0]=('abcdef',np.arange(5))
In [48]: db4[1]=('abc',np.arange(3))
In [49]: db4[:]
Out[49]:
array([('abcdef', array([0, 1, 2, 3, 4])), ('abc', array([0, 1, 2])),
('', array([], dtype=int32))],
dtype=[('str1', 'O'), ('list', 'O')])
but I can't use a list:
In [50]: db4[2]=('abc',[1,2,3,4])
--------------------------------------------------------------------------
AttributeError: 'list' object has no attribute 'dtype'
h5py saves arrays, not lists. Apparently that applies to these nested values as well. http://docs.h5py.org/en/latest/special.html has examples of setting a vlen with a list, but there the list is first converted to an array.
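One way around the AttributeError is to convert each nested value yourself before assigning. A minimal sketch; `as_vlen_item` is a hypothetical helper, not part of h5py, and `np.int64` is an assumed base type that should match whatever the vlen field was declared with:

```python
import numpy as np

def as_vlen_item(values, base=np.int64):
    """Return `values` as a 1d numpy array of dtype `base`."""
    arr = np.asarray(values, dtype=base)
    if arr.ndim != 1:
        # Refuse anything that is not flat rather than guess how to flatten it.
        raise ValueError("expected a flat sequence, got shape %s" % (arr.shape,))
    return arr

record = ('abc', as_vlen_item([1, 2, 3, 4]))
print(record)

# Intended use, with db4 from the session above:
#     db4[2] = record
```

The helper only touches the nested value; the string field is passed through unchanged.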
If I try to save a 2d array, it only writes a 1d one:
In [59]: db4[2]=('abc',np.ones((2,2),int))
In [60]: db4[:]
Out[60]:
array([('abcdef', array([0, 1, 2, 3, 4])), ('abc', array([0, 1, 2])),
('abc', array([1, 1]))],
dtype=[('str1', 'O'), ('list', 'O')])
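Since only 1d data comes back, a 2d array would have to be flattened explicitly, with its shape kept somewhere else, if it is to be restored later. A rough sketch with plain numpy; `pack_2d` and `unpack_2d` are hypothetical helpers, and where the shape gets stored (another field, an attribute) is left open:

```python
import numpy as np

def pack_2d(arr):
    """Flatten `arr` to 1d and return it together with the original shape."""
    arr = np.asarray(arr)
    return arr.ravel(), arr.shape

def unpack_2d(flat, shape):
    """Rebuild the original array from the flat data and the saved shape."""
    return np.asarray(flat).reshape(shape)

original = np.arange(6).reshape(2, 3)
flat, shape = pack_2d(original)      # flat is what would go into the vlen field
restored = unpack_2d(flat, shape)
print(flat, shape)
```

This has not been tried against the compound dtype above; it only shows the bookkeeping needed on the numpy side.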
This dtype works:
In [21]: dt1 = np.dtype([("str1", h5py.special_dtype(vlen=str)),('f1',int),("list", h5py.special_dtype(vlen=np.int))])
This one causes the core dump:
In [30]: dt1 = np.dtype([("f0", h5py.special_dtype(vlen=np.uint8)),('f1',int),("f2", h5py.special_dtype(vlen=np.int))])
Is this a vlen uint8 problem, or a problem with a vlen being first?
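To separate the two possibilities without losing the interpreter each time, every candidate dtype can be tried in its own child process. A sketch under some assumptions: `run_isolated` and the candidate field orders are my own, `int` stands in for `np.int` (which was only an alias for it), and the in-memory `core` driver is used so no file is left behind:

```python
import subprocess
import sys
import textwrap

def run_isolated(code):
    """Run `code` in a fresh Python process and return its exit status."""
    proc = subprocess.run([sys.executable, "-c", textwrap.dedent(code)],
                          capture_output=True)
    return proc.returncode

# One probe: build the dtype, create a dataset, read it back.
PROBE = """
import numpy as np, h5py
vlen_u1 = h5py.special_dtype(vlen=np.uint8)
vlen_int = h5py.special_dtype(vlen=int)
dt = np.dtype({fields})
with h5py.File('probe.h5', 'w', driver='core', backing_store=False) as f:
    db = f.create_dataset('db', (10,), dtype=dt)
    db[:]
"""

candidates = {
    "original (vlen uint8 first)": "[('f0', vlen_u1), ('f1', int), ('f2', vlen_int)]",
    "vlen uint8 last": "[('f1', int), ('f2', vlen_int), ('f0', vlen_u1)]",
    "vlen int first, no uint8": "[('f0', vlen_int), ('f1', int), ('f2', vlen_int)]",
}

results = {name: run_isolated(PROBE.format(fields=fields))
           for name, fields in candidates.items()}

# 0 means a clean exit. On POSIX a negative status is the number of the
# signal that killed the child (-11 for a segmentation fault). Any other
# nonzero status usually means an ordinary Python exception in the child.
for name, status in results.items():
    print(f"{status:>4}  {name}")
```

Comparing the three statuses should show whether the position of the vlen field or its uint8 base type is what matters.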