
I'm trying to open a group-less hdf5 file with pandas:

```python
import pandas as pd
foo = pd.read_hdf('foo.hdf5')
```

but I get an error:

```
TypeError: cannot create a storer if the object is not existing nor a value are passed
```

I tried solving this by assigning a key:

```python
foo = pd.read_hdf('foo.hdf5', 'key')
```

This works when `key` names a group, but this file has no groups: it only has several datasets at the top level of the HDF5 hierarchy. That is, the working file is structured as Groups --> Datasets, while the failing file is just Datasets. Both files open fine with h5py, where I would use:

```python
f = h5py.File('foo.hdf5', 'r')
```

and

```python
dset = f['dataset']
```

to view a dataset. Any ideas how to read this in pandas?
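To make the two layouts concrete, here is a minimal sketch, assuming h5py and NumPy are installed. It writes a small file with one dataset at the root and one group containing a dataset, then walks the root with h5py and reports which members are groups and which are datasets. The file name `layout_demo.h5`, the names `x`, `grp`, `y`, and the helper `describe_root` are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def describe_root(path):
    """Return {name: 'group' | 'dataset'} for each member directly under '/'."""
    kinds = {}
    with h5py.File(path, "r") as f:
        for name, obj in f.items():
            # h5py exposes groups and datasets as distinct classes
            if isinstance(obj, h5py.Group):
                kinds[name] = "group"
            elif isinstance(obj, h5py.Dataset):
                kinds[name] = "dataset"
    return kinds


# Build a small example file in a temporary directory (hypothetical names)
path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("x", data=np.arange(5))       # dataset at the root
    grp = f.create_group("grp")                    # a group at the root...
    grp.create_dataset("y", data=np.arange(5) * 2) # ...holding its own dataset

print(describe_root(path))
```

A file like the failing one would report only `"dataset"` entries here, while the working one would report `"group"` entries.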

hsnee

1 Answer


I think you're confused by the different terminology: a pandas HDFStore key is a full path, i.e. group(s) + dataset name...

demo:

```
# df: a DataFrame with 9 rows and 2 columns (its creation isn't shown)
In [67]: store = pd.HDFStore(r'D:\temp\.data\hdf\test.h5')

In [68]: store.append('dataset1', df)

In [69]: store.append('/group1/sub_group1/dataset2', df)

# Without parentheses this displays the bound method; its repr embeds the store summary
In [70]: store.groups
Out[70]:
<bound method HDFStore.groups of <class 'pandas.io.pytables.HDFStore'>
File path: D:\temp\.data\hdf\test.h5
/dataset1                              frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])
/group1/sub_group1/dataset2            frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])>

In [71]: store.items
Out[71]:
<bound method HDFStore.items of <class 'pandas.io.pytables.HDFStore'>
File path: D:\temp\.data\hdf\test.h5
/dataset1                              frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])
/group1/sub_group1/dataset2            frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])>

In [72]: store.close()

In [73]: x = pd.read_hdf(r'D:\temp\.data\hdf\test.h5', 'dataset1')

In [74]: x.shape
Out[74]: (9, 2)

In [75]: x = pd.read_hdf(r'D:\temp\.data\hdf\test.h5', '/group1/sub_group1/dataset2')

In [76]: x.shape
Out[76]: (9, 2)
```
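Rather than reading the store's repr, you can ask it for its keys directly with `HDFStore.keys()`, which returns the full path of each pandas object, and pass each one to `read_hdf`. A minimal sketch, assuming pandas with PyTables (`tables`) installed; the temporary path and the DataFrame contents are hypothetical stand-ins for the `df` and path used in the session above.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the session's `df` (9 rows, 2 columns)
df = pd.DataFrame({"a": range(9), "b": range(9)})

path = os.path.join(tempfile.mkdtemp(), "test.h5")
with pd.HDFStore(path, mode="w") as store:
    store.append("dataset1", df)
    store.append("/group1/sub_group1/dataset2", df)
    # keys() lists the full path (groups + name) of every pandas object
    keys = store.keys()

print(keys)

# Each full-path key can be handed straight to read_hdf
frames = {key: pd.read_hdf(path, key) for key in keys}
for key, frame in frames.items():
    print(key, frame.shape)
```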
MaxU - stand with Ukraine
  • The output is `File path: /path/foo.hdf5 Empty>` – hsnee Jun 24 '16 at 16:46
  • I don't think it's an issue of forgetting to close a file. I just tried opening and closing it with h5py, as I usually do, and it's working fine. I also tried creating 2 new hdf5 files. One has a structure: Group --> several datasets, and the other: several datasets. The first opens normally with pandas using the name of the group as a key, the second does not. – hsnee Jun 24 '16 at 20:14
  • @hsnee, what do you mean saying `Group`? Could you update your question with the not working example? – MaxU - stand with Ukraine Jun 24 '16 at 20:16
  • By groups, I mean exactly what they refer to in the HDF5 terminology hdfgroup.org/HDF5/doc1.6/UG/09_Groups.html – hsnee Jun 24 '16 at 20:24
  • OK, I guess `Group == key` (in pandas terminology). Could you upload a small `h5` file that you can't open somewhere? It's going to be hard to help you without being able to reproduce the problem... – MaxU - stand with Ukraine Jun 24 '16 at 20:29
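The `Empty` listing suggests pandas does not recognize the root-level datasets as pandas objects: `HDFStore` lists nodes that carry pandas' own metadata (or PyTables tables), and arrays written directly with h5py typically carry neither. One workaround, sketched here under the assumption that each root-level dataset is one-dimensional and all have the same length, is to read them with h5py and assemble the DataFrame yourself. The helper `root_datasets_to_frame` and the dataset names `temperature` and `count` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd


def root_datasets_to_frame(path):
    """Build a DataFrame from the 1-D datasets stored directly under '/'.

    Assumes every root-level dataset is 1-D and all share the same length;
    groups and multi-dimensional datasets are skipped.
    """
    columns = {}
    with h5py.File(path, "r") as f:
        for name, obj in f.items():
            if isinstance(obj, h5py.Dataset) and len(obj.shape) == 1:
                columns[name] = obj[()]  # read the whole dataset into a NumPy array
    return pd.DataFrame(columns)


# Hypothetical group-less file: two datasets at the root, no groups
path = os.path.join(tempfile.mkdtemp(), "foo.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("temperature", data=np.linspace(0.0, 1.0, 4))
    f.create_dataset("count", data=np.arange(4))

df = root_datasets_to_frame(path)
print(df)
```

If the datasets have different lengths or more dimensions, they would need to be loaded individually (e.g. one DataFrame per dataset) instead.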