Pandas has the following examples for how to store Series, DataFrames and Panels in HDF5 files (the snippets assume HDFStore, date_range, Series, DataFrame and Panel have been imported from pandas, and randn from numpy.random):
Prepare some data:
In [1142]: store = HDFStore('store.h5')
In [1143]: index = date_range('1/1/2000', periods=8)
In [1144]: s = Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])
In [1145]: df = DataFrame(randn(8, 3), index=index,
   ......:                columns=['A', 'B', 'C'])
   ......:
In [1146]: wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
   ......:            major_axis=date_range('1/1/2000', periods=5),
   ......:            minor_axis=['A', 'B', 'C', 'D'])
   ......:
Save it in a store:
In [1147]: store['s'] = s
In [1148]: store['df'] = df
In [1149]: store['wp'] = wp
Inspect what's in the store:
In [1150]: store
Out[1150]:
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df frame (shape->[8,3])
/s series (shape->[5])
/wp wide (shape->[2,5,4])
Close the store:
In [1151]: store.close()
Questions:
In the code above, when is the data actually written to disk?
Say I want to add thousands of large DataFrames living in .csv files to a single .h5 file. I would need to load them and add them to the .h5 file one by one, since I cannot afford to have them all in memory at once. Is this possible with HDF5? What would be the correct way to do it? Roughly, what I have in mind is the sketch below.
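Here is the kind of loop I am imagining (the directory, file names and store keys are made up, and I am not sure whether plain dictionary-style assignment is the right call here or whether something like store.append is needed):

import glob
import os
import pandas as pd

store = pd.HDFStore('store.h5')
for path in glob.glob('data/*.csv'):    # hypothetical directory of thousands of CSVs
    df = pd.read_csv(path)              # only one frame is in memory at a time
    key = os.path.splitext(os.path.basename(path))[0]
    store[key] = df                     # hand it to the store, then let it go
    del df
store.close()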
"These stores are not appendable once written (though you simply remove them and rewrite). Nor are they queryable; they must be retrieved in their entirety."
What does it mean by "not appendable" and "not queryable"? Also, shouldn't it say "once closed" instead of "once written"?
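For what it's worth, my guess is that "appendable" and "queryable" refer to table-format operations along these lines (this is just my reading, not something the quoted passage spells out; another_df is a hypothetical frame with the same columns as df):

# append more rows to an object already stored in table format
store.append('df', another_df)

# pull back only a subset instead of retrieving the whole object
subset = store.select('df', "index > pd.Timestamp('2000-01-04')")

Is that the right interpretation, and if so, how does it square with the example above?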