In [28]: df = DataFrame({ 'A' : np.random.rand(5),
'B' : range(5),
'C' : date_range('20130101',periods=5,freq='T')})
In [29]: df
Out[29]:
A B C
0 0.067509 0 2013-01-01 00:00:00
1 0.872840 1 2013-01-01 00:01:00
2 0.379634 2 2013-01-01 00:02:00
3 0.552827 3 2013-01-01 00:03:00
4 0.996150 4 2013-01-01 00:04:00
[5 rows x 3 columns]
In [30]: df.dtypes
Out[30]:
A float64
B int64
C datetime64[ns]
dtype: object
Write it out in Table format.
In [32]: df.to_hdf('test.h5','df',mode='w',format='table')
Show the internal structure of the file
In [33]: !ptdump -avd test.h5
/ (RootGroup) ''
/._v_attrs (AttributeSet), 4 attributes:
[CLASS := 'GROUP',
PYTABLES_FORMAT_VERSION := '2.1',
TITLE := '',
VERSION := '1.0']
/df (Group) ''
/df._v_attrs (AttributeSet), 14 attributes:
[CLASS := 'GROUP',
TITLE := '',
VERSION := '1.0',
data_columns := [],
encoding := None,
index_cols := [(0, 'index')],
info := {1: {'type': 'Index', 'names': [None]}, 'index': {}},
levels := 1,
nan_rep := 'nan',
non_index_axes := [(1, ['A', 'B', 'C'])],
pandas_type := 'frame_table',
pandas_version := '0.10.1',
table_type := 'appendable_frame',
values_cols := ['values_block_0', 'values_block_1', 'values_block_2']]
/df/table (Table(5,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
"values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
"values_block_2": Int64Col(shape=(1,), dflt=0, pos=3)}
byteorder := 'little'
chunkshape := (2048,)
autoindex := True
colindexes := {
"index": Index(6, medium, shuffle, zlib(1)).is_csi=False}
/df/table._v_attrs (AttributeSet), 19 attributes:
[CLASS := 'TABLE',
FIELD_0_FILL := 0,
FIELD_0_NAME := 'index',
FIELD_1_FILL := 0.0,
FIELD_1_NAME := 'values_block_0',
FIELD_2_FILL := 0,
FIELD_2_NAME := 'values_block_1',
FIELD_3_FILL := 0,
FIELD_3_NAME := 'values_block_2',
NROWS := 5,
TITLE := '',
VERSION := '2.7',
index_kind := 'integer',
values_block_0_dtype := 'float64',
values_block_0_kind := ['A'],
values_block_1_dtype := 'int64',
values_block_1_kind := ['B'],
values_block_2_dtype := 'datetime64',
values_block_2_kind := ['C']]
Data dump:
[0] (0, [0.06750856214219292], [0], [1356998400000000000])
[1] (1, [0.8728395428343044], [1], [1356998460000000000])
[2] (2, [0.37963409103250334], [2], [1356998520000000000])
[3] (3, [0.5528271410494643], [3], [1356998580000000000])
[4] (4, [0.9961498806897623], [4], [1356998640000000000])
datetime64[ns] values are serialized as nanoseconds since the epoch in UTC and stored in an int64 column (the same representation numpy uses for the underlying data). So it's pretty straightforward to read this in, as it is standard HDF5. You would, however, need to interpret the metadata yourself; see the source in pandas/io/pytables.py.
Basically you would look for the datetime64 blocks: in the attributes above, values_block_2_dtype is 'datetime64', and the matching values_block_2_kind maps that block to the names of its columns. Then you can reverse the conversion in IDL/matlab (in pandas you would do pd.to_datetime(ns_since_epoch,unit='ns')). Timezones are a bit more tricky, as the values are UTC and the timezone is stored in the info attribute.
Note: interpreting the metadata is slightly different for the Fixed format, or if you have data_columns (but it is not very difficult to do).
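For the Table-format file shown above, the two steps (find the datetime64 blocks, then convert the stored integers back) can be sketched in plain Python. This is only an illustration under some assumptions: the attribute dictionary and the raw values are copied by hand from the ptdump output rather than read from the file, the helper names datetime_blocks and ns_to_datetime are made up for this example, and how the attributes are physically encoded inside the file is not covered here.

```python
from datetime import datetime, timedelta, timezone

# Attributes of /df/table, copied by hand from the ptdump output above.
# A real reader would fetch these from the file with an HDF5 library.
table_attrs = {
    "values_block_0_dtype": "float64",
    "values_block_0_kind": ["A"],
    "values_block_1_dtype": "int64",
    "values_block_1_kind": ["B"],
    "values_block_2_dtype": "datetime64",
    "values_block_2_kind": ["C"],
}

# Raw int64 values of values_block_2, copied from the data dump.
raw_block_2 = [
    1356998400000000000,
    1356998460000000000,
    1356998520000000000,
    1356998580000000000,
    1356998640000000000,
]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def datetime_blocks(attrs):
    """Map each datetime64 block name to the column names it holds."""
    found = {}
    for key, value in attrs.items():
        if key.endswith("_dtype") and value == "datetime64":
            block = key[: -len("_dtype")]
            found[block] = attrs[block + "_kind"]
    return found


def ns_to_datetime(ns_since_epoch):
    """Convert nanoseconds since the epoch (UTC) to an aware datetime.

    Python's datetime only has microsecond resolution, so anything
    below one microsecond is truncated.
    """
    seconds, nanos = divmod(ns_since_epoch, 1_000_000_000)
    return EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1000)


blocks = datetime_blocks(table_attrs)
decoded = [ns_to_datetime(v) for v in raw_block_2]

print(blocks)
for stamp in decoded:
    print(stamp.isoformat())
```

The same integer arithmetic carries over to IDL or matlab: divide by 1e9 to get seconds since 1970-01-01 UTC and hand that to the native date routines. The timezone case is not handled in this sketch; the values would still decode as UTC and would need a further conversion using whatever timezone is recorded in the info attribute.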