Unfortunately I have no way to check this myself, because I have no IDL or Matlab license, which is why I'm asking here.

The scenario: I am saving HDF5 files with pandas, and one of the columns holds Python datetime objects. There are good HDF5 reading libraries for IDL and Matlab (or so I have heard), but would an IDL/Matlab user be able to do anything useful with the datetime values found in the HDF5 file?

K.-Michael Aye
  • http://stackoverflow.com/questions/8776414/python-datetime-to-matlab-datenum – Tom Ron Mar 05 '14 at 07:26
  • I have seen that, but the answers only suggest serializing to a string on the Python side, which I would like to avoid if I can. Storing the datetime object would be much more efficient for me. – K.-Michael Aye Mar 05 '14 at 07:35
  • @TomRon The referenced question doesn't answer this; they just say "make it a string"? – Andy Hayden Mar 05 '14 at 08:44

1 Answer

In [28]: df = DataFrame({ 'A' : np.random.rand(5), 
                          'B' : range(5), 
                          'C' : date_range('20130101',periods=5,freq='T')})

In [29]: df
Out[29]: 
          A  B                   C
0  0.067509  0 2013-01-01 00:00:00
1  0.872840  1 2013-01-01 00:01:00
2  0.379634  2 2013-01-01 00:02:00
3  0.552827  3 2013-01-01 00:03:00
4  0.996150  4 2013-01-01 00:04:00

[5 rows x 3 columns]

In [30]: df.dtypes
Out[30]: 
A           float64
B             int64
C    datetime64[ns]
dtype: object

Write it out in Table format:

In [32]: df.to_hdf('test.h5','df',mode='w',format='table')

Show the internal structure of the file:

In [33]: !ptdump -avd test.h5
/ (RootGroup) ''
  /._v_attrs (AttributeSet), 4 attributes:
   [CLASS := 'GROUP',
    PYTABLES_FORMAT_VERSION := '2.1',
    TITLE := '',
    VERSION := '1.0']
/df (Group) ''
  /df._v_attrs (AttributeSet), 14 attributes:
   [CLASS := 'GROUP',
    TITLE := '',
    VERSION := '1.0',
    data_columns := [],
    encoding := None,
    index_cols := [(0, 'index')],
    info := {1: {'type': 'Index', 'names': [None]}, 'index': {}},
    levels := 1,
    nan_rep := 'nan',
    non_index_axes := [(1, ['A', 'B', 'C'])],
    pandas_type := 'frame_table',
    pandas_version := '0.10.1',
    table_type := 'appendable_frame',
    values_cols := ['values_block_0', 'values_block_1', 'values_block_2']]
/df/table (Table(5,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2),
  "values_block_2": Int64Col(shape=(1,), dflt=0, pos=3)}
  byteorder := 'little'
  chunkshape := (2048,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False}
  /df/table._v_attrs (AttributeSet), 19 attributes:
   [CLASS := 'TABLE',
    FIELD_0_FILL := 0,
    FIELD_0_NAME := 'index',
    FIELD_1_FILL := 0.0,
    FIELD_1_NAME := 'values_block_0',
    FIELD_2_FILL := 0,
    FIELD_2_NAME := 'values_block_1',
    FIELD_3_FILL := 0,
    FIELD_3_NAME := 'values_block_2',
    NROWS := 5,
    TITLE := '',
    VERSION := '2.7',
    index_kind := 'integer',
    values_block_0_dtype := 'float64',
    values_block_0_kind := ['A'],
    values_block_1_dtype := 'int64',
    values_block_1_kind := ['B'],
    values_block_2_dtype := 'datetime64',
    values_block_2_kind := ['C']]
  Data dump:
[0] (0, [0.06750856214219292], [0], [1356998400000000000])
[1] (1, [0.8728395428343044], [1], [1356998460000000000])
[2] (2, [0.37963409103250334], [2], [1356998520000000000])
[3] (3, [0.5528271410494643], [3], [1356998580000000000])
[4] (4, [0.9961498806897623], [4], [1356998640000000000])

datetime64[ns] values are serialized as nanoseconds since the epoch in UTC and stored in an int64 column (the same way numpy stores the underlying data). So it's pretty straightforward to read this in, as it is standard HDF5. You would, however, need to interpret the metadata. See the source file pandas/io/pytables.py.
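As a rough illustration of that int64 encoding, the sketch below decodes the five raw values from the dump above using only the Python standard library. The names `EPOCH` and `ns_to_datetime` are made up for this example and are not part of pandas; the same arithmetic should carry over to IDL or Matlab.

```python
from datetime import datetime, timedelta, timezone

# Raw int64 values copied from the values_block_2 field in the ptdump output.
raw_ns = [
    1356998400000000000,
    1356998460000000000,
    1356998520000000000,
    1356998580000000000,
    1356998640000000000,
]

# The stored integers count nanoseconds since 1970-01-01 00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns):
    """Convert nanoseconds since the epoch to a timezone-aware datetime.

    Python datetimes only have microsecond resolution, so the
    sub-microsecond digits are dropped by the integer division.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)


decoded = [ns_to_datetime(v) for v in raw_ns]
for d in decoded:
    print(d.isoformat())
```

The decoded values should line up with column C of the DataFrame shown earlier, one minute apart starting at 2013-01-01 00:00:00.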

Basically you would look for the blocks whose dtype attribute is datetime64 (the matching kind attribute lists the names of the columns in that block). Then you can reverse the conversion in IDL/Matlab (in pandas you would do pd.to_datetime(ns_since_epoch, unit='ns')). Time zones are a bit trickier, as the values are UTC and the time zone is stored in the info attribute.
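A minimal sketch of that lookup and of the reverse conversion, written in Python so the logic can be checked before porting it. The attribute dictionary is copied by hand from the `/df/table._v_attrs` dump above rather than read from the file, and the helper names are hypothetical. The constant 719529 is Matlab's datenum for 1970-01-01; it does not come from the answer, so treat it as an assumption to verify on the Matlab side. Other pandas versions may also spell the dtype attribute differently.

```python
# Attributes copied from /df/table._v_attrs in the dump (not read from disk).
table_attrs = {
    "values_block_0_dtype": "float64",
    "values_block_0_kind": ["A"],
    "values_block_1_dtype": "int64",
    "values_block_1_kind": ["B"],
    "values_block_2_dtype": "datetime64",
    "values_block_2_kind": ["C"],
}


def datetime_blocks(attrs):
    """Map each block whose dtype is datetime64 to its column names."""
    found = {}
    for key, value in attrs.items():
        if key.endswith("_dtype") and value == "datetime64":
            block = key[: -len("_dtype")]
            found[block] = attrs[block + "_kind"]
    return found


# Assumption: Matlab's datenum(1970, 1, 1), i.e. the Unix epoch as a datenum.
MATLAB_EPOCH_DATENUM = 719529
NS_PER_DAY = 86400 * 10**9


def ns_to_matlab_datenum(ns):
    """Convert nanoseconds since the epoch (UTC) to a fractional-day datenum."""
    days, remainder = divmod(ns, NS_PER_DAY)
    return MATLAB_EPOCH_DATENUM + days + remainder / NS_PER_DAY


print(datetime_blocks(table_attrs))
print(ns_to_matlab_datenum(1356998400000000000))
```

Splitting the value into whole days and a remainder before the floating-point division keeps the fractional part as precise as a double allows; a datenum still cannot represent full nanosecond resolution.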

Note: interpreting the metadata is slightly different for a Fixed format or if you have data_columns (but not very difficult to do).

Jeff