I have a pandas DataFrame with a multidimensional index (a MultiIndex), created like this:
import numpy as np
import pandas as pd
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=mindex)
store = pd.HDFStore("df.h5")
store["df"] = df
store.close()
I would like to add attributes to df stored in the HDFStore. How can I do this? There doesn't seem to be any documentation regarding the attributes, and the group that is used to store the df is not of the same type as the HDF5 Group in the h5py module:
type(list(store.groups())[0])
Out[24]: tables.group.Group
It seems to be the PyTables Group, which appears to offer only this special method, and that method concerns a different kind of attribute (a plain Python attribute, not an HDF5 attribute):
__setattr__(self, name, value)
| Set a Python attribute called name with the given value.
What I would like is simply to store a bunch of DataFrames with multidimensional indices, each "marked" by attributes in a structured way, so that I can compare them and sub-select them based on those attributes.
Basically, what HDF5 is meant to be used for, combined with multidimensional DataFrames from pandas.
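To make the goal concrete, here is a minimal sketch of the workflow I am after. It assumes that HDFStore.get_storer(key).attrs exposes the PyTables attribute set of the stored node and that arbitrary Python objects can be assigned to it. The attribute name "metadata", its contents, and the file name are hypothetical and only serve as an illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=mindex)

# Hypothetical scratch file so the sketch does not touch an existing df.h5.
path = os.path.join(tempfile.mkdtemp(), "df_attrs.h5")

# Write the frame, then attach a hypothetical "metadata" attribute to its node.
# Assumption: get_storer(...).attrs is the attribute set of the stored group.
with pd.HDFStore(path) as store:
    store["df"] = df
    store.get_storer("df").attrs.metadata = {"experiment": "run-1", "scale": 2.5}

# Reopen the file and read both the frame and the attribute back.
with pd.HDFStore(path) as store:
    restored = store["df"]
    metadata = store.get_storer("df").attrs.metadata
```

If this works as assumed, sub-selecting stored frames would amount to looping over the keys of the store and comparing each frame's metadata.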
There are questions like this one that deal with reading HDF5 files with readers other than pandas, but they all have DataFrames with one-dimensional indices, which makes it easy to simply dump the NumPy ndarrays and store the index separately.
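For comparison, the "dump the arrays and store the index separately" approach could in principle be extended to a MultiIndex by saving one array per index level. The sketch below only shows the decomposition and the rebuild in memory. How the arrays would actually be written to the file is left out, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=mindex)

# Decompose: one 2-D array of values plus one 1-D array per index level.
values = df.to_numpy()
level_arrays = {
    name: df.index.get_level_values(name).to_numpy() for name in df.index.names
}

# Rebuild the MultiIndex from the per-level arrays and reattach it to the values.
rebuilt_index = pd.MultiIndex.from_arrays(
    list(level_arrays.values()), names=list(level_arrays.keys())
)
rebuilt = pd.DataFrame(values, index=rebuilt_index, columns=df.columns)
```

This keeps the data readable from tools other than pandas, but it means maintaining the index bookkeeping by hand, which is what I would like to avoid.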