This works fine:

import pandas as pd
from numpy.random import rand

cols = ['X', 'Y']
ind = [('A', 1), ('B', 2)]
# Build the row MultiIndex from the tuples defined above
ind = pd.MultiIndex.from_tuples(ind, names=['foo', 'number'])

df = pd.DataFrame(rand(2,2), columns = cols, index=ind)
store.put('df', df, table=True)
print(store['df'])

               X         Y
foo number                    
A   1       0.015005  0.213427
B   2       0.090311  0.595418

This breaks:

cols = [('X', 1), ('Y', 2)]
# MultiIndex on the columns as well as on the rows
cols = pd.MultiIndex.from_tuples(cols, names=['bar', 'number'])
ind = [('A', 1), ('B', 2)]
ind = pd.MultiIndex.from_tuples(ind, names=['foo', 'number'])

df = pd.DataFrame(rand(2,2), columns = cols, index=ind)
store.put('df', df, table=True)
print(store['df'])

KeyError: u'no item named foo'

I suspect this is a known limitation of using PyTables, but I couldn't find anything in the pandas docs saying that a MultiIndex is supported only on the index, not on the columns.

jeffalstott

2 Answers

Having BOTH a column MultiIndex and an index MultiIndex is not supported; either one alone works. In general, though, a column MultiIndex is not very useful here, as it's impossible to select from it without some really odd syntax (the columns are stored as tuples, so they have to be explicitly selected). So I wouldn't recommend it in any event.

I'll open an issue to support both, since it currently raises; see https://github.com/pydata/pandas/issues/5823

Jeff
  • Actually it is not that difficult to select multicolumns, for instance: ``idx = pd.IndexSlice; df.loc[:,idx[:, 'mean']]`` is something I do all the time. – ankostis Jun 05 '17 at 20:32
  • @ankostis this is from an on-disk store that serializes column names as strings and NOT an in memory frame – Jeff Jun 05 '17 at 22:01

Until #5823 is resolved, you can work around it by collapsing the column MultiIndex into flat string names before storing (see this SO answer for how: https://stackoverflow.com/a/14508355/548792):

assert isinstance(df.columns, pd.MultiIndex), df
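# Join the levels of each column tuple with '.'; str() also covers non-string levels such as ints
df.columns = ['.'.join(str(level) for level in col).strip() for col in df.columns.values]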
df.to_hdf(store, 'df', table=True)

And to recreate it, assuming the dot (.) appears nowhere in the original column names:

df = store['/df']
df.columns = pd.MultiIndex.from_tuples([c.split('.') for c in df.columns])
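
The round trip above only works if the separator never appears inside a level name, and every level comes back as a string. A minimal sketch of that logic in plain Python, with the separator check made explicit, might look like this. `flatten_columns` and `restore_columns` are hypothetical helper names (not part of pandas), and the inputs are assumed to be plain tuples like those in `df.columns.values`:

```python
def flatten_columns(column_tuples, sep="."):
    """Join each column tuple into one string, e.g. ('X', 1) -> 'X.1'."""
    flat = []
    for col in column_tuples:
        parts = [str(level) for level in col]
        # The round trip is only reversible if no level contains the separator.
        for part in parts:
            if sep in part:
                raise ValueError("level %r contains separator %r" % (part, sep))
        flat.append(sep.join(parts))
    return flat


def restore_columns(flat_names, sep="."):
    """Split flattened names back into tuples; every level comes back as str."""
    return [tuple(name.split(sep)) for name in flat_names]


columns = [("X", 1), ("Y", 2)]  # same shape as the question's column tuples
flat = flatten_columns(columns)
restored = restore_columns(flat)
print(flat)
print(restored)
```

With a DataFrame, the flattened list would be assigned to `df.columns` before storing, and the restored tuples passed to `pd.MultiIndex.from_tuples` after loading. Note that an integer level such as `1` comes back as the string `'1'`, so it would need converting if the original types matter.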
ankostis