Here is my code:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
...                    'key2': ['one', 'two', 'one', 'two', 'one'],
...                    'data1': np.random.randn(5),
...                    'data2': np.random.randn(5)})

>>> new_df = df.groupby(['key1', 'key2']).mean().unstack()
>>> print(new_df)
         data1               data2
key2       one       two       one       two
key1
a    -0.070742 -0.598649 -0.349283 -1.272043
b    -0.109347 -0.097627 -0.641455  1.135560 
>>> print(new_df.columns)
MultiIndex(levels=[[u'data1', u'data2'], [u'one', u'two']],
       labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
       names=[None, u'key2'])

As you can see, a DataFrame with MultiIndex columns differs from an ordinary DataFrame. How do I access the data in it?

GoingMyWay
    Though the documentation is not easy to follow (the explanations are buried in the ["advanced indexing"](https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advanced-indexing-with-hierarchical-index) section), keep in mind that multilevel indexing is based on tuple indices, so accessing data requires `loc` and tuples, even though there are ambiguous shortcuts that use neither `loc` nor tuples. – mins Jan 02 '21 at 16:15

2 Answers

Accessing data in a MultiIndex DataFrame is similar to accessing data in an ordinary DataFrame. For example, to read the value in row `a` of column `('data1', 'two')`, you can use either `new_df['data1']['two']['a']` or `new_df.loc['a', ('data1', 'two')]`.
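
As a rough sketch of these access patterns, the snippet below rebuilds the question's frame with fixed numbers (an assumption made here so the results are reproducible; the question uses `np.random.randn`). The `xs` cross-section and the row lookup are additions that the answer itself does not mention.

```python
import pandas as pd

# Fixed values stand in for the question's random data (assumption for reproducibility).
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'data2': [10.0, 20.0, 30.0, 40.0, 50.0],
})
new_df = df.groupby(['key1', 'key2']).mean().unstack()

# A single cell: row label first, then the full column tuple.
cell = new_df.loc['a', ('data1', 'two')]

# Chained lookup: same value, but performed as three separate indexing steps.
cell_chained = new_df['data1']['two']['a']

# Selecting only the outer column level returns a sub-frame with columns 'one' and 'two'.
data1_block = new_df['data1']

# Cross-section on the inner level: the 'one' column of both data1 and data2.
ones = new_df.xs('one', axis=1, level='key2')

# A whole row, returned as a Series indexed by the column tuples.
row_a = new_df.loc['a']
```

The `loc` form with a tuple is usually the safer choice for assignment, since the chained form may operate on an intermediate copy.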

Please read the official docs for more details.

Zhenhao Chen

This might help you understand and visualize the data:

unstacked = multi_indexDataFrame.unstack().dropna()
unstacked.plot(kind="bar")
sounish nath