Select subset of index and columns from multi-indexed pandas dataframe

Question

What is a generic way to slice a multi-indexed pandas dataframe along index and columns?

The documentation is dense and complete and worth reading (https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html) and there are a number of answers on stack overflow that answer how to do it focused on 'rows' or columns (and this answer is quite thorough, Select rows in pandas MultiIndex DataFrame). But, I wanted a more straight forward answer that had examples addressing both at the same time.

creating the multiindex dataframe

cols_index = pd.MultiIndex.from_product([['a','b','c'],
    ['x','y','z']], names=['first','second'])
rows_index = pd.MultiIndex.from_product([['d','e','f'],
    ['u','v','w']], names=['third','fourth'])
df = pd.DataFrame(np.random.choice(10, (9,9)), index=rows_index, columns=cols_index)

df
Out[161]: 
first         a           b         
second        c     d     c     d   
third         e  f  e  f  e  f  e  f
fourth fifth                        
j      m      9  8  0  1  5  6  3  5
       n      1  2  3  3  5  5  4  2
       o      5  2  4  7  3  1  0  4
k      m      6  6  3  3  4  4  1  7
       n      0  6  0  9  2  3  7  5
       o      7  8  0  9  7  8  3  4
l      m      4  7  4  3  0  5  6  3
       n      0  4  3  9  9  5  8  4
       o      0  1  8  0  8  9  4  7

I would like to see examples that slice this along a combination of levels in the index and columns.

Have you looked at `pd.IndexSlice`? What exactly is an operation you'd like to do? — user3483203, Sep 10 '19 at 17:27
For top level index/column, one can slice as normal, e.g. `df.loc['d', 'a']`. For the other levels, I believe one needs to resolve to `pd.MultiIndex.get_level_values(level)`. — Quang Hoang, Sep 10 '19 at 17:29

Pilgrim · Answer 1 · 2019-09-10T17:34:08.283

Here is my generic solution...

using loc with an index slicer to slice the index and column levels

idx = pd.IndexSlice

select everything - notice that the ':'s correspond to the number of levels in the index and columns

df.loc[idx[:,:], idx[:,:,:]]

Out[251]: 
first         a           b         
second        c     d     c     d   
third         e  f  e  f  e  f  e  f
fourth fifth                        
j      m      2  9  4  5  6  7  7  5
       n      1  4  2  6  8  0  6  3
       o      2  4  0  2  1  9  9  4
k      m      6  5  0  0  9  3  4  0
       n      3  1  6  4  2  3  0  4
       o      0  7  1  6  9  7  5  7
l      m      2  8  0  8  5  1  8  3
       n      7  3  2  6  9  4  1  7
       o      6  4  7  9  1  3  3  3

select specific "cell"

df.loc[idx['j','m'], idx['a','c','f']]

Out[252]: 9

select one level from index and one from column

df.loc[idx[:,'m'], idx[:,'c',:]]

Out[253]: 
first         a     b   
second        c     c   
third         e  f  e  f
fourth fifth            
j      m      2  9  6  7
k      m      6  5  9  3
l      m      2  8  5  1

subset to unique combination of column levels

df.loc[:, idx['b','d','f']]

Out[254]: 
fourth  fifth
j       m        5
        n        3
        o        4
k       m        0
        n        4
        o        7
l       m        3
        n        7
        o        3
Name: (b, d, f), dtype: int32

subset to unique combination of index levels

df.loc[idx['k','o'], :]

Out[255]: 
first  second  third
a      c       e        0
               f        7
       d       e        1
               f        6
b      c       e        9
              f        7
       d       e        5
              f        7
Name: (k, o), dtype: int32

Perhaps you should take this opportunity of the documentation not being clear and submit a PR with some of these examples. There are several examples that are a bit more intuitive in the docs to `IndexSlice`, but maybe a section demonstrating the same functionality in the advanced indexing portion would be useful? — user3483203, Sep 10 '19 at 17:31