Slicing a MultiIndex DataFrame with a condition based on the index

Question

I have a dataframe which looks like this:

import pandas as pd
import numpy as np

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame([[24, 13,  8,  9],
   [11, 30,  7, 23],
   [21, 31, 12, 30],
   [ 2,  5, 19, 24],
   [15, 18,  3, 16],
   [ 2, 24, 28, 11],
   [23,  9,  6, 12],
   [29, 28, 11, 21]], index=arrays, columns=list('abcd'))


df
          a   b   c   d
bar one  24  13   8   9
    two  11  30   7  23
baz one  21  31  12  30
    two   2   5  19  24
foo one  15  18   3  16
    two   2  24  28  11
qux one  23   9   6  12
    two  29  28  11  21

I want to slice the dataframe such that the results contains all rows which have foo as value for their first index and all the rows which have bar as first level index and two as second level index. I.e. the resulting dataframe shoud look like this:

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

One way to get this result is

pd.concat([df.loc[[('bar', 'two')],:], df.loc[('foo', slice(None)),:]])

but this feels like a very cumbersome way, there must be a more "pythonic" way..

@Wen I just thought that there must be a way to get the results just by using the .loc / .xs methods but I wasn't able to figure it out. — crs, May 30 '18 at 16:04

score 7 · Answer 1 · answered May 30 '18 at 15:52

You can use default slicing

l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == "foo") | ((l0=="bar") & (l1=="two"))
df[cond]

Output

        a   b   c   d
bar two 11  30  7   23
foo one 15  18  3   16
    two 2   24  28  11

score 6 · Accepted Answer · answered May 30 '18 at 15:50

6

query to the rescue:

df.query('ilevel_0 == "foo" or (ilevel_0 == "bar" and ilevel_1 == "two")')

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

xs, loc, etc all fail because your slicing across levels is not consistent.

answered May 30 '18 at 15:50

cs95

379,657
97
704
746

1

Using pandas for year, never head of `.query'. Is there a way to name to indices s.t. these names can be used instead of 'ilevel_0', 'ilevel_1' etc.? – crs May 30 '18 at 16:03
@crs Sure, use `df = df.rename_axis(['name1', 'name2'])` and you can substitute ilevel_* with your names. – cs95 May 30 '18 at 16:04

Slicing a MultiIndex DataFrame with a condition based on the index

2 Answers2

Linked