0

Given the following dataframe:

import pandas as pd
import numpy as np

indices = [['A', 'B', 'C'], ['a', 'b', 'c', 'd'], ['1', '2', '3']]

index = pd.MultiIndex.from_product(indices, names=['first', 'second', 'third'])

df = pd.DataFrame(np.random.randint(10, size=(36, 4)), index=index, columns=['Val1','Val2',' Val3', 'Val4'])

Which yields, for example:

                    Val1  Val2  Val3  Val4
first second third                        
A     a      1         1     7     2     1
             2         0     0     8     6
             3         3     2     0     5
      b      1         3     8     8     8
             2         3     1     0     5
             3         7     2     8     5
      c      1         9     9     5     3
             2         2     5     5     8
             3         7     5     1     5
      d      1         2     7     8     6
             2         9     0     0     2
             3         9     4     1     4
B     a      1         1     2     3     3
             2         3     2     3     1
             3         1     3     2     2
      b      1         4     4     3     1
             2         9     4     8     2
             3         6     7     8     8
      c      1         6     6     3     2
             2         2     6     5     6
             3         6     4     2     7
      d      1         1     1     1     5
             2         6     4     8     1
             3         3     4     3     1
C     a      1         0     3     4     0
             2         5     0     1     4
             3         1     1     5     7
      b      1         2     6     1     7
             2         2     6     4     3
             3         0     5     6     6
      c      1         0     2     3     7
             2         7     1     1     1
             3         2     6     2     0
      d      1         6     2     2     1
             2         9     3     1     9
             3         7     5     6     1

How would I go about filtering this dataframe by any index level and a specific column value?

EDIT

For example, if I want to keep all higher level indices (i.e. level=0 and level=1) if the corresponding third index (level=2) of 1 has a Val 2 column value greater than 5. Therefore, if index 1 in the third index does not have a Val2 value greater than 5, then the second level index corresponding to that index would be removed from the dataframe.

My attempt at a solution, or to at least convey my intent (since this produces an error):

df[df.loc[pd.IndexSlice[:, :, '1'], 'Val2'] > 5]

Which gives me the following error:

pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

In this example, my expected output would be as follows:

                    Val1  Val2  Val3  Val4
first second third                        
A     a      1         1     7     2     1
             2         0     0     8     6
             3         3     2     0     5
      b      1         3     8     8     8
             2         3     1     0     5
             3         7     2     8     5
      c      1         9     9     5     3
             2         2     5     5     8
             3         7     5     1     5
      d      1         2     7     8     6
             2         9     0     0     2
             3         9     4     1     4
B     c      1         6     6     3     2
             2         2     6     5     6
             3         6     4     2     7
C     b      1         2     6     1     7
             2         2     6     4     3
             3         0     5     6     6

Curious if this is achievable with df.filter() or if I am missing a simple way to perform this filtering on a MultiIndex? Thank you in advance.

rahlf23
  • 8,869
  • 4
  • 24
  • 54
  • 2
    That's a dupe, use `df.loc[(df.index.get_level_values(1) == 'something') & (df.index.get_level_values(2) == 'another')]` – rafaelc Sep 18 '18 at 15:10
  • I knew this was something obvious I overlooked, thank you. So in this case: `df[(df.index.get_level_values(2)=='1') & (df['Val2'] > 5)]` – rahlf23 Sep 18 '18 at 15:12
  • 1
    Yep ;) happy coding. Just be careful with using `1`, `2`, `3` etc as strings are you're doing now, this can be confusing afterwards – rafaelc Sep 18 '18 at 15:14
  • 1
    Absolutely, my columns and indices are distinct, just wanted a simple example to demonstrate. Thanks for your comment, much appreciated. – rahlf23 Sep 18 '18 at 15:17
  • Please see my edit. This question is a bit closer to what I am after: https://stackoverflow.com/q/49638955/8146556 – rahlf23 Sep 19 '18 at 05:16
  • I was able to get this to work: `pd.concat([grouped.get_group(key).groupby(level=1).filter(lambda x: (x.loc[pd.IndexSlice[:, :, '1'], 'Val2']>5).any()) for key, group in grouped])` – rahlf23 Sep 19 '18 at 05:36

0 Answers0