Given the following dataframe:
import pandas as pd
import numpy as np
indices = [['A', 'B', 'C'], ['a', 'b', 'c', 'd'], ['1', '2', '3']]
index = pd.MultiIndex.from_product(indices, names=['first', 'second', 'third'])
df = pd.DataFrame(np.random.randint(10, size=(36, 4)), index=index, columns=['Val1','Val2',' Val3', 'Val4'])
Which yields, for example:
Val1 Val2 Val3 Val4
first second third
A a 1 1 7 2 1
2 0 0 8 6
3 3 2 0 5
b 1 3 8 8 8
2 3 1 0 5
3 7 2 8 5
c 1 9 9 5 3
2 2 5 5 8
3 7 5 1 5
d 1 2 7 8 6
2 9 0 0 2
3 9 4 1 4
B a 1 1 2 3 3
2 3 2 3 1
3 1 3 2 2
b 1 4 4 3 1
2 9 4 8 2
3 6 7 8 8
c 1 6 6 3 2
2 2 6 5 6
3 6 4 2 7
d 1 1 1 1 5
2 6 4 8 1
3 3 4 3 1
C a 1 0 3 4 0
2 5 0 1 4
3 1 1 5 7
b 1 2 6 1 7
2 2 6 4 3
3 0 5 6 6
c 1 0 2 3 7
2 7 1 1 1
3 2 6 2 0
d 1 6 2 2 1
2 9 3 1 9
3 7 5 6 1
How would I go about filtering this dataframe by any index level and a specific column value?
EDIT
For example, if I want to keep all higher level indices (i.e. level=0
and level=1
) if the corresponding third
index (level=2
) of 1
has a Val 2
column value greater than 5
. Therefore, if index 1
in the third
index does not have a Val2
value greater than 5
, then the second
level index corresponding to that index would be removed from the dataframe.
My attempt at a solution, or to at least convey my intent (since this produces an error):
df[df.loc[pd.IndexSlice[:, :, '1'], 'Val2'] > 5]
Which gives me the following error:
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)
In this example, my expected output would be as follows:
Val1 Val2 Val3 Val4
first second third
A a 1 1 7 2 1
2 0 0 8 6
3 3 2 0 5
b 1 3 8 8 8
2 3 1 0 5
3 7 2 8 5
c 1 9 9 5 3
2 2 5 5 8
3 7 5 1 5
d 1 2 7 8 6
2 9 0 0 2
3 9 4 1 4
B c 1 6 6 3 2
2 2 6 5 6
3 6 4 2 7
C b 1 2 6 1 7
2 2 6 4 3
3 0 5 6 6
Curious if this is achievable with df.filter()
or if I am missing a simple way to perform this filtering on a MultiIndex
? Thank you in advance.