
This is a follow-up to this question.

import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True]
    }
)

print(df)

groups = df.groupby(['a', 'b'])  # "A", "B", "C"
agg_groups = groups.agg({'hole':lambda x: all(x)}) # "A": True, "B": False, "C": True

original_index_filtered = agg_groups.index[agg_groups['hole']]
original_filtered = df[df[['a', 'b']].isin(original_index_filtered)]
print(original_filtered)

This now outputs:

   a  b   hole
0  A  A   True
1  A  A   True
2  B  B   True
3  B  B  False
4  B  B  False
5  C  C   True
     a    b  hole
0  NaN  NaN   NaN
1  NaN  NaN   NaN
2  NaN  NaN   NaN
3  NaN  NaN   NaN
4  NaN  NaN   NaN
5  NaN  NaN   NaN

It seems I am not doing this right when a MultiIndex is involved.

Gulzar

4 Answers


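If you want to check whether each pair of values from two columns appears in an (n, 2) matrix, you can use NumPy broadcasting. DataFrame.isin is designed to check whether each individual element of the DataFrame is in a one-dimensional collection of values, so it never matches a whole row against a pair.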

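import numpy as np
# (n, 2) rows compared against (k, 1, 2) wanted pairs -> (k, n, 2) boolean array
m = df[['a', 'b']].to_numpy() == np.array([*original_index_filtered.to_numpy()])[:, None]
# keep a row if both columns match at least one wanted pair
original_filtered = df[m.all(axis=-1).any(axis=0)]
print(m)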

[[[ True  True]
  [ True  True]
  [False False]
  [False False]
  [False False]
  [False False]]

 [[False False]
  [False False]
  [False False]
  [False False]
  [False False]
  [ True  True]]]

print(original_filtered)

   a  b  hole
0  A  A  True
1  A  A  True
5  C  C  True
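The sketch below is a self-contained check of both points above: DataFrame.isin compares each cell on its own, so a string like 'A' never equals a tuple like ('A', 'A'), while the broadcast comparison matches whole rows. It assumes the surviving pairs are ('A', 'A') and ('C', 'C'), hard-coded in a hypothetical `wanted` MultiIndex instead of being taken from agg_groups.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True],
    }
)
# Assumption: the pairs that survived the aggregation, written out by hand
wanted = pd.MultiIndex.from_tuples([('A', 'A'), ('C', 'C')], names=['a', 'b'])

# DataFrame.isin tests each cell separately: no single string equals a tuple,
# so every cell comes back False
elementwise = df[['a', 'b']].isin(wanted)

# Broadcasting: rows have shape (n, 2), pairs are reshaped to (k, 1, 2)
rows = df[['a', 'b']].to_numpy()
pairs = np.array(wanted.tolist())[:, None]
m = rows == pairs                  # shape (k, n, 2)
keep = m.all(axis=-1).any(axis=0)  # shape (n,): row matches some pair in both columns

print(elementwise.to_numpy().any())  # False
print(df[keep].index.tolist())       # [0, 1, 5]
```

The intermediate array holds k * n * 2 booleans, so this approach is best suited to a modest number of wanted pairs.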
Ynjxsjmh

You should use transform instead of agg; it returns one value per original row, aligned with the index of df:

df[groups['hole'].transform('all')]

That said, you can always use a merge to align your agg_groups and create a mask for boolean indexing:

cols = list(agg_groups.index.names)
m = df[cols].merge(agg_groups, left_on=cols, right_index=True, how='left')['hole']
out = df[m]

Output:

   a  b  hole
0  A  A  True
1  A  A  True
5  C  C  True
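To illustrate the difference between the two calls, here is a small sketch on the question's data: agg('all') collapses each group to a single row indexed by the (a, b) MultiIndex, while transform('all') repeats each group's result on every member row, so it shares df's index and works directly as a boolean mask. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True],
    }
)
groups = df.groupby(['a', 'b'])

# One row per group, indexed by the (a, b) MultiIndex
per_group = groups['hole'].agg('all')
# One row per original row, indexed like df
per_row = groups['hole'].transform('all')

out = df[per_row]
print(per_group)
print(per_row.tolist())    # [True, True, False, False, False, True]
print(out.index.tolist())  # [0, 1, 5]
```

Because per_row is already aligned with df, no merge or MultiIndex lookup is needed.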
mozway

You can use Index.isin on a MultiIndex built from columns a and b with DataFrame.set_index:

original_filtered = df[df.set_index(['a', 'b']).index.isin(original_index_filtered)]
print(original_filtered)

   a  b  hole
0  A  A  True
1  A  A  True
5  C  C  True
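A variation on the same idea, sketched here as an alternative rather than the answerer's code, builds the key MultiIndex with pd.MultiIndex.from_frame, so nothing is set as an index on df itself. As above, the surviving pairs are assumed and hard-coded in a hypothetical `wanted` index.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True],
    }
)
# Assumption: the pairs kept by the aggregation, written out by hand
wanted = pd.MultiIndex.from_tuples([('A', 'A'), ('C', 'C')], names=['a', 'b'])

# Build a MultiIndex from the key columns; df's own index is left untouched
keys = pd.MultiIndex.from_frame(df[['a', 'b']])
# MultiIndex.isin compares whole (a, b) tuples, one boolean per row of df
mask = keys.isin(wanted)

print(mask.tolist())            # [True, True, False, False, False, True]
print(df[mask].index.tolist())  # [0, 1, 5]
```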
jezrael

I ended up doing it like this:

import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True]
    }
)

print(df)

groups = df.groupby(['a', 'b'])  # "A", "B", "C"
agg_groups = groups.agg({'hole': lambda x: len(x) > 0 and (not any(x) or all(x))}) # "A": True, "B": False, "C": True

original_index_filtered = agg_groups.index[agg_groups['hole']]
original_groups = groups.filter(lambda group: group.name in original_index_filtered)
# renumber the surviving rows 0..n-1
original_filtered = original_groups.reset_index(drop=True)
print(original_filtered)

Output:

   a  b  hole
0  A  A  True
1  A  A  True
2  C  C  True
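The lambda above keeps a group when its hole values are uniformly True or uniformly False. As a sketch, the same criterion can be written without filter or a per-group lambda by combining two transform calls; groupby does not produce empty groups for these string keys, so the len(x) > 0 check is dropped here.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': ['A', 'A', 'B', 'B', 'B', 'C'],
        'hole': [True, True, True, False, False, True],
    }
)
hole = df.groupby(['a', 'b'])['hole']

# A group is kept if it is all True, or if none of its values is True
uniform = hole.transform('all') | ~hole.transform('any')

# Same renumbering as reset_index(drop=True) in the code above
out = df[uniform].reset_index(drop=True)
print(out)
```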
Gulzar