1

Let

import pandas as pd

df = pd.DataFrame(
    {
        'a': ['A', 'A', 'B', 'B', 'B', 'C'],
        'b': [True, True, True, False, False, True]
    }
)

print(df)

groups = df.groupby('a')  # "A", "B", "C"
agg_groups = groups.agg({'b':lambda x: all(x)}) # "A": True, "B": False, "C": True
agg_df = agg_groups.reset_index()
filtered_df = agg_df[agg_df["b"]]  # "A": True, "C": True

print(filtered_df)


# Now I want to get back the original df's rows, but only the remaining ones after group filtering


current output:

   a      b
0  A   True
1  A   True
2  B   True
3  B  False
4  B  False
5  C   True
   a     b
0  A  True
2  C  True

Required:

   a      b
0  A   True
1  A   True
2  B   True
3  B  False
4  B  False
5  C   True
   a     b
0  A  True
2  C  True
   a      b
0  A   True
1  A   True
5  C   True
Gulzar
  • 23,452
  • 27
  • 113
  • 201

3 Answers3

1

Use GroupBy.transform for get all Trues to mask with same size like original DataFrame, so possible use boolean indexing:

df1 = df[df.groupby('a')['b'].transform('all')]

#alternative
#f = lambda x: x.all()
#df1 = df[df.groupby('a')['b'].transform(f)]
print (df1)
   a     b
0  A  True
1  A  True
5  C  True

If want filter in aggregation function output is boolean Series and filter match indices mapped by original column a:

ids = df.groupby('a')['b'].all()

df1 = df[df.a.isin(ids.index[ids])]
print (df1)
   a     b
0  A  True
1  A  True
5  C  True

Your solution is similar with filter column b:

groups = df.groupby('a')
agg_groups = groups.agg({'b':lambda x: all(x)})

df1 = df[df.a.isin(agg_groups.index[agg_groups['b']])]
print (df1)
   a     b
0  A  True
1  A  True
5  C  True
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • how can I use a custom lambda instead of `.transform('all')`? – Gulzar Mar 27 '23 at 10:35
  • @Gulzar - I think you need last solution in answer? – jezrael Mar 27 '23 at 10:38
  • I think so, but a question about performance - it seems to the naked eye that `df.a.isin(agg_groups.index[agg_groups['b']])` runs a search of each `b` in the original index. I thought there is a way in which the groupby object knows the inds it was originated from without searching for it again – Gulzar Mar 27 '23 at 10:41
  • `df[df.groupby('a')['b'].transform(lambda x: all(x))]` – Lukas Hestermeyer Mar 27 '23 at 10:41
  • 1
    @Gulzar - `thought there is a way in which the groupby object knows the inds it was originated from without searching for it again ` - Unfortunately not if use aggregation. – jezrael Mar 27 '23 at 10:44
  • Trying to apply this to my code, I now see I am grouping by more than one column, which then fails the `isin`. Is there a quick fix? – Gulzar Mar 27 '23 at 11:04
  • [Opened a new question](https://stackoverflow.com/questions/75855026/in-pandas-how-to-retrieve-the-rows-which-created-each-group-after-aggregation) – Gulzar Mar 27 '23 at 11:08
0

df[df['a'].isin(filtered_df['a'].unique())]

Results in:

   a     b
0  A  True
1  A  True
5  C  True
Lukas Hestermeyer
  • 830
  • 1
  • 7
  • 19
  • I am not following why `unique` – Gulzar Mar 27 '23 at 10:46
  • 1
    To have a unique list of your grouping criteria. I think the solution using transform posted in jezraels answer is the most useful one. See my comment under his post for a way to use custom lambda functions in transform. – Lukas Hestermeyer Mar 27 '23 at 10:52
0

One can filter the original df by keeping the rows where the column a is present in the column a of the filtered_df in a variety of ways. Below will leave two potential options.

Option 1

As per OP's request to use a custom lambda, one can use pandas.DataFrame.apply as follows

final_df = df[df.apply(lambda row: row['a'] in filtered_df['a'].values, axis=1)]

[Out]:

   a      b
0  A   True
1  A   True
5  C   True

Option 2

Another way to solve it is to filtering the original df by keeping the rows where the column a is present in the column a of the filtered_df.

For that, one can use pandas.Series.isin as follows

finaldf = df[df['a'].isin(filtered_df['a'])]

[Out]:

   a      b
0  A   True
1  A   True
5  C   True

Notes:

Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83