keep dataframe rows meeting a condition into each group of the same dataframe grouped by

Question

I have the following dataframe.

    c1  c2  v1  v2
0   a   a   1   2
1   a   a   2   3
2   b   a   3   1
3   b   a   4   5
5   c   d   5   0

I wish to have the following output.

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0

The rule. First group dataframe by c1, c2. Then into each group, keep the row with the maximun value in column v2. Finally, output the original dataframe with all the rows not satisfying the previous rule dropped.

What is the better way to obtain this result? Thanks.

Going around, I have found also this solution based on apply method

do you mind sharing the code you tried ? – aberna Jan 13 '15 at 15:20 — aberna, Jan 13 '15 at 15:20

score 2 · Accepted Answer · edited May 23 '17 at 12:20

2

You could use groupby-transform to generate a boolean selection mask:

grouped = df.groupby(['c1', 'c2'])
mask = grouped['v2'].transform(lambda x: x == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)

yields

  c1 c2  v1  v2
0  a  a   2   3
1  b  a   4   5
2  c  d   5   0

edited May 23 '17 at 12:20

Community

1
1

answered Jan 13 '15 at 15:33

unutbu

842,883
184
1,785
1,677

score 1 · Answer 2 · answered Sep 23 '22 at 12:37

1

If you want to make sure that you get one single row per group, you can sort the values by "v2" before grouping and then just take the last row (the one with the highest v2-value).

df = pd.DataFrame({"c1": ["a", "a", "b", "b", "c"], "c2": ["a", "a", "a", "a", "d"], "v1": [1, 2, 3, 4, 5], "v2": [2, 3, 1, 5, 0]})

df.sort_values("v2").groupby(["c1", "c2"]).last().reset_index()

result:

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0

answered Sep 23 '22 at 12:37

Nacho

11
1

+1, but yes, If there are multiple rows with the same max `v2` value in a group, then only one of them will be selected using this method, and you won't know which one. – AlexK Sep 28 '22 at 01:05

keep dataframe rows meeting a condition into each group of the same dataframe grouped by

2 Answers2

Linked