1

I have the following dataframe.

    c1  c2  v1  v2
0   a   a   1   2
1   a   a   2   3
2   b   a   3   1
3   b   a   4   5
5   c   d   5   0

I wish to have the following output.

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0

The rule. First group dataframe by c1, c2. Then into each group, keep the row with the maximun value in column v2. Finally, output the original dataframe with all the rows not satisfying the previous rule dropped.

What is the better way to obtain this result? Thanks.

Going around, I have found also this solution based on apply method

Community
  • 1
  • 1
Marcus
  • 159
  • 1
  • 1
  • 8

2 Answers2

2

You could use groupby-transform to generate a boolean selection mask:

grouped = df.groupby(['c1', 'c2'])
mask = grouped['v2'].transform(lambda x: x == x.max()).astype(bool)
df.loc[mask].reset_index(drop=True)

yields

  c1 c2  v1  v2
0  a  a   2   3
1  b  a   4   5
2  c  d   5   0
Community
  • 1
  • 1
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
1

If you want to make sure that you get one single row per group, you can sort the values by "v2" before grouping and then just take the last row (the one with the highest v2-value).

df = pd.DataFrame({"c1": ["a", "a", "b", "b", "c"], "c2": ["a", "a", "a", "a", "d"], "v1": [1, 2, 3, 4, 5], "v2": [2, 3, 1, 5, 0]})

df.sort_values("v2").groupby(["c1", "c2"]).last().reset_index()

result:

    c1  c2  v1  v2
0   a   a   2   3
1   b   a   4   5
2   c   d   5   0
Nacho
  • 11
  • 1
  • +1, but yes, If there are multiple rows with the same max `v2` value in a group, then only one of them will be selected using this method, and you won't know which one. – AlexK Sep 28 '22 at 01:05