Pandas DataFrame - groupby / remove duplicates and keep max of a column

Question

I have a pandas DataFrame with the following structure. I am trying to group by or de-duplicate it so that, for each group, only the row with the maximum value of the cnt column is kept.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2'],
    'size': [10, 10, 20, 20],
    'yr': [1990, 1990, 2000, 2000],
    'res': [1, 2, 1, 2],
    'cnt': [100, 80, 220, 400]
})

Expected Output:

name size yr    res cnt
p1   10   1990  1   100
p2   20   2000  2   400

The best solution is to use `groupby.idxmax`. – mozway, Oct 14 '22 at 19:04
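The comment does not spell out the call, so here is a minimal sketch of what the `groupby.idxmax` approach could look like on the DataFrame from the question. Grouping on name alone is an assumption taken from the expected output; if a group is really defined by several columns, pass a list such as ['name', 'size', 'yr'] instead. The sketch also assumes the index labels are unique, since the rows are selected by label.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2'],
    'size': [10, 10, 20, 20],
    'yr': [1990, 1990, 2000, 2000],
    'res': [1, 2, 1, 2],
    'cnt': [100, 80, 220, 400],
})

# For each name, idxmax gives the index label of the row with the largest cnt.
# Grouping by 'name' only is an assumption based on the expected output.
idx = df.groupby('name')['cnt'].idxmax()

# Select those rows by label; the other columns come along unchanged.
result = df.loc[idx]
print(result)
```

When several rows in a group share the maximum cnt, idxmax returns the first of them, so exactly one row per group is kept.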

score 0 · Answer 1 · answered Oct 14 '22 at 19:03

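Sort by cnt so that the largest value comes last within each name, then drop duplicates on name and keep the last row: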

df.sort_values(['cnt']).drop_duplicates(['name'], keep='last')
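For anyone who wants to run this end to end, a self-contained sketch using the DataFrame from the question might look like the following. The trailing sort_index() call is an addition, not part of the answer above; it only restores the original row order after sorting by cnt.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2'],
    'size': [10, 10, 20, 20],
    'yr': [1990, 1990, 2000, 2000],
    'res': [1, 2, 1, 2],
    'cnt': [100, 80, 220, 400],
})

result = (
    df.sort_values('cnt')                   # ascending, so the max cnt is last per name
      .drop_duplicates('name', keep='last') # keep only that last (largest) row per name
      .sort_index()                         # optional addition: restore original row order
)
print(result)
```

If two rows of the same name tie on the maximum cnt, which one survives depends on the sort order of the tied rows, so add a tie-breaking column to sort_values if that matters.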

ErnestBidouille
