I have a pandas DataFrame with the structure below. I am trying to group by / de-duplicate the rows, keeping in each group only the row with the maximum value of the cnt column.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2'],
    'size': [10, 10, 20, 20],
    'yr': [1990, 1990, 2000, 2000],
    'res': [1, 2, 1, 2],
    'cnt': [100, 80, 220, 400]
})

Expected Output:

name size yr    res cnt
p1   10   1990  1   100
p2   20   2000  2   400
kgh

1 Answer

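Try this: sort by cnt in ascending order, then drop duplicates on name, keeping the last row of each name (the one with the largest cnt):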

df.sort_values(['cnt']).drop_duplicates(['name'], keep='last')
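
For comparison, here is a minimal sketch that runs the one-liner above on the question's data, next to one possible alternative using groupby with idxmax. The alternative is not part of the original answer, and the variable names by_sort and by_idxmax are only illustrative. It assumes that name alone identifies a group.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2'],
    'size': [10, 10, 20, 20],
    'yr': [1990, 1990, 2000, 2000],
    'res': [1, 2, 1, 2],
    'cnt': [100, 80, 220, 400],
})

# Approach from the answer: sort ascending by cnt, then keep the last
# (largest-cnt) row for each name.
by_sort = df.sort_values('cnt').drop_duplicates('name', keep='last')

# Alternative sketch (an assumption, not from the answer): look up the
# index label of the max-cnt row in each name group and select those rows.
by_idxmax = df.loc[df.groupby('name')['cnt'].idxmax()]

print(by_sort)
print(by_idxmax)
```

On this data both variants select the same two rows. If several rows in a group share the maximum cnt, the two approaches may keep different rows, so it is worth checking which tie-breaking behavior you need.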
ErnestBidouille