I have a pandas DataFrame with the following structure. I am trying to group by (de-duplicate) the rows that share the same name, size and yr, keeping from each group only the row with the maximum value in the cnt column.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2'],
    'size': [10, 10, 20, 20],
    'yr': [1990, 1990, 2000, 2000],
    'res': [1, 2, 1, 2],
    'cnt': [100, 80, 220, 400]
})
Expected output:
name  size  yr    res  cnt
p1    10    1990  1    100
p2    20    2000  2    400
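A minimal sketch of two common pandas approaches, assuming the grouping keys are name, size and yr (inferred from the expected output; adjust the hypothetical `keys` list if the real key columns differ). The first uses `groupby(...)['cnt'].idxmax()` to get the index label of the max-cnt row in each group and pulls those full rows with `.loc`; the second sorts by cnt descending and keeps the first row per key combination with `drop_duplicates`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['p1', 'p1', 'p2', 'p2'],
    'size': [10, 10, 20, 20],
    'yr': [1990, 1990, 2000, 2000],
    'res': [1, 2, 1, 2],
    'cnt': [100, 80, 220, 400],
})

# Assumption: these columns identify a "duplicate" group.
keys = ['name', 'size', 'yr']

# Approach 1: idxmax gives the index label of the max cnt in each group;
# .loc then selects those complete rows (res included).
out = df.loc[df.groupby(keys)['cnt'].idxmax()].reset_index(drop=True)

# Approach 2: sort by cnt descending, keep the first row per key combination,
# then restore a key-based order.
out2 = (df.sort_values('cnt', ascending=False)
          .drop_duplicates(subset=keys)
          .sort_values(keys)
          .reset_index(drop=True))

print(out)
```

Both produce the expected output for this data. If two rows in a group tie on cnt, `idxmax` keeps the first one in the original order, while the sort-based version's choice depends on the sort, so pass `kind='stable'` to `sort_values` if tie-breaking matters.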