I want to add condition to extract the duplicate rows in a dataframe
DF
KEY STAT NUM ID
ab L 3 1678
cd D 4 23221
ab D 8 1678
cd L 0 38754
For duplicate key I need to check for ID if that is same then keep both the rows. If ID is different then I need to check NUM and keep the one with greater NUM value and ignore other
Expected OUTPUT
KEY STAT NUM ID
ab L 3 1678
cd D 4 23221
ab D 8 1678
- I have edited the scope a bit because I needed to do a merge on the KEY column.So I am planning to pull out both the duplicate values in the merged table. This is the code that I wrote keeping @piRSquared answer as reference but not getting desired dataframe.
def func(d):
if d.KEY.nunique()>=1:
if d.ID.nunique()>=1:
return d
else:
return d.nlargest(1,columns=['NUM'])
else:
return d
pd.concat([func(d) for _, d in DF.groupby('KEY')])