Find the duplicates and apply a condition on another column in pandas

Question

First, I need to check the Serial No column and find the duplicates. Once the duplicates are found, a second condition has to be applied on the Rank column: for each duplicated serial number, the row with the lowest rank should get Status 1, and the other duplicate rows should get Status 2.

link to image

Welcome to SO! Please take a moment to read about how to post pandas questions: http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — YOLO, Feb 25 '20 at 15:26
Hi there, welcome to SO. Take your time to read up on how to create a minimal, reproducible example, and then edit your post with a textual sample of your dataframe. [This post](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) is an excellent guide for pandas questions. – Umar.H, Feb 25 '20 at 15:27

score 0 · Answer 1 · answered Feb 25 '20 at 17:03

Could you try this and check?

counts = df.groupby(['Serial No'])['Rank'].count().gt(1).reset_index()  # True where a serial no. occurs more than once
dup_sernos = counts.loc[counts['Rank'], 'Serial No'].tolist()  # serial nos. that have duplicates
df['Status'] = df[df['Serial No'].isin(dup_sernos)].sort_values(['Serial No', 'Rank']).groupby(['Serial No']).cumcount() + 1  # 1 = lowest rank
df['Status'] = df['Status'].fillna('')  # blank for rows without duplicates
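
To see what the snippet above produces, here is a self-contained sketch. The sample data is made up purely for illustration (the question did not include a textual sample), and the column names 'Serial No', 'Rank' and 'Status' are taken from the snippet.

```python
import pandas as pd

# Hypothetical sample data; column names follow the snippet above.
df = pd.DataFrame({
    'Serial No': [101, 102, 101, 103, 102, 101],
    'Rank':      [5,   2,   3,   7,   9,   4],
})

# Serial numbers that occur more than once
counts = df.groupby('Serial No')['Rank'].count().gt(1).reset_index()
dup_sernos = counts.loc[counts['Rank'], 'Serial No'].tolist()

# Number the duplicated rows per serial number, lowest rank first
status = (
    df[df['Serial No'].isin(dup_sernos)]
    .sort_values(['Serial No', 'Rank'])
    .groupby('Serial No')
    .cumcount() + 1
)

# Assignment aligns on the index; rows without duplicates become NaN, then ''
df['Status'] = status
df['Status'] = df['Status'].fillna('')
print(df)
```

With this data, serial no. 101 appears three times, so its rows are numbered 1, 2 and 3 in order of rank, 102 gets 1 and 2, and the single row for 103 is left blank. Because the intermediate column contains NaN, the numbers will likely display as floats (for example 1.0).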

answered Feb 25 '20 at 17:03

Sajan

Can you please explain the code so I can understand it better? – Deepweber Feb 27 '20 at 05:27
Sure. The first and second lines together identify the serial nos. that have duplicates. The third line filters the dataframe down to those serial nos., sorts the rows (using sort_values) by Serial No and Rank in ascending order, and then takes a cumulative count within each serial no. group. The sort ensures that the smallest rank for a serial no. comes first. The last line fills the NaN values (these appear for the non-duplicate rows, since the third line only assigns values to the duplicated rows) with an empty string. – Sajan Feb 27 '20 at 08:52
The '+1' after cumcount() ensures that 'Status' for each group of duplicates starts at 1 and not 0, since cumcount numbers the rows of a group from 0. Hope this is helpful. – Sajan Feb 27 '20 at 08:53
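
One thing to watch with cumcount: if a serial no. appears three or more times, the numbering keeps going (3, 4, ...). If every duplicate other than the lowest-ranked one should instead get 2, which is how the question reads to me, something along these lines might work. This is only a sketch on made-up data. It leaves non-duplicates as missing (<NA>) rather than an empty string so the column can stay integer-typed, and rows tied on the lowest rank would all get 1; both are assumptions on my part.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'Serial No': [101, 102, 101, 103, 102, 101],
    'Rank':      [5,   2,   3,   7,   9,   4],
})

# True for every row whose serial number occurs more than once
is_dup = df.duplicated('Serial No', keep=False)

# Lowest rank within each serial number, broadcast back to every row
min_rank = df.groupby('Serial No')['Rank'].transform('min')

# Start with all-missing, then mark duplicates as 2 and the lowest-ranked as 1
status = pd.Series(pd.NA, index=df.index, dtype='Int64')
status.loc[is_dup] = 2
status.loc[is_dup & (df['Rank'] == min_rank)] = 1

df['Status'] = status
print(df)
```

Using duplicated(keep=False) here replaces the groupby/count step for finding the duplicated serial nos., and transform('min') avoids the need to sort.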
