Pandas: Retain a particular row based on weightage given to a particular column, if duplicates are present based on other columns after groupby

Question

I have a data frame df

import pandas as pd

df = pd.DataFrame(
    [["A","X",98,56,61], ["B","E",79,54,36], ["A","Y",98,56,61], ["B","F",79,54,36],
     ["A","Z",98,56,61], ["A","W",48,51,85], ["B","G",44,57,86], ["B","H",79,54,36]],
    columns=["id","class","c1","c2","c3"])

When we group by id, if duplicate rows are present based on multiple columns such as c1, c2, c3, retain one row based on the weightage given to the class column.

For example, within id A, the values of c1, c2, c3 are duplicated for classes X, Y, Z. Among X, Y, Z the weightage goes to X, so retain X and delete the other rows. Similarly, among E, F, H the weightage goes to F, so retain F and delete the other rows.

Expected Output:

output = pd.DataFrame([["A","X",98,56,61],["B","F",79,54,36],["A","W",48,51,85],["B","G",44,57,86]], columns=["id","class","c1","c2","c3"])

How to do it?

It's not clear why you want to retain `F` instead of `E` for id `B`, for example. Is there any specific weightage? Can you please clarify? – anky, Mar 26 '21 at 14:29
Yes, there is a specific weightage: if X, Y, Z are duplicates, retain the X row; if E, F, H are duplicates, retain the F row. – Chethan, Mar 26 '21 at 14:39
In that case, I do not think this is a duplicate question. Reopened. But you should try to explain the question a bit more, as it is slightly confusing to read. – anky, Mar 26 '21 at 14:40
@Chethan Are you defining the weightage based on the given `id`? So, let's say, for id `A` you want to keep only the duplicate rows where class is `X`, and similarly for id `B` you want to keep the duplicated rows where the corresponding class is `F`? – Shubham Sharma, Mar 26 '21 at 15:00
No, the weightage is based only on the class column: if X, Y, Z, then choose the X row if duplicate rows are present; if E, F, H, then choose the F row if a duplicate is present. – Chethan, Mar 26 '21 at 15:03

anky · Accepted Answer · 2021-03-26T15:10:33.107

2

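Based on your explanation, you can list the classes that carry the weightage, flag every row that is duplicated on id, c1, c2, c3, and then keep a row if it is either a duplicated row whose class is in that list or not duplicated at all: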

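# Classes to keep when a row has duplicates
cls = ['X', 'F']
# True for every row whose (id, c1, c2, c3) combination occurs more than once
c = df.duplicated(['id', 'c1', 'c2', 'c3'], keep=False)
# Keep duplicated rows with a preferred class, plus all non-duplicated rows
out = df[(c & df['class'].isin(cls)) | ~c]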

print(out)

  id class  c1  c2  c3
0  A     X  98  56  61
3  B     F  79  54  36
5  A     W  48  51  85
6  B     G  44  57  86
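
Note that with the mask above, a duplicate group that contains none of the listed classes is dropped entirely. If that matters, one possible variation (a sketch, not part of the original answer) is to rank the classes and keep the best-ranked row of each duplicate group. The `rank` dictionary and the `_rank` helper column are hypothetical names introduced only for illustration; unlisted classes are assumed to share the lowest priority.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", "X", 98, 56, 61], ["B", "E", 79, 54, 36], ["A", "Y", 98, 56, 61],
     ["B", "F", 79, 54, 36], ["A", "Z", 98, 56, 61], ["A", "W", 48, 51, 85],
     ["B", "G", 44, 57, 86], ["B", "H", 79, 54, 36]],
    columns=["id", "class", "c1", "c2", "c3"],
)

# Hypothetical weightage: a lower number wins; unlisted classes rank last.
rank = {"X": 0, "F": 0}
keys = ["id", "c1", "c2", "c3"]

ranked = (
    df.assign(_rank=df["class"].map(rank).fillna(len(rank)))
      .sort_values("_rank", kind="mergesort")  # stable sort: ties keep original order
      .drop_duplicates(keys)                   # keep the best-ranked row per group
      .sort_index()                            # restore the original row order
      .drop(columns="_rank")
)

print(ranked)
```

On the sample data this keeps the same rows as the mask-based approach. When a duplicate group has no listed class, it keeps the first row of that group instead of dropping all of them.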

edited Mar 26 '21 at 15:10

answered Mar 26 '21 at 15:03


Here you have given the weightage based on the id column. For the weightage I am not using the id column; it is based purely on the class column: if duplicate rows are present among X, Y, Z, choose the X row; if duplicate rows are present among E, F, H, choose the F row. – Chethan Mar 26 '21 at 15:08
@Chethan Updated the answer to accommodate that requirement. – anky Mar 26 '21 at 15:10
