Handling duplicates and summing certain column values in a pandas DataFrame

Question

I want to sum the C and D columns across rows that share the same ID, and then remove the first duplicate row for each ID.

For example:

Index     ID        A       B         C        D
0         AA       100      2         4        6
1         AA       200      3         5        4
2         BB        50      1         2        3
3         BB       300      4         1        0

Before removing the duplicates, I want to sum the C and D columns per ID and then remove the duplicates, as shown below:

Index     ID        A       B         C        D
1         AA       200      3         9        10
3         BB       300      4         3        3

How do I achieve this?

Does this answer your question? [Pandas group-by and sum](https://stackoverflow.com/questions/39922986/pandas-group-by-and-sum) — Mykola Zotko, Oct 08 '20 at 21:12

score 3 · Answer 1 · answered Oct 08 '20 at 20:12

Sounds like you need to transform first (i.e. broadcast the per-ID sum back to columns C and D), and only then drop the duplicates:

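# Broadcast each ID's sum of C and D back onto every row of that group
df[['C', 'D']] = df.groupby('ID')[['C', 'D']].transform('sum')
# Keep the last row per ID (unlike df.loc[df.duplicated('ID')], this also keeps IDs that occur only once)
df.drop_duplicates('ID', keep='last')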

   Index  ID    A  B  C   D
1      1  AA  200  3  9  10
3      3  BB  300  4  3   3
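
For anyone who wants to run this end to end, here is a minimal sketch of the transform-then-drop approach on the data from the question. The helper name `sum_then_keep_last` and its parameters are made up for illustration, and the `Index` column is assumed to be the default row index rather than a real column.

```python
import pandas as pd

# Sample data copied from the question (Index is the default RangeIndex here)
df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB"],
    "A": [100, 200, 50, 300],
    "B": [2, 3, 1, 4],
    "C": [4, 5, 2, 1],
    "D": [6, 4, 3, 0],
})

def sum_then_keep_last(frame, key="ID", sum_cols=("C", "D")):
    """Hypothetical helper: sum `sum_cols` per `key`, then keep the last row of each group."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    cols = list(sum_cols)
    # transform('sum') returns one value per original row, so it can be assigned back directly
    out[cols] = out.groupby(key)[cols].transform("sum")
    # keep='last' drops every row of a group except the final one
    return out.drop_duplicates(subset=key, keep="last")

result = sum_then_keep_last(df)
print(result)
```

Because `transform` preserves the original row labels, the surviving rows keep their index values (1 and 3 here), which matches the layout asked for in the question.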

rafaelc

Rafael, Thanks a lot for your help. I will try this code. – Mohan Oct 09 '20 at 05:37

score 2 · Answer 2 · answered Oct 08 '20 at 20:23

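Use groupby with agg, taking the last value of A and B and the sum of C and D for each ID. Note that ID becomes the index of the result; pass as_index=False to groupby if you want to keep it as a column.

df.groupby("ID").agg({'A': 'last', 'B': 'last', 'C': 'sum', 'D': 'sum'})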
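
A self-contained sketch of the aggregation approach on the question's data is below. The variable names `agg_spec` and `collapsed` are just illustrative, and `as_index=False` is an optional choice made here so that ID stays a regular column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB"],
    "A": [100, 200, 50, 300],
    "B": [2, 3, 1, 4],
    "C": [4, 5, 2, 1],
    "D": [6, 4, 3, 0],
})

# One aggregation per column: last value for A and B, total for C and D
agg_spec = {"A": "last", "B": "last", "C": "sum", "D": "sum"}

# as_index=False keeps ID as a column instead of moving it into the index
collapsed = df.groupby("ID", as_index=False).agg(agg_spec)
print(collapsed)
```

Unlike the transform approach, this builds a new frame with one row per ID, so the original row labels (1 and 3) are not preserved; the result is renumbered from 0.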

Quang Hoang
