I want to sum columns C and D within each ID, then drop the first duplicate row for each ID so that only the last row remains.

For example:

Index     ID        A       B         C        D
0         AA       100      2         4        6
1         AA       200      3         5        4
2         BB        50      1         2        3
3         BB       300      4         1        0

Before removing the duplicates, I want C and D to hold the per-ID sums. After removing the duplicates, the result should look like this:

Index     ID        A       B         C        D
1         AA       200      3         9        10
3         BB       300      4         3        3

How do I achieve this?

rafaelc
Mohan
  • Does this answer your question? [Pandas group-by and sum](https://stackoverflow.com/questions/39922986/pandas-group-by-and-sum) – Mykola Zotko Oct 08 '20 at 21:12

2 Answers

Sounds like you need to transform first (i.e. broadcast the per-ID sums back to columns C and D), and only then drop the duplicates:

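# Overwrite C and D on every row with the sum for that row's ID
df[['C', 'D']] = df.groupby('ID')[['C', 'D']].transform('sum')
# duplicated('ID') flags every row after the first occurrence of its ID,
# so with two rows per ID this selects the last row of each pair
df.loc[df.duplicated('ID')]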

   Index  ID    A  B  C   D
1      1  AA  200  3  9  10
3      3  BB  300  4  3   3
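
For reference, here is a self-contained sketch of the same transform-then-drop idea. It rebuilds the question's sample data (assuming "Index" is simply the default row label rather than a real column) and swaps `df.duplicated` for `drop_duplicates(keep="last")`, which is a substitution on my part: it should behave the same on this data, and it also keeps IDs that appear only once or more than twice.

```python
import pandas as pd

# Sample data from the question. Assumption: "Index" is the default
# RangeIndex, so it is not created as a separate column here.
df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB"],
    "A": [100, 200, 50, 300],
    "B": [2, 3, 1, 4],
    "C": [4, 5, 2, 1],
    "D": [6, 4, 3, 0],
})

# Broadcast each ID's sum back onto every row of that ID.
df[["C", "D"]] = df.groupby("ID")[["C", "D"]].transform("sum")

# Keep exactly one row per ID: the last one.
result = df.drop_duplicates("ID", keep="last")
print(result)
```

The original row labels (1 and 3) are preserved, because no rows are rebuilt; only rows are dropped.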
rafaelc
Use a groupby aggregation that takes the last value of A and B and the sum of C and D for each ID:

df.groupby("ID").agg({'A':'last', 'B':'last','C':'sum','D':'sum'})
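
A runnable sketch of this aggregation on the question's sample data, in case it helps. The `reset_index()` call is my addition, on the assumption that ID is wanted back as an ordinary column; without it, ID becomes the index of the result.

```python
import pandas as pd

# Sample data from the question (the "Index" column is left out on purpose).
df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB"],
    "A": [100, 200, 50, 300],
    "B": [2, 3, 1, 4],
    "C": [4, 5, 2, 1],
    "D": [6, 4, 3, 0],
})

# One output row per ID: last A and B, summed C and D.
summary = (
    df.groupby("ID")
      .agg({"A": "last", "B": "last", "C": "sum", "D": "sum"})
      .reset_index()  # assumption: ID should be a regular column again
)
print(summary)
```

Unlike the transform approach, this builds new rows, so the original row labels (1 and 3) are not carried over.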
Quang Hoang