0

I have a pandas dataframe of numeric data, e.g.

A B C
1 a1 b1 1
2 a2 b2 2
3 a3 b3 1
4 a4 b4 2

where a_n and b_n are just any numeric values.

Notice in column C there are 2 entries of 1 and 2 entries of 2. How do I deal with these duplicates such that I sum rows that have duplicate entries in C? i.e. the result is:

A B C
1 a1+a3 b1+b3 1
2 a2+a4 b2+b4 2
ogb119
  • 119
  • 1
  • 10

1 Answers1

0

Simply groupby C and sum:

df.groupby('C', as_index=False).sum()

To keep the original order of the columns:

df.groupby('C', as_index=False).sum()[df.columns]
mozway
  • 194,879
  • 13
  • 39
  • 75
  • dupe, unfortunately – jezrael Aug 23 '21 at 10:55
  • I tried to bring some added value, but I guess OP could have searched better before asking – mozway Aug 23 '21 at 10:57
  • how do I also retain the original index? This answer works, except for keeping the original index. For example, in my question the index starts at 1 instead of 0. After doing groupby and sum, the index starts at 0. – ogb119 Aug 23 '21 at 12:39
  • nevermind it doesn't make sense to retain the original index, ignore that! thanks for the answer. – ogb119 Aug 23 '21 at 12:43
  • In python indexes and most counters start from 0, Assuming you save the output of the groupby+sum in `df_out` you could do `df_out.index += 1` – mozway Aug 23 '21 at 12:44