Summing multiple rows having duplicate columns pandas

Question

Consider this dataframe

df = pd.DataFrame({'a': [1,2,1,3,4,2], 'c':['dd','ee','dd','as','ae','ee'], 'count':[5,9,1,6,8,3]})

   a   c  count
0  1  dd      5
1  2  ee      9
2  1  dd      1
3  3  as      6
4  4  ae      8
5  2  ee      3

As you can see there are duplicates in column 'a' 1 and 2 are repeated multiple times.

i want to sum the count of such in pandas like in sql we do groupby.

my final df should look like this

   a   c  count
0  1  dd      6
1  2  ee      12
2  3  as      6
3  4  ae      8

i tried by using df = df.groupby('a') but it is returning me

<pandas.core.groupby.DataFrameGroupBy object

jezrael · Accepted Answer · 2017-01-04T12:46:24.923

You need groupby by columns a and c with aggregating sum:

df = df.groupby(['a','c'], as_index=False)['count'].sum()
print (df)
   a   c  count
0  1  dd      6
1  2  ee     12
2  3  as      6
3  4  ae      8

But if need groupby only column a, then is necessary aggregate all columns which need in output - e.g. column c is aggregate by first and count by sum:

df = df.groupby('a').agg({'c':'first', 'count':'sum'}).reset_index()
print (df)
   a   c  count
0  1  dd      6
1  2  ee     12
2  3  as      6
3  4  ae      8

score 6 · Answer 2 · answered Jan 04 '17 at 12:42

6

You almost had it

df.groupby(['a', 'c']).sum().reset_index()

yields

   a   c  count
0  1  dd      6
1  2  ee     12
2  3  as      6
3  4  ae      8

answered Jan 04 '17 at 12:42

mr.bjerre

2,384
2
24
37

Summing multiple rows having duplicate columns pandas

2 Answers2

Related