Summing multiple row values of various columns in Pandas

Question

I need to add row value of various columns and store it in same(or new) dataframe. eg: The dataframe looks something like this:

id  col1  col2  col3  col4 ...  col50
 1    1     12    3     44         0
 1    7      0    7      2         10
 1    2      3    0      4         9
 3    9      0    1      0         0
 3    1      1   11      1         0

And the expected values should be:

id  col1  col2  col3  col4...  col50
 1    10   15    10    46        19
 3    10    1    12     1         0

If I use tmp2 = tmp2.iloc[:,1:50].sum(), it changes the dimension of the dataframe.

Do you want to sum the values of `rows` for each columns or you want sum the values of each column individually ? if you are looking aggregate these over the rows then `df.agg(['sum'])` . — Karn Kumar, Oct 20 '18 at 12:10
you should accept the answer which works for you or atleast upvote, already have nice answer supplied by fellow colleagues .. — Karn Kumar, Oct 20 '18 at 12:24
@pygo that is exactly what I was thinking! All the answers are working (3 of them suggest the same thing), accepting anyone is doing injustice to others. Let me just upvote them without accepting any. — yamini goel, Oct 20 '18 at 12:30

score 5 · Answer 1 · answered Oct 20 '18 at 11:57

5

This is a grouping aggregation by id. Therefore, use a GroupBy object:

res = df.groupby('id', as_index=False).sum()

print(res)

   id  col1  col2  col3  col4  col50
0   1    10    15    10    50     19
1   3    10     1    12     1      0

answered Oct 20 '18 at 11:57

jpp

159,742
34
281
339

score 4 · Answer 2 · answered Oct 20 '18 at 11:57

4

Using pandas groupby() function in combination with sum() does the trick

df.groupby(['id']).sum()

answered Oct 20 '18 at 11:57

Felix

1,837
9
26

score 2 · Answer 3 · answered Oct 20 '18 at 12:01

Try using a pivot table and setting the id as an index:

import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,3,size=(5, 4)), columns=["ID","A","B","C"])
print(df)

Raw df:

   ID  A  B  C
0   1  0  0  1
1   1  0  0  1
2   0  1  0  2
3   1  2  0  2
4   1  2  0  0

pivoted_df = pd.pivot_table(df, index="ID",values=["A","B","C"], aggfunc=np.sum)
print(pivoted_df)

This returns:

    A  B  C
ID         
0   1  0  2
1   4  0  4

score 2 · Answer 4 · answered Oct 20 '18 at 12:22

Just another approach with groupby using Single Grouping Column, Single Aggregation Function.

>>> df.groupby('id').agg('sum')
    col1  col2  col3  col4  col50
id
1     10    15    10    50     19
3     10     1    12     1      0

However, with agg you can use sum(), mean(), and max() etc, However, it will also facilitates Multiple Aggregation Functions like agg(['sum', 'mean']) or agg(['min', 'max'])

Summing multiple row values of various columns in Pandas

4 Answers4