1

I need to add row value of various columns and store it in same(or new) dataframe. eg: The dataframe looks something like this:

id  col1  col2  col3  col4 ...  col50
 1    1     12    3     44         0
 1    7      0    7      2         10
 1    2      3    0      4         9
 3    9      0    1      0         0
 3    1      1   11      1         0

And the expected values should be:

id  col1  col2  col3  col4...  col50
 1    10   15    10    46        19
 3    10    1    12     1         0

If I use tmp2 = tmp2.iloc[:,1:50].sum(), it changes the dimension of the dataframe.

jpp
  • 159,742
  • 34
  • 281
  • 339
yamini goel
  • 519
  • 2
  • 10
  • 23
  • Do you want to sum the values of `rows` for each columns or you want sum the values of each column individually ? if you are looking aggregate these over the rows then `df.agg(['sum'])` . – Karn Kumar Oct 20 '18 at 12:10
  • you should accept the answer which works for you or atleast upvote, already have nice answer supplied by fellow colleagues .. – Karn Kumar Oct 20 '18 at 12:24
  • @pygo that is exactly what I was thinking! All the answers are working (3 of them suggest the same thing), accepting anyone is doing injustice to others. Let me just upvote them without accepting any. – yamini goel Oct 20 '18 at 12:30

4 Answers4

5

This is a grouping aggregation by id. Therefore, use a GroupBy object:

res = df.groupby('id', as_index=False).sum()

print(res)

   id  col1  col2  col3  col4  col50
0   1    10    15    10    50     19
1   3    10     1    12     1      0
jpp
  • 159,742
  • 34
  • 281
  • 339
4

Using pandas groupby() function in combination with sum() does the trick

df.groupby(['id']).sum()
Felix
  • 1,837
  • 9
  • 26
2

Try using a pivot table and setting the id as an index:

import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0,3,size=(5, 4)), columns=["ID","A","B","C"])
print(df)

Raw df:

   ID  A  B  C
0   1  0  0  1
1   1  0  0  1
2   0  1  0  2
3   1  2  0  2
4   1  2  0  0

pivoted_df = pd.pivot_table(df, index="ID",values=["A","B","C"], aggfunc=np.sum)
print(pivoted_df)

This returns:

    A  B  C
ID         
0   1  0  2
1   4  0  4
Dominique Paul
  • 1,623
  • 2
  • 18
  • 31
2

Just another approach with groupby using Single Grouping Column, Single Aggregation Function.

>>> df.groupby('id').agg('sum')
    col1  col2  col3  col4  col50
id
1     10    15    10    50     19
3     10     1    12     1      0

However, with agg you can use sum(), mean(), and max() etc, However, it will also facilitates Multiple Aggregation Functions like agg(['sum', 'mean']) or agg(['min', 'max'])

Karn Kumar
  • 8,518
  • 3
  • 27
  • 53