Numpy sum on specified columns

Question

I have a dataframe like so:

              Name     A       B       C        D       E 
Date                                                              
2000-10-19    Pete     1       0       1        1       0
2000-10-20    Joan     1       1       0        0       1     
2000-10-23    Michael  0       0       1        0       1 
2000-10-24    Carl     0       1       1        1       1
2000-10-25    Levis    1       0       1        1       0
2000-10-26    Susan    0       0       0        1       1

And I would like to count the "1"s in each row, so that it looks like this:

              Name     A       B       C        D       E      F
Date                                                              
2000-10-19    Pete     1       0       1        1       0      3
2000-10-20    Joan     1       1       0        0       1      3
2000-10-23    Michael  0       0       1        0       1      2
2000-10-24    Carl     0       1       1        1       1      4
2000-10-25    Levis    1       0       1        1       0      3
2000-10-26    Susan    0       0       0        1       1      2

I think it can be done easily with NumPy, but I can't quite figure out how.

I've come up with this; now I just need to specify which columns to sum over:

df['E'] = np.sum(df, axis=1)

Can anybody help?

Isn't this a duplicate of https://stackoverflow.com/questions/25748683/pandas-sum-dataframe-rows-for-given-columns/25748826? – Erfan, Dec 15 '19 at 16:52
Not quite, but I probably didn't describe the question quite right either. To simplify the question, I trimmed the data frame down a bit too much. What I did not mention is that I have a lot of other columns that contain numbers, but they must not be included in the sum, so I have not found the solution yet. – jhjorsal, Dec 15 '19 at 17:10

Willem Van Onsem · Answer 1 · 2019-12-15T15:54:52.247

3

You can sum these up with:

df['F'] = np.sum(df[['A', 'B', 'C', 'D', 'E']], axis=1)

By using df[['A', 'B', 'C', 'D', 'E']] you select a subset of the columns (A, B, ..., E). We then use np.sum(..) to add them up. Specifying axis=1 sums across the columns, which gives one total per row, and we assign those row totals to a new column F.
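To illustrate what axis=1 does compared with axis=0, here is a minimal sketch on the first two rows of the question's data. The variable names flags, per_column and per_row are only illustrative and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Small frame mirroring the first two rows of the question's data.
df = pd.DataFrame(
    {
        "Name": ["Pete", "Joan"],
        "A": [1, 1],
        "B": [0, 1],
        "C": [1, 0],
        "D": [1, 0],
        "E": [0, 1],
    },
    index=pd.to_datetime(["2000-10-19", "2000-10-20"]),
)

# Select only the columns that should take part in the sum.
flags = df[["A", "B", "C", "D", "E"]]

per_column = np.sum(flags, axis=0)  # collapses the rows: one total per column
per_row = np.sum(flags, axis=1)     # collapses the columns: one total per row

df["F"] = per_row
print(per_column)
print(df)
```

Here per_row holds one value per date, so it lines up with the frame's index and can be assigned directly as a new column.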

edited Dec 15 '19 at 15:54

answered Dec 15 '19 at 15:46


Hi, this is a very interesting question - please can you explain this code - I am very curious what is happening. Specifically at `axis = 1` & `np.sum` – Eduards Dec 15 '19 at 15:51
To simplify the question, I trimmed the data frame down a bit too much. What I did not mention is that I have a lot of other columns that contain numbers, but they must not be included in the sum. I first tried Willem Van Onsem's solution, but I get a sum like `10110.0` in the first row, `11001.0` in the second, and so on. How can that be? – jhjorsal Dec 15 '19 at 17:12
@jhjorsal: are you sure your columns are not strings? – Willem Van Onsem Dec 15 '19 at 17:18
Sorry, yes, my mistake. I think it is bedtime :-) – jhjorsal Dec 15 '19 at 17:20
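Totals such as 10110.0 are what one would expect when the flag columns hold strings, because adding strings joins them instead of adding their values. A possible fix, sketched here on a hypothetical frame whose columns were read in as text, is to convert the selected columns with pd.to_numeric before summing:

```python
import pandas as pd

# Hypothetical frame whose flag columns were read in as strings.
df = pd.DataFrame(
    {
        "Name": ["Pete", "Joan"],
        "A": ["1", "1"],
        "B": ["0", "1"],
        "C": ["1", "0"],
    }
)
cols = ["A", "B", "C"]

# Plain Python shows the underlying effect: "+" on strings concatenates.
print("1" + "0" + "1")

# Convert each selected column to a numeric dtype, then sum per row.
df[cols] = df[cols].apply(pd.to_numeric)
df["F"] = df[cols].sum(axis=1)
print(df)
```

Checking df.dtypes before summing is a quick way to spot columns that were loaded as object instead of a numeric type.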

score 1 · Answer 2 · answered Dec 15 '19 at 15:48

Or you can also use:

final = df.assign(F=df.drop(columns='Name').sum(axis=1))

               Name  A  B  C  D  E  F
Date                                 
2000-10-19     Pete  1  0  1  1  0  3
2000-10-20     Joan  1  1  0  0  1  3
2000-10-23  Michael  0  0  1  0  1  2
2000-10-24     Carl  0  1  1  1  1  4
2000-10-25    Levis  1  0  1  1  0  3
2000-10-26    Susan  0  0  0  1  1  2

ansev · Answer 3 · 2019-12-23T09:39:48.853

1

We can also do:

df['F']=df[df.eq(1)].count(axis=1)
print(df)

               Name  A  B  C  D  E  F
Date                                 
2000-10-19     Pete  1  0  1  1  0  3
2000-10-20     Joan  1  1  0  0  1  3
2000-10-23  Michael  0  0  1  0  1  2
2000-10-24     Carl  0  1  1  1  1  4
2000-10-25    Levis  1  0  1  1  0  3
2000-10-26    Susan  0  0  0  1  1  2

or

df['F']=df.eq(1).sum(axis=1)
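Since the asker mentions further numeric columns that must not be counted, the same comparison can be restricted to a chosen set of columns. This is a minimal sketch; the Price column and the name flag_cols are hypothetical and only stand in for the unspecified extra columns.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Pete", "Joan", "Michael"],
        "A": [1, 1, 0],
        "B": [0, 1, 0],
        "C": [1, 0, 1],
        "Price": [1, 250, 1],  # hypothetical numeric column that must be ignored
    }
)

# Only these columns take part in the count.
flag_cols = ["A", "B", "C"]

# eq(1) gives a boolean frame; summing booleans per row counts the True values.
df["F"] = df[flag_cols].eq(1).sum(axis=1)
print(df)
```

Without the column selection, a value of 1 in Price would be counted as well.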

edited Dec 23 '19 at 09:39

answered Dec 15 '19 at 16:13


score 1 · Answer 4 · answered Dec 15 '19 at 16:33

If you prefer to work with a pandas DataFrame, you can use:

import pandas as pd

df = pd.DataFrame([['John',0,1,0,0,1,0,1],
                   ['Kate',0,0,1,0,0,0,0],
                   ['Pete',1,1,1,0,1,0,1],],
                  columns=['Name', 'A', 'B', 'C', 'D', 'E', 'F', 'G'])

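# numeric_only=True skips the string column 'Name'; pandas 2.0+ no longer drops it silently
df['SUM'] = df.sum(axis=1, numeric_only=True)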

Result:

   Name  A  B  C  D  E  F  G  SUM
0  John  0  1  0  0  1  0  1    3
1  Kate  0  0  1  0  0  0  0    1
2  Pete  1  1  1  0  1  0  1    5

If you prefer to work with a NumPy array, you can use:

import pandas as pd
import numpy as np

df = pd.DataFrame([['John',0,1,0,0,1,0,1], ['Kate',0,0,1,0,0,0,0], ['Pete',1,1,1,0,1,0,1],],
                  columns=['Name', 'A', 'B', 'C', 'D', 'E', 'F', 'G'])
arr = df.values
totals = arr[:, 1:].sum(axis=1).reshape(-1,1)

np.hstack((arr, totals))

Result:

array([['John', 0, 1, 0, 0, 1, 0, 1, 3],
       ['Kate', 0, 0, 1, 0, 0, 0, 0, 1],
       ['Pete', 1, 1, 1, 0, 1, 0, 1, 5]], dtype=object)
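The array above has dtype=object because it also carries the names. If only specific columns should be summed, one option is to select them by label first, so that the resulting array is purely numeric. This is a sketch under that assumption, and the subset in cols is arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["John", 0, 1, 0, 0, 1, 0, 1],
        ["Kate", 0, 0, 1, 0, 0, 0, 0],
        ["Pete", 1, 1, 1, 0, 1, 0, 1],
    ],
    columns=["Name", "A", "B", "C", "D", "E", "F", "G"],
)

# Arbitrary subset of columns to sum, chosen for illustration.
cols = ["A", "C", "E"]

# Selecting by label first yields an integer array instead of an object array.
arr = df[cols].to_numpy()
totals = arr.sum(axis=1)

print(arr.dtype)
print(totals)
```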
