Better way to implement `df[m] = df[x] + df[y] + df[z]`

Question

I want to get the sum of three columns. The method I used is as follows:

In [14]:

a_pd = pd.DataFrame({'a': np.arange(3),
                     'b': [5, 7, np.nan],
                     'c': [2, 9, 0]})
a_pd
Out[14]:
   a    b  c
0  0  5.0  2
1  1  7.0  9
2  2  NaN  0
In [18]:

b_pd = a_pd['a'] + a_pd['b'] + a_pd['c']
b_pd
Out[18]:
0     7.0
1    17.0
2     NaN
dtype: float64

But as you can see, the NaN is not excluded from the sum. So I tried `np.add`, but something went wrong:

In [19]:

b_pd = a_pd[['a', 'b', 'c']].apply(np.add, axis=1)
b_pd
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-f52f400573b4> in <module>()
----> 1 b_pd = a_pd[['a', 'b', 'c']].apply(np.add, axis=1)
      2 b_pd

F:\anaconda\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4045 
   4046         if isinstance(f, np.ufunc):
-> 4047             results = f(self.values)
   4048             return self._constructor(data=results, index=self.index,
   4049                                      columns=self.columns, copy=False)

ValueError: invalid number of arguments

So, I want to know how to fix this.

score 5 · Accepted Answer · answered Sep 30 '16 at 15:25

You can use the sum method of the DataFrame:

a_pd.sum(axis=1)
Out: 
0     7.0
1    17.0
2     2.0
dtype: float64

If you want to specify columns:

a_pd[['a', 'b', 'c']].sum(axis=1)
Out: 
0     7.0
1    17.0
2     2.0
dtype: float64
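
As a hedged illustration of why `sum` behaves differently from `a + b + c` here: `DataFrame.sum` skips NaN by default (`skipna=True`). The sketch below rebuilds the frame from the question and compares the default with `skipna=False` and `min_count`. The column name `'m'` is borrowed from the title and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

a_pd = pd.DataFrame({'a': np.arange(3),
                     'b': [5, 7, np.nan],
                     'c': [2, 9, 0]})

# Default skipna=True: NaN is treated as missing and left out of the row total.
skipped = a_pd[['a', 'b', 'c']].sum(axis=1)

# skipna=False matches a_pd['a'] + a_pd['b'] + a_pd['c']: any NaN propagates.
propagated = a_pd[['a', 'b', 'c']].sum(axis=1, skipna=False)

# min_count=3 demands three non-NaN values per row, otherwise the result is NaN.
strict = a_pd[['a', 'b', 'c']].sum(axis=1, min_count=3)

# Store the NaN-skipping total as a new column, as in the title.
a_pd['m'] = skipped
```

Which variant is appropriate depends on whether a missing value should count as zero or should mark the whole row total as unknown.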

answered Sep 30 '16 at 15:25

ayhan


Good! But I am confused about why it is `axis=1`. I thought `axis=0` would be correct, because the addition runs horizontally. – QM.py Sep 30 '16 at 15:54

@QM.py It's the opposite actually. If you want to perform an operation column-wise, you use `axis=0` (it is generally 0 by default) but if you want to apply it row-wise, you use `axis=1`. Take a look at [this question](http://stackoverflow.com/q/22149584/2285236) for a more detailed answer. – ayhan Sep 30 '16 at 16:02
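
A small sketch of the two axis settings on the frame from the question may make this concrete. The variable names `per_column` and `per_row` are just illustrative.

```python
import numpy as np
import pandas as pd

a_pd = pd.DataFrame({'a': np.arange(3),
                     'b': [5, 7, np.nan],
                     'c': [2, 9, 0]})

# axis=0 collapses the rows: one total per column, indexed by column name.
per_column = a_pd.sum(axis=0)

# axis=1 collapses the columns: one total per row, indexed by row label.
per_row = a_pd.sum(axis=1)

print(per_column)
print(per_row)
```

One way to remember it: the axis you pass is the one that disappears from the result.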
`np.add` requires two inputs. These can be two scalars or two arrays (`np.add(a_pd['a'], a_pd['b'])`, for example). But `apply` calls the function on each column or row separately, and each of these is a single `pd.Series`. Since the function receives only one array, NumPy reports that the number of arguments is wrong. – ayhan Oct 01 '16 at 09:22
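
To illustrate the two-input requirement, here is a minimal sketch, assuming the same frame as in the question. It folds `np.add` over the columns with `functools.reduce`, which still propagates NaN, and then shows `Series.add` with `fill_value=0` as one possible way to treat a missing operand as zero. This is an alternative for comparison, not a replacement for the `sum` approach in the answer.

```python
from functools import reduce

import numpy as np
import pandas as pd

a_pd = pd.DataFrame({'a': np.arange(3),
                     'b': [5, 7, np.nan],
                     'c': [2, 9, 0]})

# np.add is a binary ufunc, so it is given exactly two operands here.
two_cols = np.add(a_pd['a'], a_pd['b'])

# Folding np.add over the three columns is the same as a + b + c,
# so the NaN in column 'b' still propagates into the last row.
chained = reduce(np.add, [a_pd[col] for col in ['a', 'b', 'c']])

# Series.add with fill_value=0 substitutes 0 when one operand is NaN.
filled = a_pd['a'].add(a_pd['b'], fill_value=0).add(a_pd['c'], fill_value=0)

print(chained)
print(filled)
```

Note that `fill_value` only helps when one of the two operands is missing; if both are NaN at the same position, the result stays NaN.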

score 2 · Answer 2 · answered Sep 30 '16 at 15:31

`np.add` requires two inputs. `np.sum` accepts a single array, so it can be applied to each row:

b_pd = a_pd[['a', 'b', 'c']].apply(np.sum, axis=1)

answered Sep 30 '16 at 15:31

A.Kot

