
I'm trying to optimize a program I've written and I'm running into some stumbling blocks. I have so many questions I don't know where to begin, so for starters I'll keep it to one obstacle I can't seem to overcome.

The code I'm writing is a small schedule generator for a workplace that requires 24/7 coverage. Each schedule covers a two-week span (some shifts rotate over the two weeks, but the coverage requirements must be maintained throughout, which is why I have to work with 14 days). Right now I'm trying to figure out the fastest way to check whether a combination of shifts adds up to the right number of people on a given day. I keep hearing that NumPy is very fast at this kind of thing, but when I run the following:

import numpy as np
import time

# one 14-day on/off pattern (1 = scheduled to work) per shift
c_ar = np.array([1,1,1,1,0,0,0,1,1,1,1,0,0,0])
d_ar = np.array([0,0,0,1,1,1,1,0,0,0,1,1,1,1])
e_ar = np.array([0,0,0,1,1,1,1,0,0,0,1,1,1,1])
m_ar = np.array([0,1,1,0,1,1,0,0,1,1,0,1,1,0])
p_ar = np.array([1,1,0,0,0,1,1,1,1,0,0,0,1,1])

t0 = time.time()
x = c_ar[0] + d_ar[0] + e_ar[0] + m_ar[0] + p_ar[0]
t1 = time.time()
print(t1 - t0)

I get back:

2.19345092773e-05

However, if I run:

c = [1,1,1,1,0,0,0,1,1,1,1,0,0,0]
d = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
e = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
m = [0,1,1,0,1,1,0,0,1,1,0,1,1,0]
p = [1,1,0,0,0,1,1,1,1,0,0,0,1,1]

t2 = time.time()
y = c[0] + d[0] + e[0] + m[0] + p[0]
t3 = time.time()
print(t3 - t2)

I get back:

1.90734863281e-06

Am I missing something about NumPy that would make it faster in my example? Also, is there an even faster way than the two approaches I used above?
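
For context, the kind of check I'm trying to speed up looks roughly like this (the required headcount of 3 is just an example number, not my real requirement):

shifts = [c, d, e, m, p]   # one 14-day on/off pattern per shift
required = 3               # example: people needed on duty each day

# total staffed per day, then verify every day meets the requirement
daily_totals = [sum(day) for day in zip(*shifts)]
meets_coverage = all(total >= required for total in daily_totals)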

user3833942

3 Answers


Put all the data into one NumPy array, then call numpy.sum once:

arr.sum(axis=0)

NumPy arrays are no faster than regular Python code when all you use them for is accessing individual values item by item, as is done here:

c_ar[0] + d_ar[0] + e_ar[0] + m_ar[0] + p_ar[0]

Also, for arrays this small, regular Python code may be faster than using NumPy arrays:

c = [1,1,1,1,0,0,0,1,1,1,1,0,0,0]
d = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
e = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]
m = [0,1,1,0,1,1,0,0,1,1,0,1,1,0]
p = [1,1,0,0,0,1,1,1,1,0,0,0,1,1]
arr = np.row_stack([c,d,e,m,p])

In [226]: %timeit c[0] + d[0] + e[0] + m[0] + p[0]
10000000 loops, best of 3: 189 ns per loop

In [231]: %timeit arr[:,0].sum()
100000 loops, best of 3: 4.73 µs per loop

In [239]: %timeit [c[i] + d[i] + e[i] + m[i] + p[i] for i in range(len(c))]
100000 loops, best of 3: 3.68 µs per loop

In [240]: %timeit arr.sum(axis=0)
100000 loops, best of 3: 5.04 µs per loop
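
Applied to the question's data, the single-call approach might look something like this (the daily coverage requirement here is an assumed value, purely for illustration):

arr = np.row_stack([c_ar, d_ar, e_ar, m_ar, p_ar])  # one row per shift, one column per day
totals = arr.sum(axis=0)            # people on duty for each of the 14 days
required = 3                        # assumed headcount needed per day
meets_coverage = bool(np.all(totals >= required))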
unutbu
  • Thank you! The arrays I'm dealing with are usually 11 or 12 rows with 14 columns. I'll give these suggestions a shot and see what happens – user3833942 Sep 13 '14 at 22:10
  • By the way, [use the timeit module](http://stackoverflow.com/q/8220801/190597) to benchmark code. Using `time.time()` without repeated runs [can produce misleading results](http://stackoverflow.com/q/1622943/190597). But in any case, unless you are dealing with significantly bigger arrays, it's unclear that NumPy will be any faster than plain Python code. – unutbu Sep 13 '14 at 22:15 (a minimal timeit sketch follows after these comments)
  • I will definitely remember that! Thanks again. – user3833942 Sep 13 '14 at 22:21
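
For reference, a minimal sketch of the timeit-module approach mentioned in the comment above (the statement, setup string, and repeat count are arbitrary illustrative choices):

import timeit

# benchmark arr.sum(axis=0) on a small random 0/1 array shaped like the question's data
setup = "import numpy as np; arr = np.random.randint(0, 2, size=(5, 14))"
elapsed = timeit.timeit("arr.sum(axis=0)", setup=setup, number=100000)
print(elapsed / 100000)   # average seconds per call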

Don't be as concerned with speed as with readability and with using the library's functions as directly as they are provided. Implementations may vary, but if you're doing the semantically correct thing, you've made the right decision in the long run. Only trade those things away for speed if you profile your code and determine the piece in question to be an expensive bottleneck.

>>> np.array([[1, 1], [1, 1], [1, 1]])
array([[1, 1],
       [1, 1],
       [1, 1]])
>>> np.array([[1,1],[1,1], [1,1]]).sum(axis=0)
array([3, 3])

If you need to retain dimensionality:

>>> np.array([[1,1],[1,1], [1,1]]).sum(axis=0, keepdims=True)
array([[3, 3]])

One reason you might want to do this is to concatenate the sum as a row:

>>> arr = np.array([[1,1],[1,1], [1,1]])
>>> np.vstack((arr, arr.sum(axis=0, keepdims=True)))
array([[1, 1],
       [1, 1],
       [1, 1],
       [3, 3]])
Russia Must Remove Putin

A lot probably depends on how you arrange your data and on what you want to do with it. Converting to a NumPy array just to sum it up is not necessary. With this setup:

import numpy as np

a = [1,1,1,1,0,0,0,1,1,1,1,0,0,0]
# ... lists b through k defined the same way ...
l = [0,0,0,1,1,1,1,0,0,0,1,1,1,1]

arr = np.row_stack([a, b, c, d, e, f, g, h, i, j, k, l])

with plain Python I get

>>> %timeit [sum(col) for col in zip(a, b, c, d, e, f, g, h, i, j, k, l)]
100000 loops, best of 3: 8.84 µs per loop

and with NumPy

>>> %timeit arr.sum(0)
100000 loops, best of 3: 6.71 µs per loop

In my experience, Cython can be faster at summing small arrays, but only when the work stays inside Cython and you don't call into it repeatedly from Python. That is, if you move a whole chunk of the computation to Cython, you may be better off with a little hand-written summation routine there than with a NumPy call. A self-made Cython function sumcols, for example, happens to be slower when called from Python:

%timeit sumcols(arr)
100000 loops, best of 3: 9.86 µs per loop

Remember that if you had long rows, a transposed array might be faster, because NumPy arrays are row-major (C order) by default. It doesn't make a difference in this case, though.

>>> arr_T = arr.transpose(1, 0).copy()
>>> %timeit arr_T.sum(1)
100000 loops, best of 3: 6.59 µs per loop
embert