Numpy array indexing and/or addition seems slow

Question

I was benchmarking numpy arrays because I got slower-than-expected results when I tried to replace Python lists with numpy arrays in a script.

I know I'm missing something, and I was hoping someone could clear up my ignorance.

I created two functions and timed them:

import numpy as np

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += x
        np_array[1] += x


def py_array_addition():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += x
        py_array[1] += x

Results:

np_array_addition: 2.556 seconds
py_array_addition: 0.204 seconds
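
The timing code itself is not shown here. A minimal sketch of how such numbers could be collected with the standard `timeit` module, written for Python 3 (so `range` replaces `xrange`). The `REPEATS` value and the added `return` statements are assumptions for illustration, not what produced the figures above:

```python
import timeit

import numpy as np

NUM_ITERATIONS = 1000
REPEATS = 100  # assumed repeat count; the original value is not stated


def np_array_addition():
    np_array = np.array([1, 2])
    for x in range(NUM_ITERATIONS):
        np_array[0] += x
        np_array[1] += x
    return np_array


def py_array_addition():
    py_array = [1, 2]
    for x in range(NUM_ITERATIONS):
        py_array[0] += x
        py_array[1] += x
    return py_array


if __name__ == "__main__":
    # Time each function over REPEATS calls and report total seconds.
    for func in (np_array_addition, py_array_addition):
        seconds = timeit.timeit(func, number=REPEATS)
        print("%s: %.3f seconds" % (func.__name__, seconds))
```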

What gives? What's causing the massive slowdown? I figured that with statically sized arrays, numpy would be at least as fast.

Thanks!

Update:

It kept bothering me that numpy array access was slow, and I figured "Hey, they're just arrays in memory right? Cython should solve this!"

And it did. Here's my revised benchmark:

import numpy as np
cimport numpy as np    

ctypedef np.int_t DTYPE_t    


NUM_ITERATIONS = 200000
def np_array_assignment():
    cdef np.ndarray[DTYPE_t, ndim=1] np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] += 1
        np_array[1] += 1    


def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] += 1
        py_array[1] += 1

I redefined np_array as cdef np.ndarray[DTYPE_t, ndim=1]:

print(timeit(py_array_assignment, number=3))
# 0.03459
print(timeit(np_array_assignment, number=3))
# 0.00755

That's with the Python function also compiled by Cython. The timing for the same function in pure Python is:

print(timeit(py_array_assignment, number=3))
# 0.12510

A 17x speedup. Sure it's a silly example, but I thought it was educational.

I'm guessing here, but I'd say it's the extra work of switching context back and forth between C and Python that is killing the performance here. — Ricardo Cárdenes, Mar 07 '14 at 00:46

score 5 · Accepted Answer · edited May 23 '17 at 11:54

It is not (just) the addition that is slow; it is the element access overhead. See for example:

def np_array_assignment():
    np_array = np.array([1, 2])
    for x in xrange(NUM_ITERATIONS):
        np_array[0] = 1
        np_array[1] = 1


def py_array_assignment():
    py_array = [1, 2]
    for x in xrange(NUM_ITERATIONS):
        py_array[0] = 1
        py_array[1] = 1

timeit np_array_assignment()
10000 loops, best of 3: 178 us per loop

timeit py_array_assignment()
10000 loops, best of 3: 72.5 us per loop

Numpy is fast when operating on vectors (matrices), provided the operation is performed on the whole structure at once. Single element-by-element operations like these are slow.
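
One way to see where part of the per-element cost comes from: indexing a single element of a numpy array builds a new numpy scalar object on every access, whereas a list hands back a reference to an `int` it already holds. A small sketch for Python 3 (the variable names are only illustrative):

```python
import numpy as np

np_array = np.array([1, 2])
py_array = [1, 2]

np_elem = np_array[0]  # a numpy integer scalar, created on access
py_elem = py_array[0]  # the int object already stored in the list

print(type(np_elem))
print(type(py_elem))

# The list returns the same stored object every time...
assert py_array[0] is py_array[0]
# ...while the array hands out a fresh scalar object on each access.
assert np_array[0] is not np_array[0]
```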

Use numpy functions to avoid looping, so that each operation acts on the whole array at once, e.g.:

def np_array_addition_good():
    np_array = np.array([1, 2])
    np_array += np.sum(np.arange(NUM_ITERATIONS))

The results comparing your functions with the one above are pretty revealing:

timeit np_array_addition()
1000 loops, best of 3: 1.32 ms per loop

timeit py_array_addition()
10000 loops, best of 3: 101 us per loop

timeit np_array_addition_good()
100000 loops, best of 3: 11 us per loop
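
As a sanity check that the vectorized version computes the same thing as the loop, both can be made to return their array and then be compared. A sketch for Python 3 (`range` instead of `xrange`); the `return` statements and the `np_array_addition_loop` name are additions for illustration:

```python
import numpy as np

NUM_ITERATIONS = 1000


def np_array_addition_loop():
    np_array = np.array([1, 2])
    for x in range(NUM_ITERATIONS):
        np_array[0] += x
        np_array[1] += x
    return np_array


def np_array_addition_good():
    np_array = np.array([1, 2])
    # One whole-array update instead of 2 * NUM_ITERATIONS element updates.
    np_array += np.sum(np.arange(NUM_ITERATIONS))
    return np_array


assert np.array_equal(np_array_addition_loop(), np_array_addition_good())
```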

But actually, you can do just as well with pure Python if you collapse the loops:

def py_array_addition_good():
    py_array = [1, 2]
    rangesum = sum(range(NUM_ITERATIONS))
    py_array = [x + rangesum for x in py_array]

timeit py_array_addition_good()
100000 loops, best of 3: 11 us per loop
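
Collapsing the loop works because every pass adds `x` to each element, so the total added is `0 + 1 + ... + (NUM_ITERATIONS - 1)`, which also has the closed form `n * (n - 1) // 2`. A short pure-Python check (Python 3; the `looped_addition` and `collapsed_addition` names are illustrative):

```python
NUM_ITERATIONS = 1000


def looped_addition(values, n):
    """Add 0, 1, ..., n - 1 to every element, one step at a time."""
    result = list(values)
    for x in range(n):
        for i in range(len(result)):
            result[i] += x
    return result


def collapsed_addition(values, n):
    """Same result with a single addition per element."""
    total = n * (n - 1) // 2  # equals sum(range(n))
    return [v + total for v in values]


assert sum(range(NUM_ITERATIONS)) == NUM_ITERATIONS * (NUM_ITERATIONS - 1) // 2
assert looped_addition([1, 2], NUM_ITERATIONS) == collapsed_addition([1, 2], NUM_ITERATIONS)
print(collapsed_addition([1, 2], NUM_ITERATIONS))  # [499501, 499502]
```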

All in all, with such simple operations there is really no improvement in using numpy. Optimized code in pure Python works just as well.

There have been a lot of questions about this, and I suggest looking at some of the good answers there:

How do I maximize efficiency with numpy arrays?

numpy float: 10x slower than builtin in arithmetic operations?

score 4 · Answer 2 · answered Mar 07 '14 at 00:55

You're not actually using numpy's vectorized array addition if you do the loop in python; there's also the access overhead mentioned by @shashkello.

I took the liberty of increasing the array size a tad, and also adding a vectorized version of the addition:

import numpy as np
from timeit import timeit

NUM_ITERATIONS = 1000

def np_array_addition():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(np_array)):
            np_array[i] += x

def np_array_addition2():
    np_array = np.array(xrange(1000))
    for x in xrange(NUM_ITERATIONS):
        np_array += x

def py_array_addition():
    py_array = range(1000)
    for x in xrange(NUM_ITERATIONS):
        for i in xrange(len(py_array)):
            py_array[i] += x

print timeit(np_array_addition, number=3)  # 4.216162
print timeit(np_array_addition2, number=3) # 0.117681
print timeit(py_array_addition, number=3)  # 0.439957

As you can see, the vectorized numpy version wins pretty handily. The gap will just get larger as array sizes and/or iterations increase.
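
To watch how the gap changes with array size, the two numpy variants can be timed over a few sizes. This is a sketch for Python 3; the sizes, the iteration count, and the `time_variants` helper are arbitrary choices for illustration, and the numbers will differ from machine to machine:

```python
from timeit import timeit

import numpy as np


def add_elementwise(np_array, iterations):
    # Python-level loop: one indexed read and write per element.
    for x in range(iterations):
        for i in range(len(np_array)):
            np_array[i] += x
    return np_array


def add_vectorized(np_array, iterations):
    # One whole-array in-place addition per iteration.
    for x in range(iterations):
        np_array += x
    return np_array


def time_variants(size, iterations=10):
    """Return (elementwise_seconds, vectorized_seconds) for one array size."""
    slow = timeit(lambda: add_elementwise(np.arange(size), iterations), number=3)
    fast = timeit(lambda: add_vectorized(np.arange(size), iterations), number=3)
    return slow, fast


if __name__ == "__main__":
    for size in (10, 100, 1000, 10000):
        slow, fast = time_variants(size)
        print("size=%d elementwise=%.6f vectorized=%.6f" % (size, slow, fast))
```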

Ah, that makes sense. So the vector addition shines through more with larger arrays — Joe Pinsonault, Mar 07 '14 at 01:14
You should remove the construction from the array loops, or use the numpy construction functions... — seberg, Mar 07 '14 at 13:55
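
Following the suggestion in the last comment, the array can be built once outside the timed function, using `np.arange` instead of converting a Python range. A sketch for Python 3; the `BASE` name and passing a fresh copy per call are one possible arrangement, not necessarily what the commenter had in mind:

```python
from timeit import timeit

import numpy as np

NUM_ITERATIONS = 1000
BASE = np.arange(1000)  # built once, natively, outside any timed code


def np_array_addition2(np_array):
    for x in range(NUM_ITERATIONS):
        np_array += x
    return np_array


# np.arange gives the same values as converting a Python range.
assert np.array_equal(BASE, np.array(range(1000)))

if __name__ == "__main__":
    # BASE.copy() makes every run start from the same values; the copy
    # is a single cheap call compared with the loop being measured.
    print(timeit(lambda: np_array_addition2(BASE.copy()), number=3))
```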
