Why is numpy slower than python? How to make code perform better

Question

I rewrote my neural net from pure Python to numpy, but now it runs even slower. So I tried these two functions:

def d():
    a = [1,2,3,4,5]
    b = [10,20,30,40,50]
    c = [i*j for i,j in zip(a,b)]
    return c

def e():
    a = np.array([1,2,3,4,5])
    b = np.array([10,20,30,40,50])
    c = a*b
    return c

timeit d = 1.77135205057

timeit e = 17.2464673758

Numpy is 10 times slower. Why is that, and how do I use numpy properly?

Maybe related: http://stackoverflow.com/questions/5956783/numpy-float-10x-slower-than-builtin-in-arithmetic-operations — Tamás, May 16 '13 at 20:48

mgilson · Accepted Answer · 2013-05-16T21:14:02.947

I would assume that the discrepancy is because you're constructing lists and arrays in e whereas you're only constructing lists in d. Consider:

import numpy as np

def d():
    a = [1,2,3,4,5]
    b = [10,20,30,40,50]
    c = [i*j for i,j in zip(a,b)]
    return c

def e():
    a = np.array([1,2,3,4,5])
    b = np.array([10,20,30,40,50])
    c = a*b
    return c

#Warning:  Functions with mutable default arguments are below.
# This code is only for testing and would be bad practice in production!
def f(a=[1,2,3,4,5],b=[10,20,30,40,50]):
    c = [i*j for i,j in zip(a,b)]
    return c

def g(a=np.array([1,2,3,4,5]),b=np.array([10,20,30,40,50])):
    c = a*b
    return c


import timeit
print(timeit.timeit('d()','from __main__ import d'))
print(timeit.timeit('e()','from __main__ import e'))
print(timeit.timeit('f()','from __main__ import f'))
print(timeit.timeit('g()','from __main__ import g'))

Here the functions f and g avoid recreating the lists/arrays each time around and we get very similar performance:

1.53083586693
15.8963699341
1.33564996719
1.69556999207

Note that list-comp + zip still wins. However, if we make the arrays sufficiently big, numpy wins hands down:

t1 = [1,2,3,4,5] * 100
t2 = [10,20,30,40,50] * 100
t3 = np.array(t1)
t4 = np.array(t2)
print(timeit.timeit('f(t1,t2)','from __main__ import f,t1,t2',number=10000))
print(timeit.timeit('g(t3,t4)','from __main__ import g,t3,t4',number=10000))

My results are:

0.602419137955
0.0263929367065
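The crossover point can be checked directly. The following is a minimal sketch in Python 3 syntax (the code above is Python 2); the helper names `mul_list`, `mul_numpy`, and `compare` are made up for illustration. Timings depend on the machine, so no expected numbers are shown.

```python
import timeit
import numpy as np

def mul_list(a, b):
    # Element-wise product with a list comprehension, as in f() above.
    return [i * j for i, j in zip(a, b)]

def mul_numpy(a, b):
    # Element-wise product on pre-built arrays, as in g() above.
    return a * b

def compare(n, number=1000):
    # Build the inputs once, outside the timed code,
    # so that only the multiplication is measured.
    la = list(range(n))
    lb = list(range(0, 10 * n, 10))
    na = np.array(la)
    nb = np.array(lb)
    t_list = timeit.timeit(lambda: mul_list(la, lb), number=number)
    t_numpy = timeit.timeit(lambda: mul_numpy(na, nb), number=number)
    return t_list, t_numpy

if __name__ == "__main__":
    for n in (5, 50, 500, 5000):
        t_list, t_numpy = compare(n)
        print("n=%5d  list: %.4fs  numpy: %.4fs" % (n, t_list, t_numpy))
```

Running it for a few sizes should show roughly where the fixed per-call overhead of numpy stops dominating on a given machine.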

score 3 · Answer 2 · answered May 16 '13 at 20:48

import time, numpy

def d():
    a = range(100000)
    b = range(0, 1000000, 10)
    c = [i*j for i,j in zip(a,b)]
    return c

def e():
    a = numpy.array(range(100000))
    b = numpy.array(range(0, 1000000, 10))
    c = a*b
    return c



#python ['0.04s', '0.04s', '0.04s']
#numpy ['0.02s', '0.02s', '0.02s']

Try it with bigger arrays. Even with the overhead of creating the arrays, numpy is much faster.
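Part of the cost in e() is converting a Python range into an array element by element. As a sketch, `np.arange` builds the same array natively. The names `e_range` and `e_arange` are hypothetical, and the explicit `dtype=np.int64` is an assumption added here so that the products (up to about 1e11) cannot overflow on platforms whose default integer is 32-bit.

```python
import numpy as np

def e_range():
    # Same approach as e() above: convert Python ranges into arrays.
    a = np.array(range(100000), dtype=np.int64)
    b = np.array(range(0, 1000000, 10), dtype=np.int64)
    return a * b

def e_arange():
    # Build the arrays directly, without iterating over a Python range.
    a = np.arange(100000, dtype=np.int64)
    b = np.arange(0, 1000000, 10, dtype=np.int64)
    return a * b
```

Both functions return the same array; only the construction step differs, so timing them side by side isolates the conversion overhead.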

answered May 16 '13 at 20:48

Joran Beasley


2x is nothing compared to the 10x loss for smallish data. It's much easier to fall victim to the overhead cost. – John Jiang Mar 16 '22 at 18:56
very valid point. – Joran Beasley Mar 17 '22 at 05:24

Gill Bates · Answer 3 · 2013-05-17T17:21:55.797

Numpy data structures are slower when appending or constructing.

Here are some tests:

from timeit import Timer
setup1 = '''import numpy as np
a = np.array([])'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)

setup2 = 'l = list()'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)

print('appending to empty list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))

setup1 = '''import numpy as np
a = np.array(range(999999))'''
stmnt1 = 'np.append(a, 1)'
t1 = Timer(stmnt1, setup1)

setup2 = 'l = list(range(999999))'
stmnt2 = 'l.append(1)'
t2 = Timer(stmnt2, setup2)

print('appending to large list:')
print(t1.repeat(number=1000))
print(t2.repeat(number=1000))

Results:

appending to empty list:
[0.008171333983972538, 0.0076482562944814175, 0.007862921943675175]
[0.00015624398517267296, 0.0001191077336243837, 0.000118654852507942]
appending to large list:
[2.8521017080411304, 2.8518707386717446, 2.8022625940577477]
[0.0001643958452675065, 0.00017888804099541744, 0.00016711313196715594]
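The gap grows with array size because `np.append` does not modify the array in place: it returns a new array, copying all existing elements on every call. A minimal sketch of two common alternatives follows; the function names are made up for illustration.

```python
import numpy as np

def grow_with_append(n):
    # np.append returns a new array each call, so the data is copied n times.
    a = np.array([], dtype=np.float64)
    for i in range(n):
        a = np.append(a, i)
    return a

def grow_with_list(n):
    # Append to a Python list, then convert to an array once at the end.
    items = []
    for i in range(n):
        items.append(i)
    return np.array(items, dtype=np.float64)

def fill_preallocated(n):
    # When the final size is known, allocate once and assign in place.
    a = np.empty(n, dtype=np.float64)
    for i in range(n):
        a[i] = i
    return a
```

All three produce the same array. The second and third avoid the repeated copying, which is the part that scales with the size of the existing array.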

score 0 · Answer 4 · answered Apr 13 '23 at 17:03

import time, numpy

def d():
    a = range(100000)
    b = range(0, 1000000, 10)
    c = [i*j for i,j in zip(a,b)]
    return c

def e():
    a = numpy.array(numpy.arange(100000))
    b = numpy.array(numpy.arange(0, 1000000, 10))
    c = a*b
    return c

# pure python
t1 = time.time()
d()
t2 = time.time()
print(t2-t1)
# time difference : 0.02503204345703125

# with numpy
t1 = time.time()
e()
t2 = time.time()
print(t2-t1)
# time difference : 0.0010941028594970703

Thus numpy is much faster.
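A single `time.time()` measurement can be noisy. As a hedged sketch, the same comparison can be repeated with `timeit.repeat` and the best trial taken. `best_time` is a hypothetical helper, `numpy.arange` is used directly, and the `int64` dtype is an assumption added to avoid overflow where the default integer is 32-bit.

```python
import timeit
import numpy

def d():
    a = range(100000)
    b = range(0, 1000000, 10)
    return [i * j for i, j in zip(a, b)]

def e():
    a = numpy.arange(100000, dtype=numpy.int64)
    b = numpy.arange(0, 1000000, 10, dtype=numpy.int64)
    return a * b

def best_time(func, repeat=5, number=10):
    # Run func `number` times per trial, for `repeat` trials,
    # and return the best average time per call in seconds.
    trials = timeit.repeat(func, repeat=repeat, number=number)
    return min(trials) / number

if __name__ == "__main__":
    print("pure python:", best_time(d))
    print("with numpy: ", best_time(e))
```

Taking the minimum over several trials reduces the influence of other processes running on the machine.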

Your answer could be improved with additional supporting information. Please edit to add further details — user7247147, Apr 18 '23 at 22:16

score -1 · Answer 5 · answered May 17 '13 at 10:52

I don't think numpy is slow, because you also have to take into account the time required to write and debug the code. The longer the program, the harder it is to find problems or add new features (programmer time). Using a higher-level language therefore lets you, for the same time and skill, build a more complex and potentially more efficient program.

Anyway, some interesting optimization tools are:

-Psyco is a JIT (just-in-time) compiler, which optimizes the code at runtime.

-Numexpr: parallelization is a good way to speed up a program, provided the work is sufficiently separable.

-weave is a module within SciPy for bridging Python and C. One of its functions is blitz, which takes a line of Python, transparently translates it to C, and runs the optimized version on each subsequent call. The first conversion takes around a second, but the resulting speed is generally higher than with any of the above. It is not bytecode like Numexpr or Psyco, nor an interface to C like NumPy, but your own function written directly in C, fully compiled and optimized.
