6

I have the following code. In principle it takes 2^7 * 1000 = 128000 iterations, which is quite a small number. However, it takes 9s on my computer and I would like to run it for n = 15 at least.

from __future__ import division
import numpy as np
import itertools

n=6
iters = 1000
firstzero = 0
bothzero = 0
for S in itertools.product([-1,1], repeat = n+1):
    for i in xrange(iters):
        F = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size = n)
        while np.all(F ==0):
            F = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size = n)
        FS = np.convolve(F,S, 'valid')
        if (FS[0] == 0):
            firstzero += 1
        if np.all(FS==0):
            bothzero += 1

print "firstzero",    firstzero
print "bothzero",  bothzero

Is it possible to speed this up a lot or should I rewrite it in C?

Profiling indicates it spends most of its time in

   258003    0.418    0.000    3.058    0.000 fromnumeric.py:1842(all)
   130003    1.245    0.000    2.907    0.000 {method 'choice' of 'mtrand.RandomState' objects}
   388006    2.488    0.000    2.488    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   128000    0.731    0.000    2.215    0.000 numeric.py:873(convolve)
   258003    0.255    0.000    2.015    0.000 {method 'all' of 'numpy.ndarray' objects}
   258003    0.301    0.000    1.760    0.000 _methods.py:35(_all)
   130003    0.470    0.000    1.663    0.000 fromnumeric.py:2249(prod)
   644044    1.483    0.000    1.483    0.000 {numpy.core.multiarray.array}
   130003    0.164    0.000    1.193    0.000 _methods.py:27(_prod)
   258003    0.283    0.000    0.624    0.000 numeric.py:462(asanyarray)
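
For anyone wanting to reproduce this, a profile in this format can be generated with the standard cProfile and pstats modules. A minimal sketch, assuming the nested loops above are wrapped in a hypothetical function run():

    import cProfile
    import pstats

    # run() is assumed to contain the nested loops shown above.
    cProfile.run('run()', 'loop.prof')

    # Sort by cumulative time to see which numpy calls dominate.
    pstats.Stats('loop.prof').sort_stats('cumulative').print_stats(10)
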
Simd
  • Can you explain what this code is doing? – YXD Apr 25 '14 at 14:15
  • @MrE It is counting the number of times the convolution of two random arrays, one of which is one longer than the other, with a particular probability distribution, has a 0 in the first position or 0s in both positions. – Simd Apr 25 '14 at 14:18
  • Just a note: by improving the code you can only get linear speed-ups, which won't get you very far, even with speed-ups of 99%, unless you jump over the O(2^n) complexity. `n=15` will always run `2^15 / 2^6 = 2^9` times slower than `n=6`, so to stay within the same time you actually need a speed-up of a factor of 512. – luk32 Apr 25 '14 at 14:41
  • @luk32 Yes. I suspect that currently it could be 512 times faster if coded properly in C for example by a better coder than me. It isn't doing a whole lot per iteration. – Simd Apr 25 '14 at 14:42
  • @user2179021 I honestly doubt it =). IMO you can get maybe a factor of 10 at best by changing language/tools. Note that you are already using specialized tools, and the program does not spend time in your code. As John suggested, maybe using different tools is the way to go, but I think you will probably have to parallelize it to get bigger factors and achieve bigger problem sizes... and you will still hit the O(2^n) wall sooner rather than later. But I wish you good luck. Maybe n=15 to 20 will be computable in a reasonable time. – luk32 Apr 25 '14 at 14:50

2 Answers

12

An almost fully vectorized version of your code is much faster: it runs in about 16.9% of the time of yours. Suppose yours is wrapped in a function named f():

def g():
    n = 6
    iters = 1000
    # One row per (S, F) pair: repeat each S vector 'iters' times.
    S = np.repeat(list(itertools.product([-1,1], repeat = n+1)), iters, axis=0).reshape((-1, n+1))
    # Oversample F, drop the all-zero rows, then keep exactly iters*2**(n+1) rows.
    F = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size = (iters*(2**(n+2)), n))
    F = F[~(F==0).all(1)][:iters*(2**(n+1))]
    FS = np.asanyarray(map(lambda x, y: np.convolve(x, y, 'valid'), F, S))
    firstzero = (FS[:,0]==0).sum()
    bothzero = (FS==0).all(1).sum()
    print "firstzero", firstzero
    print "bothzero", bothzero

Timing result:

In [164]:

%timeit f()
firstzero 27171
bothzero 12151
firstzero 27206
bothzero 12024
firstzero 27272
bothzero 12135
firstzero 27173
bothzero 12079
1 loops, best of 3: 14.6 s per loop
In [165]:

%timeit g()
firstzero 27182
bothzero 11952
firstzero 27365
bothzero 12174
firstzero 27318
bothzero 12173
firstzero 27377
bothzero 12072
1 loops, best of 3: 2.47 s per loop
CT Zhu
  • You are welcome. I think try vectorization first, then `c` or `fortran`, or others. A lot of things in `numpy` or `scipy` are already implemented in those under the hood, so vectorization is all that is needed at least half of the time. – CT Zhu Apr 25 '14 at 16:06
5

I got a 35-40% speedup very easily by generating all the random choices in one shot:

for S in itertools.product([-1,1], repeat = n+1):
    # Draw all 'iters' random F vectors for this S in a single call.
    Fx = np.random.choice(np.array([-1,0,0,1], dtype=np.int8), size=(iters,n))
    for F in Fx:

That replaces the `for i in xrange(iters)` loop.
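
Putting it together, the whole outer loop might look roughly like this (a sketch only; it keeps the original per-row redraw so the rare all-zero F vectors are still excluded):

    for S in itertools.product([-1, 1], repeat=n + 1):
        # Draw all 'iters' random F vectors for this S in a single call.
        Fx = np.random.choice(np.array([-1, 0, 0, 1], dtype=np.int8), size=(iters, n))
        for F in Fx:
            # Redraw individually if a row happens to be all zeros,
            # matching the while-loop in the original code.
            while np.all(F == 0):
                F = np.random.choice(np.array([-1, 0, 0, 1], dtype=np.int8), size=n)
            FS = np.convolve(F, S, 'valid')
            if FS[0] == 0:
                firstzero += 1
            if np.all(FS == 0):
                bothzero += 1
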

To get beyond this, I suspect you can use scipy.signal.fftconvolve to vectorize the convolutions themselves (np.convolve only supports 1D inputs). I didn't get to try this, partly because scipy.org is offline as I write this, but I hope this gives you enough to go on. The main idea will be to reduce the loops you do in Python, replacing them with vectorized operations as much as possible.
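
One alternative sketch, using plain dot products rather than fftconvolve, relies on the fact that the 'valid' convolution of a length-n F with a length-(n+1) S has exactly two entries, FS[0] = sum_j F[j]*S[n-1-j] and FS[1] = sum_j F[j]*S[n-j], i.e. two dot products with reversed slices of S, so both can be computed for every F row at once:

    # Sketch: per-S vectorized counting via dot products (an alternative to fftconvolve).
    import numpy as np
    import itertools

    n = 6
    iters = 1000
    firstzero = 0
    bothzero = 0
    choices = np.array([-1, 0, 0, 1], dtype=np.int8)

    for S in itertools.product([-1, 1], repeat=n + 1):
        S = np.asarray(S)
        # Draw all F rows for this S at once, redrawing any all-zero rows.
        Fx = np.random.choice(choices, size=(iters, n))
        bad = (Fx == 0).all(axis=1)
        while bad.any():
            Fx[bad] = np.random.choice(choices, size=(bad.sum(), n))
            bad = (Fx == 0).all(axis=1)
        # np.convolve(F, S, 'valid') equals [F.dot(S[n-1::-1]), F.dot(S[:0:-1])],
        # so each output can be computed for every row with one matrix product.
        FS0 = Fx.dot(S[n - 1::-1])   # FS[0] for every row
        FS1 = Fx.dot(S[:0:-1])       # FS[1] for every row
        firstzero += (FS0 == 0).sum()
        bothzero += ((FS0 == 0) & (FS1 == 0)).sum()

    print "firstzero", firstzero
    print "bothzero", bothzero

This removes the inner Python loop entirely; only the 2^(n+1) iterations over S remain in Python.
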

John Zwinck
  • Thank you. I don't really understand scipy.signal.fftconvolve. Are you suggesting doing a 2D convolution? – Simd Apr 25 '14 at 14:36
  • That will hardly do. You see, the rows where `np.all(F == 0)` have to be excluded, so `Fx` must be generated at a size larger than `(iters, n)`. – CT Zhu Apr 25 '14 at 15:57
  • @user2179021 Convolution via FFT scales up for larger array sizes much better than a regular convolution. See this page for details: http://www.dspguide.com/ch18.htm – ArtemB Apr 25 '14 at 17:27