I'm trying to speed up the following code:

from math import log
from random import random

def logtest1(N):
    tr = 0
    for i in range(1, N):
        T = 40 + 10*random()
        tr += -log(random())/T
    return tr

I'm fairly new to Python (coming from MATLAB)... and this same code runs about 5x slower in Python than in MATLAB (and Julia), which got my attention.

I tried using numba and parakeet wrappers, and numpy functions instead of the pure-Python ones, but didn't get any improvement at all.

I'd appreciate any help. Thanks.

edit: the whole thing is a Monte Carlo simulation, so N is very large... 10e6 for testing purposes

epx
    Can you show your tries with numpy? – runDOSrun Feb 13 '15 at 14:12
    Can you post your numpy code and your attempt to use numba? This type of loop, using `range` and with only simple arithmetic operations, is precisely the type of loop that numba excels at. I suppose it could be that the use of `math.log` and/or `random.random` trips a numba flag indicating that it won't try to jit those library methods. But the claim that numpy/numba doesn't speed this up is not reasonable so we need to see that code. – ely Feb 13 '15 at 14:26
  • BTW, `range(1,N)` runs the loop `N-1` times. Is that what you intended? You can get a slight improvement in speed by changing your constants to `float`, i.e. 0.0, 40.0, 10.0. Also, `tr += -log(random())/T` is slightly slower than `tr = tr - log(random())/T`. But the speed improvements from these minor changes are tiny compared with the improvement from using numpy as suggested by Carsten (which is why this is a comment & not an answer). – PM 2Ring Feb 13 '15 at 14:27
  • No, range(1,N) is not intended, it should be (0,N), thank you for noticing. I tried your suggestion and got an improvement which made the code 4x slower than the MATLAB one (down from 5x). When I used numpy, I just replaced the log and random functions with numpy ones, but didn't use arrays because, as I explained below, the simulation loops many times, and trying to create an array that big throws a memory error. – epx Feb 13 '15 at 14:56

2 Answers


You should really be looking into numpy. And Scipy, while you're at it. Numpy is a speed-optimized package for N-dimensional array numerics, and Scipy is a collection of scientific computing stuff built upon numpy.

If you write the function using numpy arrays, it looks like this:

import numpy as np

def logtest2(N):
    T = 40. + 10. * np.random.rand(N)
    return np.sum(-np.log(np.random.rand(N)) / T)

It's also a lot faster. Testing with N = 1000000 gave me a runtime of 500ms for your version and 75ms for this one.
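If you want to reproduce the comparison yourself, a quick timing harness along these lines should work (absolute numbers will vary by machine; `N` is reduced here so the loop version finishes quickly):

```python
import timeit
from math import log
from random import random

import numpy as np

def logtest1(N):
    # Pure-Python loop from the question (with the missing return added)
    tr = 0.0
    for i in range(N):
        T = 40.0 + 10.0 * random()
        tr += -log(random()) / T
    return tr

def logtest2(N):
    # Vectorized numpy version from the answer
    T = 40. + 10. * np.random.rand(N)
    return np.sum(-np.log(np.random.rand(N)) / T)

N = 100_000  # smaller than the question's 1e6, purely to keep the demo fast
t1 = timeit.timeit(lambda: logtest1(N), number=3)
t2 = timeit.timeit(lambda: logtest2(N), number=3)
print(t1, t2)
```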

Carsten
  • Thank you, this is really helpful, but vectorizing this particular part of the code is not possible because the full MC simulation loops around 2480368834 times, and probably even more if I want more detailed results. – epx Feb 13 '15 at 14:50
    @Esteban Do it in chunks, then. If you always vectorize a million runs, then you have to loop only 2481 times. – Carsten Feb 13 '15 at 15:05
  • Thanks Carsten. I will try this, and I apologize if this was a sort of newbie question; my field is not programming and I'm learning a lot. It's just that I don't have anyone to ask, since nobody here uses Python (they all use MATLAB or Fortran). Thanks again. – epx Feb 13 '15 at 15:12
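For what it's worth, the chunking idea from the comments can be sketched like this (the chunk size of one million is an arbitrary choice, not something from the answer):

```python
import numpy as np

def logtest_chunked(N, chunk=1_000_000):
    # Vectorize in fixed-size chunks so memory use stays bounded
    # even when N is in the billions.
    tr = 0.0
    done = 0
    while done < N:
        n = min(chunk, N - done)
        T = 40.0 + 10.0 * np.random.rand(n)
        tr += np.sum(-np.log(np.random.rand(n)) / T)
        done += n
    return tr
```

Each chunk does the same vectorized work as `logtest2`, so the total allocation at any moment is one chunk's worth of arrays rather than all `N` samples at once.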

Right off the bat I'd say use xrange if you are using Python 2.7.x. So in 2.7 it would be:

def logtest1(N):
    tr = 0
    for i in xrange(N):
        a = random()   # Just generate the random number once
        T = 40 + 10*a
        tr += -log(a)/T
    return tr

Here is a summary on why xrange is better: Should you always favor xrange() over range()?

25r43q
  • Generating the number only once changes the distribution, don't think this is intended. – Daniel Feb 13 '15 at 14:16
  • Without knowing exactly what your intention is with the code... it felt like an area of improvement. Either way, xrange will do a lot, and numpy as Carsten wrote will also help. – 25r43q Feb 13 '15 at 14:21
    Thanks. Using xrange improved it by approximately 10%, which is substantial but still not close enough to reach Julia's speed. Using the same random number for both calculations is not something I can go with, unfortunately. – epx Feb 13 '15 at 14:33
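Since reusing one draw changes the distribution (as Daniel noted), a variant that keeps both independent draws while still applying the xrange and local-lookup micro-optimizations might look like this (the `xrange` fallback line is just so the sketch also runs on Python 3):

```python
from math import log
from random import random

try:
    xrange  # exists on Python 2
except NameError:
    xrange = range  # Python 3 fallback so the sketch runs anywhere

def logtest3(N):
    tr = 0.0
    # Hoist the global function lookups into locals, a common
    # CPython micro-optimization for tight loops.
    rnd, lg = random, log
    for i in xrange(N):
        tr += -lg(rnd()) / (40.0 + 10.0 * rnd())
    return tr
```

This preserves the original two-draw distribution, so it is a drop-in replacement for the question's `logtest1`, though the gains are small compared with the vectorized numpy approach.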