6

Trying to time different random functions to see fastest way to choose random item from list. %timeit wants to give me "best of 3" fastest times, but because the runs are random, there's high variance in access times (grab from back of list, will be slow; grab from front, will be fast).

How do I get average across all loops, not best of?

a = [0,6,3,1,3,9,4,3,2,6]

%timeit random.choice(a)
%timeit a[random.randint(0,len(a)-1)]
%timeit a[np.random.randint(0,len(a)-1)]
%timeit np.random.choice(a,1)[0]

Currently output (acknowledging variance in timing):

%timeit random.choice(a)
The slowest run took 9.87 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.23 µs per loop

Update: a kludge approach:

%time for i in range(100000): random.choice(a)
%time for i in range(100000): a[random.randint(0,len(a)-1)]
%time for i in range(100000): a[np.random.randint(0,len(a)-1)]
%time for i in range(100000): np.random.choice(a,1)[0]
nick_eu
  • 3,541
  • 5
  • 24
  • 38
  • "Trying to time different random functions to see fastest way to choose random item from list. " -- This kind of (probably) preemptive micro-optimization will get you nowhere. – Kijewski Feb 12 '16 at 20:24
  • @Kay I'm simulating random walks on a network of tens of millions of nodes. I promise -- even small differences will matter a lot. At the moment, the random draws are 60% of my running time. (And no, it's not pre-emptive -- I'm profiling like crazy) – nick_eu Feb 12 '16 at 20:29
  • 1
    Have you tried drawing from a numpy array rather than a list? I think that `np.random.choice` casts `a` to an array, which can be quite expensive. I see about a factor of 6 difference for a `len(10)` list vs array. – ali_m Feb 12 '16 at 20:55
  • Hmm -- Don't think I can do that. I have a c-compiled function giving me the list as input in the actual program, I just created the `a` list as an example. But thanks for thought! – nick_eu Feb 12 '16 at 21:03

2 Answers2

5

You could use timeit.repeat:

import timeit
import numpy as np

reps = timeit.repeat(repeat=3, n=10000,
                     stmt="np.random.choice(a)",
                     setup="import numpy as np; a=[0,6,3,1,3,9,4,3,2,6]")

# taking the median might be better, since I suspect the distribution of times will
# be heavily skewed
avg = np.mean(reps)

One potential issue is that you are likely to run into caching effects that could render your timings less meaningful (see here). You might, for example, want to generate a new random list on each iteration using the setup= argument.

Community
  • 1
  • 1
ali_m
  • 71,714
  • 23
  • 223
  • 298
0

How fast is

random_fd = open('/dev/urandom', 'rb')

a = array.array('I')
a.read(random_fd, 10**8)

get_next_rand = iter(a).next

for you? I'd just generate a huge amount of random numbers at once if that is your bottleneck.

In my aged PC:

%timeit array.array('I').read(open('/dev/urandom', 'rb'), 10**6)
1 loop, best of 3: 200 ms per loop
Kijewski
  • 25,517
  • 12
  • 101
  • 143
  • That's a nice idea. I'll need to tweak them after -- each list I sample from is a different length -- but may still be more efficient to generate random floats from 0-1, multiple by length, and round to an integer. Thanks! – nick_eu Feb 12 '16 at 21:05
  • 1
    You're welcome! Just make sure not to introduce a bias when you use modulo: http://stackoverflow.com/q/10984974/416224 – Kijewski Feb 12 '16 at 21:12