Consider the following operation on short iterables:

d = (3, slice(None, None, None), slice(None, None, None))

In [215]: %timeit any([type(i) == slice for i in d])
1000000 loops, best of 3: 695 ns per loop

In [214]: %timeit any(type(i) == slice for i in d)
1000000 loops, best of 3: 929 ns per loop

Building a list first is about 25% faster than using a generator expression.

Why is this the case, given that building the list is an extra operation?

Note: In both runs I obtained the warning: "The slowest run took 6.42 times longer than the fastest. This could mean that an intermediate result is being cached."

Analysis

In this particular test, the list version is faster up to a length of 4, beyond which the generator expression performs better.

The red line marks where this crossover occurs, and the black line marks where both perform equally.

The code takes about 1 min to run on my MacBook Pro, using all the cores:

import multiprocessing
import timeit

import numpy as np
import pylab

rng = range(1, 16)  # list lengths
max_series = [3, slice(None, None, None)] * rng[-1]  # alternating element types
series = [max_series[:n] for n in rng]

number, reps = 1000000, 5

def func_l(d):
    # time any() over a list comprehension; returning the timings lets
    # p.map keep them in the same order as `series`
    t = timeit.repeat("any([type(i) == slice for i in {}])".format(d), repeat=reps, number=number)
    print("done List, len:{}".format(len(d)))
    return t

def func_g(d):
    # time any() over a generator expression
    t = timeit.repeat("any(type(i) == slice for i in {})".format(d), repeat=reps, number=number)
    print("done Generator, len:{}".format(len(d)))
    return t

if __name__ == "__main__":  # guard needed on platforms that spawn worker processes
    p = multiprocessing.Pool(processes=min(16, rng[-1]))  # at most 16 worker processes
    l = p.map(func_l, series)  # pool list
    g = p.map(func_g, series)  # pool gens
    p.close()

    ratio = np.asarray(g).mean(axis=1) / np.asarray(l).mean(axis=1)
    pylab.plot(list(rng), ratio, label='av. generator time / av. list time')
    pylab.title("{} iterations, averaged over {} runs".format(number, reps))
    pylab.xlabel("length of iterable")
    pylab.ylabel("Time Ratio (Higher is worse)")
    pylab.legend()
    lt_one = np.argmax(ratio < 1.)  # first index where the generator is faster
    pylab.axhline(y=1, color='k')
    pylab.axvline(x=lt_one + 1, color='r')
    pylab.show()
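
For a quicker, single-process look at the same ratio, a minimal sketch along these lines may help. The helper name `time_ratio` and the reduced `number`/`repeat` values are my own choices for illustration, not part of the benchmark above, and absolute numbers will differ by machine and Python version. Note that with the alternating data used here, the second element is already a slice, so for lengths of 2 or more `any` over the generator stops after two elements, while the list comprehension always evaluates all of them.

```python
import timeit

def time_ratio(n, number=20000, repeat=3):
    """Return (generator time / list time) for an iterable of length n."""
    # Same alternating pattern as the benchmark: 3, slice, 3, slice, ...
    d = ([3, slice(None, None, None)] * n)[:n]
    list_t = min(timeit.repeat("any([type(i) == slice for i in d])",
                               globals={"d": d}, number=number, repeat=repeat))
    gen_t = min(timeit.repeat("any(type(i) == slice for i in d)",
                              globals={"d": d}, number=number, repeat=repeat))
    return gen_t / list_t

# A ratio above 1 means the generator version was slower at that length.
ratios = {n: time_ratio(n) for n in (1, 2, 4, 8, 16)}
for n, r in ratios.items():
    print(n, round(r, 2))
```

Taking the minimum over repeats, rather than the mean, is a common way to reduce noise from other processes; it is a different choice from the averaging used in the plot.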
Alexander McFarlane
  • I thought this was non-trivial? Apparently down voters disagree? Is the question unclear? Why is the question in particular too broad? Is it not highly specific to the example given? – Alexander McFarlane Jun 27 '16 at 23:00
  • I was one of the down voters. I down voted it (before any edit) because it seemed to lack research. But apparently it doesn't. So +1. – salmanwahed Jun 28 '16 at 09:33
  • @salmanwahed Thanks for the feedback, it is much appreciated as I strive to ask decent questions and provide good answers on the site – Alexander McFarlane Jun 29 '16 at 16:58

1 Answer

The catch is the size of the iterable you are applying any to. Repeat the same process on a larger dataset:

In [2]: d = ([3] * 1000) + [slice(None, None, None), slice(None, None, None)]*1000

In [3]: %timeit any([type(i) == slice for i in d])
1000 loops, best of 3: 736 µs per loop

In [4]: %timeit any(type(i) == slice for i in d)
1000 loops, best of 3: 285 µs per loop

Here, building a list (which loads all the items into memory first) becomes much slower, and the generator expression comes out ahead.
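
One part of this difference can be checked directly: `any` stops at the first true value, so the generator expression never evaluates the remaining items, whereas the list comprehension evaluates every item before `any` even starts. The sketch below counts the evaluations for the same `d`; the helper `make_counter` is hypothetical, introduced only for this illustration.

```python
def make_counter():
    """Return a call log and a predicate that records each item it checks."""
    calls = []
    def is_slice(x):
        calls.append(x)
        return type(x) == slice
    return calls, is_slice

d = ([3] * 1000) + [slice(None, None, None), slice(None, None, None)] * 1000

list_calls, check = make_counter()
any([check(i) for i in d])   # the whole list is built before any() runs

gen_calls, check = make_counter()
any(check(i) for i in d)     # any() stops at the first slice

print(len(list_calls))  # 3000: every item was checked
print(len(gen_calls))   # 1001: the 1000 ints plus the first slice
```

This only accounts for the work skipped on larger inputs; it does not by itself explain why the list version wins on very short inputs.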

Moses Koledoye
  • I am only dealing with lists up to length of 10. Why in this lower limit is this the case? – Alexander McFarlane Jun 27 '16 at 22:57
  • For a length of 10 (or rather 3 in your post), the tuple can easily be cached and reused across all iterations of the timer, whereas the generator version will always be lazy. – Moses Koledoye Jun 27 '16 at 22:59
  • any link to docs explaining this behaviour / key terms to look up? - I should note I tried searching the obvious `cashing list iteration` etc. and didn't obtain anything useful – Alexander McFarlane Jun 27 '16 at 23:09
  • Check this: [When is not a good time to use python generators?](http://stackoverflow.com/questions/245792/when-is-not-a-good-time-to-use-python-generators) and this: [How is tuple implemented in CPython?](http://stackoverflow.com/questions/14135542/how-is-tuple-implemented-in-cpython) – Moses Koledoye Jun 27 '16 at 23:14
  • You may be interested in the update: lists are faster in this test scenario up to a length of 4, beyond which generators are drastically faster! – Alexander McFarlane Jun 28 '16 at 02:22