1

In this following code is it better to use list comprehension or a generator?

from itertools import izip
n=2
l=izip(xrange(10**n), xrange(10**n))
print 3 not in [x[0] for x in l]
#or
#print 3 not in (x[0] for x in l)

In these tests if the list is large the generator is faster, if the lists are shorter the list comprehension apparently is faster.
Is this because the comprehension is computer just once?
For large lists: generator faster than listcomp
For small lists: generator slower than listcomp

titus
  • 5,512
  • 7
  • 25
  • 39
  • Does this answer your question? [List comprehension vs generator expression's weird timeit results?](https://stackoverflow.com/questions/11964130/list-comprehension-vs-generator-expressions-weird-timeit-results) – ggorlen Jul 12 '20 at 20:04

3 Answers3

5

in against a generator expression will make use of the __iter__() method and iterate the expression until a match is found, making it more efficient in the general case than the list comprehension, which produces the whole list first before scanning the result for a match.

The alternative for your specific example would be to use any(), to make the test more explicit. I find this to be a tad more readable:

any(x[0] == 3 for x in l)

You do have to take into account that in does forward the generator; you cannot use this method if you need to use the generator elsewhere as well.

As for your specific timing tests; your 'short' tests are fatally flawed. The first iteration the izip() generator will be entirely exhausted, making the other 9999 iterations test against an empty generator. You are testing the difference between creating an empty list and an empty generator there, amplifying the creation cost difference.

Moreover, you should use the timeit module to run tests, making sure that the test is repeatable. This means you have to create a new izip() object each iteration too; now the contrast is much larger:

>>> # Python 2, 'short'
...
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.27606701850891113
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
1.7422130107879639
>>> # Python 2, 'long'
...
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.3002200126647949
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
15.624258995056152

and on Python 3:

>>> # Python 3, 'short'
... 
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in (x[0] for x in l)", number=100000)
0.2624585109297186
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in [x[0] for x in l]", number=100000)
1.5555254180217162
>>> # Python 3, 'long'
... 
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in (x[0] for x in l)", number=100000)
0.27222433499991894
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in [x[0] for x in l]", number=100000)
15.76974998600781

In all cases, the generator variant is far faster; you have to shorten the 'short' version to just 8 tuples for the list comprehension to start to win:

>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.2870941162109375
>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
0.28503894805908203

On Python 3, where the implementations of generator expressions and list comprehensions were brought closer, you have to go down to 4 items before the list comprehension wins:

>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in (x[0] for x in l)", number=100000)
0.284480107948184
>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in [x[0] for x in l]", number=100000)
0.23570425796788186
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • I would use a for loop so that it doesn't build the whole list but it seems ugly – titus Dec 12 '13 at 12:03
  • 1
    You can use `in` with generator expressions. It just consumes the generator up to the first match. – John La Rooy Dec 12 '13 at 12:06
  • @gnibbler: ick, so you can; `in` will use `__iter__()` if available to iterate until there is a match.. – Martijn Pieters Dec 12 '13 at 12:39
  • 1
    Of course - `in` behaves differently with Python 3.x's `range` in that it has a custom `__contains__` which mathematically computes membership rather than materialising the range... Which means the aboves tests could be completely different... – Jon Clements Dec 12 '13 at 12:48
0

Creating a generator is slower than creating a list, so you have to consider to variables: time for creating the object and time for testing the expression. So to answer your question, if by "better" you mean "faster": it depends on n.

Ecir Hana
  • 10,864
  • 13
  • 67
  • 117
0

There is a fair bit of overhead creating the generator expression, but eventually you make up for it by not needing to allocate a huge chunk of memory.

Small list comprehensions are faster as they don't have that overhead.

Usually the small cases are close enough, so in that case it's better to choose a generator expression

It is particularly important to conserve memory on a webserver where there might be 100's or 1000's of connections concurrently.

John La Rooy
  • 295,403
  • 53
  • 369
  • 502