in against a generator expression uses the __iter__() method and iterates the expression only until a match is found. In the general case that makes it more efficient than the list comprehension, which builds the whole list first and only then scans the result for a match.
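To make the early exit visible, here is a minimal Python 3 sketch. The first_items() helper and the seen_* lists are names I made up for illustration; the helper records every item it produces so you can count how far each variant iterates.

```python
def first_items(pairs, seen):
    """Yield the first element of each pair, recording every item produced."""
    for pair in pairs:
        seen.append(pair[0])
        yield pair[0]

pairs = list(zip(range(100), range(100)))

# Lazy: 'in' pulls items from the generator and stops at the first match.
seen_lazy = []
found_lazy = 3 in first_items(pairs, seen_lazy)

# Eager: the whole list is built before 'in' starts scanning it.
seen_eager = []
found_eager = 3 in list(first_items(pairs, seen_eager))

print(found_lazy, len(seen_lazy))    # True 4
print(found_eager, len(seen_eager))  # True 100
```

Both tests give the same answer, but the lazy one only ever produced the items 0 through 3.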
The alternative for your specific example would be to use any(), which makes the test more explicit. I find this a tad more readable:
any(x[0] == 3 for x in l)
You do have to take into account that in does advance the generator; you cannot use this method if you need to use the generator elsewhere as well.
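A small Python 3 sketch of that side effect, using zip() in place of izip(); the variable names are only for illustration:

```python
l = zip(range(10), range(10))
gen = (x[0] for x in l)

first_test = 3 in gen    # consumes 0, 1, 2 and 3, then stops
leftover = list(gen)     # only what the membership test did not consume
second_test = 3 in gen   # the generator is exhausted by now

print(first_test)   # True
print(leftover)     # [4, 5, 6, 7, 8, 9]
print(second_test)  # False
```

The items consumed by the first test are gone for good, so a later test against the same generator can give a different answer.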
As for your specific timing tests: your 'short' tests are fatally flawed. On the first iteration the izip() generator is entirely exhausted, so the other 9999 iterations test against an empty generator. You are testing the difference between creating an empty list and an empty generator there, which amplifies the creation cost difference.
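You can see the effect without any timing code. This is a Python 3 sketch, with zip() standing in for izip() and a loop of 3 instead of 10000; the results_* names are made up for the example:

```python
# Iterator created once, outside the loop (the flawed setup).
pairs = zip(range(100), range(100))
results_shared = []
for _ in range(3):
    results_shared.append(len([x[0] for x in pairs]))

# Iterator recreated on every pass.
results_fresh = []
for _ in range(3):
    pairs = zip(range(100), range(100))
    results_fresh.append(len([x[0] for x in pairs]))

print(results_shared)  # [100, 0, 0]
print(results_fresh)   # [100, 100, 100]
```

Only the first pass over the shared iterator sees any data; every later pass works on an empty one.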
Moreover, you should use the timeit module to run tests, so that the test is repeatable. This means you have to create a new izip() object on each iteration too; now the contrast is much larger:
>>> import timeit
>>> # Python 2, 'short'
...
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.27606701850891113
>>> timeit.timeit("l = izip(xrange(10**2), xrange(10**2)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
1.7422130107879639
>>> # Python 2, 'long'
...
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.3002200126647949
>>> timeit.timeit("l = izip(xrange(10**3), xrange(10**3)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
15.624258995056152
and on Python 3:
>>> import timeit
>>> # Python 3, 'short'
...
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in (x[0] for x in l)", number=100000)
0.2624585109297186
>>> timeit.timeit("l = zip(range(10**2), range(10**2)); 3 not in [x[0] for x in l]", number=100000)
1.5555254180217162
>>> # Python 3, 'long'
...
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in (x[0] for x in l)", number=100000)
0.27222433499991894
>>> timeit.timeit("l = zip(range(10**3), range(10**3)); 3 not in [x[0] for x in l]", number=100000)
15.76974998600781
In all cases, the generator variant is far faster; on Python 2 you have to shorten the 'short' version to just 8 tuples for the list comprehension to start to win:
>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in (x[0] for x in l)", 'from itertools import izip', number=100000)
0.2870941162109375
>>> timeit.timeit("n = 8; l = izip(xrange(n), xrange(n)); 3 not in [x[0] for x in l]", 'from itertools import izip', number=100000)
0.28503894805908203
On Python 3, where the implementations of generator expressions and list comprehensions were brought closer, you have to go down to 4 items before the list comprehension wins:
>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in (x[0] for x in l)", number=100000)
0.284480107948184
>>> timeit.timeit("n = 4; l = zip(range(n), range(8)); 3 not in [x[0] for x in l]", number=100000)
0.23570425796788186