
I'm using Python 3.3.1 64-bit on Windows and this code snippet:

len ([None for n in range (1, 1000000) if n%3 == 1])

executes in 136ms, compared to this one:

sum (1 for n in range (1, 1000000) if n%3 == 1)

which executes in 146ms. Shouldn't a generator expression be faster or the same speed as the list comprehension in this case?

I quote from Guido van Rossum's From List Comprehensions to Generator Expressions:

...both list comprehensions and generator expressions in Python 3 are actually faster than they were in Python 2! (And there is no longer a speed difference between the two.)

EDIT:

I measured the time with timeit. I know it is not very accurate, but I only care about relative speeds here, and I get a consistently shorter time for the list comprehension version when I test with different numbers of iterations.
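
For reference, a sketch of this kind of measurement with the standard timeit module (the repeat and number values are arbitrary, and exact timings will of course vary):

import timeit

# Compare best-of-several runs for the two expressions (printed as seconds per run).
listcomp = "len([None for n in range(1, 1000000) if n % 3 == 1])"
genexp   = "sum(1 for n in range(1, 1000000) if n % 3 == 1)"

print(min(timeit.repeat(listcomp, number=10)) / 10)
print(min(timeit.repeat(genexp, number=10)) / 10)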

Paul Jurczak
  • And how did you measure the speed difference? – Martijn Pieters Apr 30 '13 at 19:16
  • A difference of 7% is pretty trivial—especially if you're not timing very accurately. (A typical naive implementation with `time` or `clock` instead of `timeit` for something that takes only 1/8th of a second can easily have an error much, much larger than 7%.) – abarnert Apr 30 '13 at 19:21
  • Why are you comparing `len` with `sum`? Counting elements is a lot faster than adding their contents. – Tim Pietzcker Apr 30 '13 at 19:25
  • Somewhat surprisingly, in PyPy 1.9.0 (which is Python 2.7.2, and doesn't have any of the modern genexp improvements), the genexp version is almost twice as fast (26.6ms vs. 49.7ms). The adding probably doesn't matter there (because in PyPy, adding integers is a few orders of magnitude faster than iterating), but I'm still a bit surprised by the results. – abarnert Apr 30 '13 at 19:34
  • @MartijnPieters I use `timeit` - edited the question. – Paul Jurczak Apr 30 '13 at 19:47
  • @TimPietzcker I'm adding `1`, which is probably the same as visiting each list element and incrementing a counter. If the list is a regular structure then counting is much faster, but what about the overhead of creating the list? – Paul Jurczak Apr 30 '13 at 19:53
  • @abarnert I'm using `timeit` and I tried larger number of iterations with the same result. – Paul Jurczak Apr 30 '13 at 19:55
  • @PaulJurczak: *Probably*? I don't think `sum()` is smart enough to figure out that all it ever needs to add in this special case is just `1`s... – Tim Pietzcker Apr 30 '13 at 19:56
  • @PaulJurczak: You can read the source, but I'm willing to bet that `sum` does a `PyNumber_InPlaceAdd` for each element returned by `PyIter_Next`, so there's no way it can optimize the case of always adding 1. – abarnert Apr 30 '13 at 20:11
  • A JIT-compiled implementation like PyPy or Jython can at least theoretically notice that the first few elements are all 1, and create an optimized fast-path that it will keep using as long as the iterator keeps returning 1. But that will never happen in CPython. – abarnert Apr 30 '13 at 20:13

2 Answers


I believe the difference here is entirely in the cost of the roughly 333,000 additions that sum performs. Testing with 64-bit Python.org 3.3.0 on Mac OS X:

In [698]: %timeit len ([None for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 127 ms per loop
In [699]: %timeit sum (1 for n in range (1, 1000000) if n%3 == 1)
10 loops, best of 3: 138 ms per loop
In [700]: %timeit sum ([1 for n in range (1, 1000000) if n%3 == 1])
10 loops, best of 3: 139 ms per loop

So, it's not that the comprehension is faster than the genexp; they both take about the same time. But calling len on a list is instant, while summing ~333K ones adds another 7% or so to the total time.

Throwing a few different numbers at it, this seems to hold up unless the list is very tiny (in which case it does seem to get faster), or large enough that memory allocation starts to become a significant factor (which it isn't yet, at 333K).
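
A rough way to isolate the cost of those additions (a sketch only; exact numbers will vary by machine and Python version) is to build the list once in a setup statement and then time len versus sum on it:

import timeit

# Build the ~333K-element list once; then compare reading its stored length
# against actually adding up its elements.
setup = "lst = [1 for n in range(1, 1000000) if n % 3 == 1]"

print(min(timeit.repeat("len(lst)", setup=setup, number=1000)))  # effectively free
print(min(timeit.repeat("sum(lst)", setup=setup, number=1000)))  # pays for ~333K additions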

abarnert
  • This is exactly what I found in my testing with Python 3.3.1 64bit (Win7). +1 – Tim Pietzcker Apr 30 '13 at 19:26
  • @TimPietzcker: Since you were apparently writing your comment at the same time I was writing my answer, I'm not surprised we were also running the exact same test simultaneously. :) – abarnert Apr 30 '13 at 19:28
  • For the sake of adding data -- with Python 3.2 32bit (Win7) I find the generator expression consistently 2% slower. Trivial, but reproducible. – Steven Rumbalski Apr 30 '13 at 20:06
  • I didn't know that `len` of `list` is O(1) - I've spent only a few days with Python so far. Thanks for pointing this out. – Paul Jurczak Apr 30 '13 at 20:14
  • @PaulJurczak: It's actually surprisingly hard to dig the performance guarantees out of the documentation. However, if you know that a `list` is just a resizeable array, and that `[0,1,2][3]` raises an `IndexError` instead of segfaulting, obviously it _must_ be keeping the length around somewhere, right? (In CPython, it's in the [`PyVarObject`](http://hg.python.org/cpython/file/3.3/Include/object.h#l111) header.) So, it would be silly to not just return it immediately. – abarnert Apr 30 '13 at 20:25
  • @StevenRumbalski: I don't know whether that's evidence that they've continued to improve genexps between 3.2 and 3.3, or that they've generally neglected optimizing 32-bit Python in favor of 64-bit… but it's an interesting data point. – abarnert Apr 30 '13 at 20:26
  • @abarnert _it must be keeping the length around somewhere_ - it also means that the list length is incremented once per iteration, so technically the generator expression should be faster (no memory allocation and initialization), unless it is implemented as a list too. – Paul Jurczak Apr 30 '13 at 21:02
  • @PaulJurczak: Well (still talking CPython here), the embedded length is a C `long`, not a Python object, and doing `++(self->ob_size)` is orders of magnitude faster than the other stuff that happens once per iteration while building a list. You can profile it, but this is one of those nearly-free parts that can be ignored. As for memory allocation… if the list gets big enough, the genexp _is_ faster, as Niklas B. pointed out in a comment to a deleted answer (and I added into my answer), but at 333K there's enough unused space that the list gets it for free. – abarnert Apr 30 '13 at 21:50
  • @PaulJurczak: Finally, for initialization… well, the genexp initializes 333333 objects (the elements, one by one); the listcomp initializes 333334 (the element, plus the `list` itself). I wouldn't expect to be able to even measure that. (And in fact, the genexp probably initializes an iterator object of some kind under the covers, so probably there's _no_ difference, not just a tiny one). – abarnert Apr 30 '13 at 21:53
  • another data point: listcomp is twice as fast as genexpr (`[True for _ in range(1000000)]` vs. `list(True for _ in range(1000000))`) on my machine (Python 3.5.2, Ubuntu). ideone also shows that [listcomp](http://ideone.com/3XoZVB) is faster than [genexpr](http://ideone.com/tvuOHM) (though ideone is more a code example than an additional data point—comparing performance in the shared environment such as ideone is probably useless). – jfs Sep 20 '16 at 00:19

Borrowing from this answer, there are two things to consider:

1. A Python list is indexable and fetching its length takes O(1) time, so the speed of calling len() on a list does not depend on its size. A generator, on the other hand, has no length at all (calling len() on it raises a TypeError); to count its items you have to consume it, which is O(n).

2. See the linked answer above. A list comprehension runs as a tight loop, whereas a generator expression has to keep a reference to its iterator and call next() on it for every item it yields, which adds another layer of overhead. At a small scale the difference between list comprehensions and generator expressions can be safely ignored, but at a larger scale you have to take it into account; see the sketch below.
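
A quick sketch illustrating both points (list sizes and repeat counts here are arbitrary):

import timeit

# Point 1: len() on a list reads a stored size field, so it costs about the
# same for a tiny list and a huge one.
small = list(range(10))
big = list(range(10000000))
print(timeit.timeit(lambda: len(small), number=1000000))
print(timeit.timeit(lambda: len(big), number=1000000))  # roughly the same

# Point 2: a generator has no length (len() on it raises TypeError); counting
# its items means consuming it, which is O(n).
gen = (n for n in range(1, 1000000) if n % 3 == 1)
print(sum(1 for _ in gen))  # 333333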

iBug
  • 35,554
  • 7
  • 89
  • 134