
Not sure if the title uses the correct terminology.

If you have to compare the characters in two strings (A, B) and count the number of characters of B that also occur in A:

sum([ch in A for ch in B])

is faster on %timeit than

sum(ch in A for ch in B)

I understand that the first one creates a list of booleans and then sums the True values. The second one uses a generator. I'm not clear on what it is doing internally, or why it is slower.

Thanks.
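For reference, the comparison can be reproduced with something like this (the exact test strings are an assumption; any two character sets will do):

```python
# Hypothetical reproduction of the benchmark: A is the set of characters
# to match against, B is a random string of the length being tested.
import random
import string
import timeit

A = string.ascii_lowercase
B = "".join(random.choices(string.printable, k=1000))

list_time = timeit.timeit(lambda: sum([ch in A for ch in B]), number=1000)
gen_time = timeit.timeit(lambda: sum(ch in A for ch in B), number=1000)

print(f"list: {list_time:.4f} s, generator: {gen_time:.4f} s")
```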

Edit with %timeit results:

10 characters
  generator expression: 10000 loops, best of 3: 112 µs per loop
  list:                 10000 loops, best of 3: 94.6 µs per loop

1000 characters
  generator expression: 100 loops, best of 3: 8.5 ms per loop
  list:                 100 loops, best of 3: 6.9 ms per loop

10,000 characters
  generator expression: 10 loops, best of 3: 87.5 ms per loop
  list:                 10 loops, best of 3: 76.1 ms per loop

100,000 characters
  generator expression: 1 loop, best of 3: 908 ms per loop
  list:                 1 loop, best of 3: 840 ms per loop

Chris94549
    How long are the strings? Generators are data-lazy, it's possible this adds some overhead that for short strings will be greater than what you save by not copying the data. – millimoose Jul 19 '20 at 01:15
  • Because python is very, very good at creating lists. – juanpa.arrivillaga Jul 19 '20 at 01:16
  • Hmm even with strings of length 10000000, the list version is faster than the generator version... I'm also curious about the speed difference. – jkr Jul 19 '20 at 01:21
  • Asymptotically, they appear to be converging. Eventually, the cost of having to do two traversals and build the list may outweigh the more expensive generator traversal. – chepner Jul 19 '20 at 12:03

2 Answers


I took a look at the disassembly of each construct (using dis). I did this by declaring these two functions:

def list_comprehension():
    return sum([ch in A for ch in B])

def generation_expression():
    return sum(ch in A for ch in B)

and then calling dis.dis with each function.
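Put together, that looks like the following self-contained sketch (A and B here are placeholder globals; since Python 3.7, dis.dis also recurses into the nested <listcomp>/<genexpr> code objects, and exact offsets and opcodes may vary across CPython versions):

```python
# Disassemble both constructs; the interesting listings are the inner
# <listcomp> and <genexpr> code objects that dis recurses into.
import dis

A = "abc"   # placeholder globals for illustration
B = "abcd"

def list_comprehension():
    return sum([ch in A for ch in B])

def generation_expression():
    return sum(ch in A for ch in B)

dis.dis(list_comprehension)
dis.dis(generation_expression)
```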

For the list comprehension:

 0 BUILD_LIST               0
 2 LOAD_FAST                0 (.0)
 4 FOR_ITER                12 (to 18)
 6 STORE_FAST               1 (ch)
 8 LOAD_FAST                1 (ch)
10 LOAD_GLOBAL              0 (A)
12 COMPARE_OP               6 (in)
14 LIST_APPEND              2
16 JUMP_ABSOLUTE            4
18 RETURN_VALUE

and for the generator expression:

 0 LOAD_FAST                0 (.0)
 2 FOR_ITER                14 (to 18)
 4 STORE_FAST               1 (ch)
 6 LOAD_FAST                1 (ch)
 8 LOAD_GLOBAL              0 (A)
10 COMPARE_OP               6 (in)
12 YIELD_VALUE
14 POP_TOP
16 JUMP_ABSOLUTE            2
18 LOAD_CONST               0 (None)
20 RETURN_VALUE

The disassembly for the actual summation is:

 0 LOAD_GLOBAL              0 (sum)
 2 LOAD_CONST               1 (<code object <genexpr> at 0x7f49dc395240, file "/home/mishac/dev/python/kintsugi/KintsugiModels/automated_tests/a.py", line 12>)
 4 LOAD_CONST               2 ('generation_expression.<locals>.<genexpr>')
 6 MAKE_FUNCTION            0
 8 LOAD_GLOBAL              1 (B)
10 GET_ITER
12 CALL_FUNCTION            1
14 CALL_FUNCTION            1
16 RETURN_VALUE

but this sum disassembly was constant between both of your examples, with the only difference being the loading of generation_expression.<locals>.<genexpr> vs. list_comprehension.<locals>.<listcomp> (so just loading a different code object and qualified name).

The differing bytecode instructions between the first two disassemblies are LIST_APPEND for the list comprehension vs. the conjunction of YIELD_VALUE and POP_TOP for the generator expression.

I won't pretend I know the intrinsics of Python bytecode, but what I gather from this is that the generator hands over each value one at a time: YIELD_VALUE suspends the generator's frame to pass the value to the caller, and POP_TOP discards it from the stack when the frame resumes. This per-item suspend-and-resume doesn't have to happen in a list comprehension, leading me to believe there'll be a slight amount of overhead in using generators.

Now this doesn't mean that generators are always going to be slower. Generators excel at memory efficiency, so there will be a threshold N below which list comprehensions perform slightly better (because memory use isn't a problem), but beyond which generators perform significantly better.
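A quick way to see the memory side of that trade-off (the exact sizes are CPython-specific, so treat the numbers as illustrative):

```python
# The list comprehension materializes one bool reference per character of B,
# while the generator object stays a constant size regardless of len(B).
import sys

A = "abc"
B = "abcd" * 250_000  # a 1,000,000-character string

as_list = [ch in A for ch in B]
as_gen = (ch in A for ch in B)

print(sys.getsizeof(as_list))  # grows linearly with len(B)
print(sys.getsizeof(as_gen))   # small and constant
```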

Mario Ishac

Generators are usually slower than list comprehensions; the whole point of generators is to be memory-efficient, because they yield each item lazily (creating it only when it is actually needed). They favor memory efficiency over speed.

Andreas
  • I'm trying to understand what that means in this context. The sum(list comprehension) first produces a list of bool, and then sums the 1's. What is the generator expression doing differently? – Chris94549 Jul 19 '20 at 02:01
  • In my understanding, generators require a next call per item, which can (but doesn't always) make them slower; see https://www.python.org/dev/peps/pep-0289/#the-details So longer iterables like strings lead to more next calls, which slows things down. – Andreas Jul 19 '20 at 02:07
  • 1
    I found a SO answer explaining it in more details: https://stackoverflow.com/questions/11964130/list-comprehension-vs-generator-expressions-weird-timeit-results – Andreas Jul 19 '20 at 02:13
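The per-item next calls mentioned in the comments can be made explicit by driving a generator expression by hand; sum() effectively does this once per character:

```python
# sum() pulls values from the generator via the iterator protocol;
# each next() resumes the generator frame for exactly one item.
A = "abc"
gen = (ch in A for ch in "abx")

print(next(gen))  # True  ('a' in A)
print(next(gen))  # True  ('b' in A)
print(next(gen))  # False ('x' in A)
```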