3

I am going trough the docs of Python 3.X, I have doubt about List Comprehension speed of execution and how it exactly work.

Let's take the following example:

Listing 1

...
L = range(0,10)
L = [x ** 2 for x in L]
...

Now in my knowledge this return a new listing, and it's equivalent to write down:

Listing 2

...
res = []
for x in L:
  res.append(x ** 2)
...

The main difference is the speed of execution if I am correct. Listing 1 is supposed to be performed at C language speed inside the interpreter, meanwhile Listing 2 is not.

But Listing 2 is what the list comprehension does internally (not sure), so why Listing 1 is executed at C Speed inside the interpreter & Listing 2 is not? Both are converted to byte code before being processed, or am I missing something?

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
Sid
  • 14,176
  • 7
  • 40
  • 48
  • 2
    _both are converted to byte code before being processed_ yeah, sure, so what? They are not converted to same byte code, so why they should take same amount of time? – Łukasz Rogalski Aug 14 '16 at 11:20
  • 1
    @ŁukaszRogalski isn't listing 2 what list comprehension does internally? If so the should be converted to the same byte code I guess – Sid Aug 14 '16 at 11:23
  • 1
    The main purpose of list comprehensions is to simplify complex list creation and improve readability to a one-liner. You could potentially compare if there is a difference in the compiled bytecode `pyc`. The Python source code is open, so if you read C, you can read through the algorithms corresponding to list comprehension. – noumenal Aug 14 '16 at 11:25
  • 4
    You may use `dis` module to analyze byte codes. And no, second code snippet does much more. All in-between states have to available to interpreter on each iteration (even if they are not used). `res.append` invokes a function call (and they are not that cheap in Python). List comprehensions allows you for much less that explicit loop, but due to this limitations it allows for much efficient implementation. – Łukasz Rogalski Aug 14 '16 at 11:26
  • I'll check with that module and also have a look at C implementation, thanks guys! – Sid Aug 14 '16 at 11:30
  • http://blog.cdleary.com/2010/04/efficiency-of-list-comprehensions/ – Padraic Cunningham Aug 14 '16 at 11:56

2 Answers2

4

Look at the actual bytecode that is produced. I've put the two fragments of code into fuctions called f1 and f2.

The comprehension does this:

  3          15 LOAD_CONST               3 (<code object <listcomp> at 0x7fbf6c1b59c0, file "<stdin>", line 3>)
             18 LOAD_CONST               4 ('f1.<locals>.<listcomp>')
             21 MAKE_FUNCTION            0
             24 LOAD_FAST                0 (L)
             27 GET_ITER
             28 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             31 STORE_FAST               0 (L)

Notice there is no loop in the bytecode. The loop happens in C.

Now the for loop does this:

  4          21 SETUP_LOOP              31 (to 55)
             24 LOAD_FAST                0 (L)
             27 GET_ITER
        >>   28 FOR_ITER                23 (to 54)
             31 STORE_FAST               2 (x)
             34 LOAD_FAST                1 (res)
             37 LOAD_ATTR                1 (append)
             40 LOAD_FAST                2 (x)
             43 LOAD_CONST               3 (2)
             46 BINARY_POWER
             47 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             50 POP_TOP
             51 JUMP_ABSOLUTE           28
        >>   54 POP_BLOCK

In contrast to the comprehension, the loop is clearly here in the bytecode. So the loop occurs in python.

The bytecodes are different, and the first should be faster.

James K
  • 3,692
  • 1
  • 28
  • 36
  • Yes that's it! I've just checked the byte code as suggested by you & Lukasz and got the same result. Actually yes the two listing are the same, and list comprehension does the same thing as L2 but in C therefore is obviously faster, roughly twice. Thanks once again! – Sid Aug 14 '16 at 12:02
  • 1
    @Sid there *is* a loop in the bytecode, it is in `3 ( at 0x7fbf6c1b59c0, file "", line 3>)`. You can see this by introspecting the code object – juanpa.arrivillaga Apr 02 '19 at 17:28
  • 1
    e.g. `dis.dis(f1.__code__.co_consts[3])` (or it will automatically recurse in Python 3.7). The speed advantage comes from a special opcode for the `.append` operation, but you could get almost the same speed-up by "caching" the method resolution, `l_append = L.append` – juanpa.arrivillaga Apr 02 '19 at 17:35
3

The answer is actually in your question.

When you run any built in python function you are running something that has been written in C and compiled into machine code.

When you write your own version of it, that code must be converted into CPython objects which are handled by the interpreter.

In consequence the built-in approach or function is always quicker (or takes less space) in Python than writing your own function.

ntrch
  • 76
  • 1
  • 22
Sam Redway
  • 7,605
  • 2
  • 27
  • 41