Comparing list comprehensions and explicit loops (3 array generators faster than 1 for loop)

Question

I was doing homework and accidentally found a strange inconsistency in the speed of an algorithm. Here are two versions of the same function with one difference: in the first version I use three list comprehensions to filter the array, and in the second I use a single for loop with an if/elif/else chain to do the same filtering.

Here is the code of the first version:

def kth_order_statistic(array, k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = [x for x in array if x < pivot]
    m = [x for x in array if x == pivot]
    r = [x for x in array if x > pivot]
    if k <= len(l):
        return kth_order_statistic(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic(r, k - len(l) - len(m))
    else:
        return m[0]

And here is the code of the second version:

def kth_order_statistic2(array, k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = []
    m = []
    r = []
    for x in array:
        if x < pivot:
            l.append(x)
        elif x > pivot:
            r.append(x)
        else:
            m.append(x)

    if k <= len(l):
        return kth_order_statistic2(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic2(r, k - len(l) - len(m))
    else:
        return m[0]

IPython output for the first version:

In [4]: %%timeit
   ...: A = range(100000)
   ...: shuffle(A)
   ...: k = randint(1, len(A)-1)
   ...: kth_order_statistic(A, k)
   ...:
10 loops, best of 3: 120 ms per loop

And for the second version:

In [5]: %%timeit
   ...: A = range(100000)
   ...: shuffle(A)
   ...: k = randint(1, len(A)-1)
   ...: kth_order_statistic2(A, k)
   ...:
10 loops, best of 3: 169 ms per loop

So why is the first version faster than the second? I also wrote a third version that uses the filter() function instead of list comprehensions, and it was slower than the second version (218 ms per loop).
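The filter() version is not shown in the question. As a hedged sketch, assuming it simply replaces each comprehension with filter() and a lambda (the name kth_order_statistic_filter is hypothetical), it might look like this:

```python
def kth_order_statistic_filter(array, k):
    """Hypothetical reconstruction of the filter()-based third version."""
    pivot = (array[0] + array[len(array) - 1]) // 2
    # In Python 3, filter() returns a lazy iterator, so wrap it in list()
    # to be able to call len() and index it below.
    l = list(filter(lambda x: x < pivot, array))
    m = list(filter(lambda x: x == pivot, array))
    r = list(filter(lambda x: x > pivot, array))
    if k <= len(l):
        return kth_order_statistic_filter(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic_filter(r, k - len(l) - len(m))
    else:
        return m[0]
```

Here every element costs a call to a Python-level lambda, which is one plausible reason this variant came out slowest; that is a guess, not something measured here.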

List comprehensions are often faster than the equivalent for loops. Extending lists (your append functions) can also be more expensive than filling a list of known size. — Sohier Dane, Sep 15 '16 at 19:33
increase your list sizes dramatically ... I think you will find that the gap is more or less constant ... you are trading space for time, but they are *roughly* equivalent (there is a little bit of overhead on the generators/iterators vs the list) — Joran Beasley, Sep 15 '16 at 19:34
@SohierDane This was my thinking. The list comprehension is explicit about what the operation is going to be that creates the list, therefore optimizations could be performed on it. Building the list element by element, Python can't make any assumptions about future calculations or memory usage. — Stewart Smith, Sep 15 '16 at 19:36
The time is likely to vary significantly for different random orderings of `A` and different values of `k`. Make sure you're doing timings for the *same* `A` and `k` in both cases. I think you'll find that there's very little difference. — Mark Dickinson, Sep 15 '16 at 19:40
It is due to usage of `.append()` within the for loop. Added an answer with the insights regarding time difference. — Moinuddin Quadri, Sep 15 '16 at 19:49
You are calling `kth_order_statistic` in `kth_order_statistic2` — Flint, Sep 15 '16 at 20:22
@Flint, thank you for the correction, I edited the post. In fact, the functions have different names and I renamed them when writing the question, so it did not affect the results — KgOfHedgehogs, Sep 15 '16 at 22:33

score 10 · Answer 1 · edited May 23 '17 at 12:26

A simple for loop is faster than a list comprehension, almost 2 times faster. Check the results below:

Using list comprehension: 58 usec

moin@moin-pc:~$ python -m timeit "[i for i in range(1000)]"
10000 loops, best of 3: 58 usec per loop

Using for loop: 37.1 usec

moin@moin-pc:~$ python -m timeit "for i in range(1000): i"
10000 loops, best of 3: 37.1 usec per loop

But in your case, the for loop takes longer than the list comprehension not because the loop itself is slow, but because of the .append() calls you make inside it.

With append() in the for loop: 114 usec

moin@moin-pc:~$ python -m timeit "my_list = []" "for i in range(1000): my_list.append(i)"
10000 loops, best of 3: 114 usec per loop

This shows that the .append() calls take roughly twice as long as the bare for loop itself.

However, when list.append is stored in a separate variable: 69.3 usec

moin@moin-pc:~$ python -m timeit "my_list = []; append = my_list.append" "for i in range(1000): append(i)"
10000 loops, best of 3: 69.3 usec per loop

There is a huge improvement in performance compared to the previous case, and the result is comparable to that of the list comprehension. So instead of calling my_list.append() each time, you can improve performance by storing a reference to the method in another variable, i.e. append_func = my_list.append, and calling append_func(i).

This also shows that calling a bound method stored in a variable is faster than looking the method up on the object every time it is called.
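Applied to the question's second version, the same trick would bind the three append methods to local names once per call. This is only a sketch: kth_order_statistic2_cached is a hypothetical name, and whether it actually overtakes the three comprehensions would need to be timed on the same A and k.

```python
def kth_order_statistic2_cached(array, k):
    """Same algorithm as kth_order_statistic2, with the append methods cached."""
    pivot = (array[0] + array[len(array) - 1]) // 2
    l, m, r = [], [], []
    # Bind each list's append method to a local name once per call, so the
    # loop body does a local-variable lookup instead of an attribute lookup.
    l_append, m_append, r_append = l.append, m.append, r.append
    for x in array:
        if x < pivot:
            l_append(x)
        elif x > pivot:
            r_append(x)
        else:
            m_append(x)

    if k <= len(l):
        return kth_order_statistic2_cached(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic2_cached(r, k - len(l) - len(m))
    else:
        return m[0]
```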

Thanks to Stefan for pointing out the last case.

One is three separate loops over a list of size n, and the other is a single loop. A better argument could be made by showing how expensive `append` is. — Stewart Smith, Sep 15 '16 at 19:39
@sdsmith: Yes. I should have provided the insights. Updated the answer — Moinuddin Quadri, Sep 15 '16 at 19:45
What do you get for `python -m timeit "my_list = []; append = my_list.append" "for i in range(1000): append(i)"`? — Stefan Pochmann, Sep 21 '16 at 17:04
@Moinuddin Quadri. Regarding @sdsmith's and @Stefan Pochmann's comments, it appears you need to update the first paragraph of your answer, *"Using simple for is faster than list comprehension"*. Your first comparison is wrong. Let's simply compare comparable things, like `python -m timeit "my_list = []; append = my_list.append" "for i in range(1000): append(i)"` **vs** `python -m timeit "a=[i for i in range(1000)]; a"` — Flint, Dec 06 '16 at 21:01
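A sketch of the like-for-like comparison suggested in that comment, in which every variant builds and returns the same list. The function names and the N = 1000 size are illustrative assumptions, and no timings are claimed here:

```python
import timeit

N = 1000  # illustrative size, matching the range(1000) used in the answer


def with_comprehension():
    return [i for i in range(N)]


def with_append():
    result = []
    for i in range(N):
        result.append(i)  # attribute lookup plus method call on every iteration
    return result


def with_cached_append():
    result = []
    append = result.append  # look up the bound method only once
    for i in range(N):
        append(i)
    return result


def best_time(func, number=1000, repeat=5):
    """Best-of-`repeat` time per call, in seconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


if __name__ == "__main__":
    for func in (with_comprehension, with_append, with_cached_append):
        print(f"{func.__name__:20s} {best_time(func) * 1e6:.1f} usec per call")
```

Because each function returns a list, the comprehension no longer gets an unfair comparison against a loop that builds nothing.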

Riccardo Petraglia · Accepted Answer · 2017-05-29T21:49:45.657

Let's define the functions we will need to answer the question and timeit them:

In [18]: def iter():
    l = [x for x in range(100) if x > 10]
   ....:

In [19]: %timeit iter()
100000 loops, best of 3: 7.92 µs per loop

In [20]: def loop():
    l = []
    for x in range(100):
        if x > 10:
            l.append(x)
   ....:

In [21]: %timeit loop()
10000 loops, best of 3: 20 µs per loop

In [22]: def loop_fast():
    l = []
    for x in range(100):
        if x > 10:
            pass
   ....:

In [23]: %timeit loop_fast()
100000 loops, best of 3: 4.69 µs per loop

We can see that the for loop without the append call is at least as fast as the list comprehension. In fact, looking at the bytecode, we can see that for the list comprehension Python is able to use a dedicated bytecode instruction, LIST_APPEND, instead of:

Load the list: 40 LOAD_FAST
Load the attribute: 43 LOAD_ATTR
Call the loaded function: 49 CALL_FUNCTION
Discard the return value of append (None): 52 POP_TOP

As you can see from the output below, these instructions are missing from both the list comprehension and the "loop_fast" function. Comparing the timeit results of the three functions, it is clear that these instructions are responsible for the different timings of the three methods.

In [27]: dis.dis(iter)
  2          0 BUILD_LIST             0
             3 LOAD_GLOBAL            0 (range)
             6 LOAD_CONST             1 (1)
             9 LOAD_CONST             2 (100)
            12 CALL_FUNCTION          2
            15 GET_ITER
       >>   16 FOR_ITER              24 (to 43)
            19 STORE_FAST             0 (x)
            22 LOAD_FAST              0 (x)
            25 LOAD_CONST             2 (100)
            28 COMPARE_OP             4 (>)
            31 POP_JUMP_IF_FALSE     16
            34 LOAD_FAST              0 (x)
            37 LIST_APPEND            2
            40 JUMP_ABSOLUTE         16
       >>   43 STORE_FAST             1 (l)
            46 LOAD_CONST             0 (None)
            49 RETURN_VALUE

In [28]: dis.dis(loop)
  2          0 BUILD_LIST             0
             3 STORE_FAST             0 (l)

  3          6 SETUP_LOOP            51 (to 60)
             9 LOAD_GLOBAL            0 (range)
            12 LOAD_CONST             1 (1)
            15 LOAD_CONST             2 (100)
            18 CALL_FUNCTION          2
            21 GET_ITER
       >>   22 FOR_ITER              34 (to 59)
            25 STORE_FAST             1 (x)

  4         28 LOAD_FAST              1 (x)
            31 LOAD_CONST             3 (10)
            34 COMPARE_OP             4 (>)
            37 POP_JUMP_IF_FALSE     22

  5         40 LOAD_FAST              0 (l)
            43 LOAD_ATTR              1 (append)
            46 LOAD_FAST              1 (x)
            49 CALL_FUNCTION          1
            52 POP_TOP
            53 JUMP_ABSOLUTE         22
            56 JUMP_ABSOLUTE         22
       >>   59 POP_BLOCK
       >>   60 LOAD_CONST             0 (None)
            63 RETURN_VALUE

In [29]: dis.dis(loop_fast)
  2          0 BUILD_LIST             0
             3 STORE_FAST             0 (l)

  3          6 SETUP_LOOP            38 (to 47)
             9 LOAD_GLOBAL            0 (range)
            12 LOAD_CONST             1 (1)
            15 LOAD_CONST             2 (100)
            18 CALL_FUNCTION          2
            21 GET_ITER
       >>   22 FOR_ITER              21 (to 46)
            25 STORE_FAST             1 (x)

  4         28 LOAD_FAST              1 (x)
            31 LOAD_CONST             3 (10)
            34 COMPARE_OP             4 (>)
            37 POP_JUMP_IF_FALSE     22

  5         40 JUMP_ABSOLUTE         22
            43 JUMP_ABSOLUTE         22
       >>   46 POP_BLOCK
       >>   47 LOAD_CONST             0 (None)
            50 RETURN_VALUE
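To check the LIST_APPEND observation on a current interpreter, here is a sketch that collects opcode names with the dis module. It assumes CPython. The opcodes around the append call differ between versions (the output above is from Python 2), and before Python 3.12 a comprehension body is compiled into a nested code object, so the helper also walks co_consts.

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            # comprehensions live here on Python < 3.12
            names |= opnames(const)
    return names


def comprehension():
    return [x for x in range(100) if x > 10]


def loop():
    l = []
    for x in range(100):
        if x > 10:
            l.append(x)
    return l


print("LIST_APPEND" in opnames(comprehension.__code__))  # True
print("LIST_APPEND" in opnames(loop.__code__))           # False
```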

Flint · Answer 3 · 2016-09-15T22:47:24.543

Let's clear up that doubt: the second version is slightly faster. List comprehensions are faster, but the single loop avoids two extra passes over the array, and its if/elif/else chain skips the remaining comparisons as soon as one matches.

def kth_order_statistic1(array,k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = [x for x in array if x < pivot]
    m = [x for x in array if x == pivot]
    r = [x for x in array if x > pivot]

    if k <= len(l):
        return kth_order_statistic1(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic1(r, k - len(l) - len(m))
    else:
        return m[0]


def kth_order_statistic2(array,k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = []
    m = []
    r = []
    for x in array:
        if x < pivot:
            l.append(x)
        elif x > pivot:
            r.append(x)
        else:
            m.append(x)

    if k <= len(l):
        return kth_order_statistic2(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic2(r, k - len(l) - len(m))
    else:
        return m[0]

def kth_order_statistic3(array,k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = []
    m = []
    r = []

    for x in array:
        if x < pivot: l.append(x)
    for x in array:
        if x == pivot: m.append(x)
    for x in array:
        if x > pivot: r.append(x)

    if k <= len(l):
        return kth_order_statistic3(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic3(r, k - len(l) - len(m))
    else:
        return m[0]

import time
import random
if __name__ == '__main__':

    A = list(range(100000))  # list() so that shuffle() also works on Python 3
    random.shuffle(A)
    k = random.randint(1, len(A) - 1)

    start_time = time.time()
    for _ in range(1000):
        kth_order_statistic1(A, k)
    print("--- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    for _ in range(1000):
        kth_order_statistic2(A, k)
    print("--- %s seconds ---" % (time.time() - start_time))

    start_time = time.time()
    for _ in range(1000):
        kth_order_statistic3(A, k)
    print("--- %s seconds ---" % (time.time() - start_time))

python :
--- 25.8894710541 seconds ---
--- 24.073086977 seconds ---
--- 32.9823839664 seconds ---

ipython
--- 25.7450709343 seconds ---
--- 22.7140650749 seconds ---
--- 35.2958850861 seconds ---

The timings vary with the random draw, but the differences between the three versions stay about the same.

You did not consider that the function modifies the array; in that case, each following call on the same array would have less work to do. That is why I measured the time including the shuffle. But this does not affect your results. Thanks for the answer — KgOfHedgehogs, Sep 15 '16 at 21:35
What? No, the array is not written back, it's used only as input. Besides, if you give different arrays to the functions, you'll get biased results, since the workload also depends on the random distribution. — Flint, Sep 15 '16 at 22:45
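Following the point about feeding every function the same data, here is a small harness sketch: it shuffles one list with a fixed seed, draws a single k, and times each candidate on exactly that input. The compare name and its parameters are assumptions for illustration.

```python
import random
import timeit


def compare(funcs, n=100_000, repeat=5, number=10, seed=0):
    """Time each function on the same shuffled list and the same k.

    Returns {function name: best time per call in seconds}.
    Assumes the functions only read `data` and never modify it.
    """
    rng = random.Random(seed)  # fixed seed: every function sees identical data
    data = list(range(n))      # list() because a range object cannot be shuffled
    rng.shuffle(data)
    k = rng.randint(1, n - 1)  # same k range as in the question
    results = {}
    for func in funcs:
        # timeit.repeat accepts a callable; keep the fastest of `repeat` samples
        samples = timeit.repeat(lambda: func(data, k), repeat=repeat, number=number)
        results[func.__name__] = min(samples) / number
    return results
```

With the functions from this answer defined, compare([kth_order_statistic1, kth_order_statistic2, kth_order_statistic3]) returns the best per-call time for each, keyed by function name.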

Flint · Answer 4 · 2016-09-15T21:08:06.720

The algorithmic structure differs, and the conditional structure is to blame: in the single loop, the tests for appending to r and m are skipped whenever an earlier test has already matched. A stricter comparison between a for loop with append and list comprehensions would be against the following non-optimal version:

for x in array:
    if x < pivot:
        l.append(x)
for x in array:
    if x == pivot:
        m.append(x)
for x in array:
    if x > pivot:
        r.append(x)
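To isolate the partition step from the recursion, both three-pass strategies can be wrapped as functions that return (l, m, r) and timed on the same input. This is a sketch; the function names, the data size and the seed are assumptions:

```python
import random
import timeit


def partition_comprehensions(array, pivot):
    # three passes, each a list comprehension (LIST_APPEND internally)
    l = [x for x in array if x < pivot]
    m = [x for x in array if x == pivot]
    r = [x for x in array if x > pivot]
    return l, m, r


def partition_three_loops(array, pivot):
    # the same three passes, written as explicit loops with .append()
    l, m, r = [], [], []
    for x in array:
        if x < pivot:
            l.append(x)
    for x in array:
        if x == pivot:
            m.append(x)
    for x in array:
        if x > pivot:
            r.append(x)
    return l, m, r


if __name__ == "__main__":
    data = list(range(100_000))
    random.Random(0).shuffle(data)
    pivot = (data[0] + data[-1]) // 2  # same pivot rule as the question
    for func in (partition_comprehensions, partition_three_loops):
        best = min(timeit.repeat(lambda: func(data, pivot), repeat=5, number=10))
        print(f"{func.__name__:26s} {best / 10 * 1e3:.2f} ms per call")
```

Since both functions do the same number of passes and comparisons, any remaining gap comes from how each element is appended.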
