2

I often run into this situation while coding in Python, and am not sure which is more performant. Suppose I have a list l = [3, 13, 6, 8, 9, 53], and I want to use a list comprehension to create a new list that subtracts off the minimum value so that the lowest number is zero. I could do:

[x - min(l) for x in l]

On the other hand, I could do:

min_val = min(l)
[x - min_val for x in l]

Is it true that the first option runs min(l) for every item in the list, while the second option calculates the minimum value only once? If so, and if l is a very long list, this could be a significant performance difference. On the other hand, with a shorter list, does the creation of a variable in option 2 add some overhead?

I guess my question is: how much of a difference does this make? I find the first option cleaner and more compact, but don't know if that comes at a performance cost.

Relatedly, is there any performance difference between:

new_list = []
for x in l:
    new_list.append(x ** 2)

and:

new_list = [x ** 2 for x in l]

Does the latter directly translate into the former, or is there a difference in what goes on under the hood?

MarcTheSpark
  • 473
  • 5
  • 14
  • Replace `min` by a user-defined function to see if it's called for every item. – jarmod Feb 01 '22 at 21:06
  • 4
    There is this really cool module called timeit ... with which you can ... time it. No reason to ask, just try it. – Patrick Artner Feb 01 '22 at 21:11
  • @IlaiK you're suggesting that Python can optimize the code because it knows that the result of min(l) can be cached after first execution. Do you have anything to support this? – jarmod Feb 01 '22 at 21:21
  • 1
    @IlaiK this is **absolutely not true for CPython**. – juanpa.arrivillaga Feb 02 '22 at 01:34
  • Also, that for-loop is essentially exactly equivalent to your list comprehension. But note, the comprehension will be marginally (not algorithmically) faster. It still uses `.append` in a loop, but it caches the method resolution, which you can do yourself by writing `list_append = new_list.append` and calling `list_append(x ** 2)` in the body of the loop. Additionally, everything is executed in a local scope, which makes name resolution faster, and I think there is a minor bytecode trick. – juanpa.arrivillaga Feb 02 '22 at 01:37
  • But despite those differences, the use case for list comprehensions is not *improving performance*, but *improving readability*. They are a nice, declarative way of doing mapping/filtering operations on iterables to create a new list. – juanpa.arrivillaga Feb 02 '22 at 01:37
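The bound-method caching trick mentioned in the comment above can be sketched like this (an illustrative micro-optimization only; the names are arbitrary):

```python
l = list(range(1000))

new_list = []
list_append = new_list.append  # cache the bound method lookup once
for x in l:
    list_append(x ** 2)        # avoids re-resolving .append on each iteration

# produces exactly the same result as the comprehension
assert new_list == [x ** 2 for x in l]
```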

5 Answers

2

I prefer running timeit from the command line in a case like this, since the problem isn't too complicated:

$ python -m timeit --setup "x = [i for i in range(1000)]" "[i - min(x) for i in x]"
50 loops, best of 5: 8.07 msec per loop
$ python -m timeit --setup "x = [i for i in range(1000)]" "y = min(x);[i - y for i in x]"
10000 loops, best of 5: 36.8 usec per loop

Here you can see very clearly that there is a tremendous benefit to computing the minimum value once before the list comprehension.

Kraigolas
  • 5,121
  • 3
  • 12
  • 37
1

It has a very large impact; you can run this quick test to see for yourself:

from datetime import datetime as dt
now = dt.now
l = []

#create list
for i in range(20000):
    l.append(i)

#mark time
print(str(now().time()))

#save min first
min_val = min(l)
new = [x - min_val for x in l]

#mark time
print(str(now().time()))

#do min() each time
[x - min(l) for x in l]

#mark time
print(str(now().time()))

Output:

15:17:38.392065
15:17:38.393058
15:17:40.592543

The first ran almost instantly, while the second took over two seconds, and that is with only 20,000 items.

So in practice the difference is large, which is expected given the change in time complexity: O(n) with the stored minimum versus O(n^2) when min() is recomputed on every iteration.

Eli Harold
  • 2,280
  • 1
  • 3
  • 22
1

Yes, min(l) will be called for every iteration in the list comprehension. You can test this using your own function, as @jarmod has mentioned.
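For example, a minimal sketch along the lines of that suggestion (`counted_min` is just an illustrative wrapper, not a standard function):

```python
l = [3, 13, 6, 8, 9, 53]
calls = 0

def counted_min(seq):
    # wrapper around min() that counts how many times it is invoked
    global calls
    calls += 1
    return min(seq)

result = [x - counted_min(l) for x in l]
print(calls)  # 6: the minimum is recomputed once per element of l
```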

Yes, this adds complexity compared to storing it in a variable, because min(l) is itself O(n). With the call inside the comprehension the whole operation becomes O(n^2), versus just O(n) if you compute the minimum once beforehand.

And yes, the other example (list comprehension vs append) is equivalent, as shown in the docs.
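As a quick illustrative check (exact bytecode varies across CPython versions), you can confirm the two forms produce identical results and compare their bytecode with the dis module:

```python
import dis

l = [3, 13, 6, 8, 9, 53]

# explicit loop
new_list = []
for x in l:
    new_list.append(x ** 2)

# the comprehension produces the same list
assert new_list == [x ** 2 for x in l]

# the loop body resolves and calls the .append method each time,
# while the comprehension uses a dedicated LIST_APPEND opcode
# (output depends on your CPython version)
dis.dis("new_list.append(x ** 2)")
dis.dis("[x ** 2 for x in l]")
```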

shriakhilc
  • 2,922
  • 2
  • 12
  • 17
1

There is this really cool module called timeit ... with which you can ... time it. Don't ask, measure:

import timeit


def a():
    l = list(range(100))[::-1]
    return [x - min(l) for x in l]

def b():
    l = list(range(100))[::-1]
    min_val = min(l)
    return [x - min_val for x in l]



# time 100 calls to each function to smooth out
# run-to-run variation and caching effects
print("min inside list comp: ", timeit.timeit(a, number=100))
print("min outside list comp:", timeit.timeit(b, number=100))

Output:

min inside list comp:  0.008259299998826464
min outside list comp: 0.0010576999993645586

See How to use timeit module

Patrick Artner
  • 50,409
  • 9
  • 43
  • 69
  • Fair. :-) I don't know why it didn't occur to me to measure it myself (or to use a custom function to check as others suggested). Anyway, thank you for testing it! – MarcTheSpark Feb 02 '22 at 00:23
0
import time
from functools import lru_cache

l = [i for i in range(10000)]
min_val = min(l)

def f1():
    min_l = [x - min(l) for x in l]


def f2():
    min_l = [x - min_val for x in l]


# a cached variant would also avoid recomputing the minimum, but note
# that lru_cache requires hashable arguments, so you would have to pass
# a tuple, e.g. lru_min(tuple(l))
@lru_cache
def lru_min(x):
    return min(x)

if __name__ == '__main__':
    start_time = time.time()
    f1()
    print("--- f1 runs in %s seconds ---" % (time.time() - start_time))
    start_time = time.time()
    f2()
    print("--- f2 runs in %s seconds ---" % (time.time() - start_time))

This code results in

--- f1 runs in 0.7290558815002441 seconds ---
--- f2 runs in 0.0003840923309326172 seconds ---
sudden_appearance
  • 1,968
  • 1
  • 4
  • 15