
Here I have two functions:

from functools import reduce
from operator import mul

def fn1(arr):
    product = 1
    for number in arr:
        product *= number
    return [product//x for x in arr]

def fn2(arr):
    product = reduce(mul, arr)
    return [product//x for x in arr]

Benchmark:

In [2]: arr = list(range(1,11))

In [3]: %timeit fn1(arr)
1.62 µs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit fn2(arr)
1.88 µs ± 28.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: arr = list(range(1,101))

In [6]: %timeit fn1(arr)
38.5 µs ± 190 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [7]: %timeit fn2(arr)
41 µs ± 463 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: arr = list(range(1,1001))

In [9]: %timeit fn1(arr)
4.23 ms ± 25.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit fn2(arr)
4.24 ms ± 36.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [11]: arr = list(range(1,10001))

In [12]: %timeit fn1(arr)
605 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [13]: %timeit fn2(arr)
594 ms ± 4.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Here fn2() is marginally slower with the small lists. My understanding was that reduce() and mul() are both built-in functions, so they run at C speed and should be faster than the for loop. Is it that fn2() makes many more function calls (which also take some time), and that overhead dominates the end performance? But then the trend shows that fn2() catches up and slightly outperforms fn1() with the larger lists. Why?
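
One rough way to isolate the call overhead from the division pass would be to time only the product step, without the list comprehension (a minimal sketch; numbers will of course vary by machine):

from functools import reduce
from operator import mul
from timeit import timeit

def loop_product(arr):
    # plain for loop: in-place multiply, no per-element function call
    product = 1
    for number in arr:
        product *= number
    return product

arr = list(range(1, 11))

# reduce(mul, ...) goes through the C call machinery once per element
print(timeit(lambda: loop_product(arr), number=1_000_000))
print(timeit(lambda: reduce(mul, arr), number=1_000_000))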

NarūnasK
  • Those are all very close time measures, I don't think there really is a significant difference between one or another method. `reduce` is there just in case you prefer programming in a more functional style, it's not supposed to be an optimization. Also, you have a list comprehension in the functions that is probably taking most of the time anyway, especially considering you are dividing numbers with literally thousands of digits (10,000! takes over 100,000 bits to store). – jdehesa May 03 '19 at 11:26
  • Speculating that these tiny differences could be caused by the extra calls - a for-loop and arithmetic ops might just be a bit faster than the overhead of calling `mul`. (It seems that the standard CPython interpreter doesn't do inlining. https://stackoverflow.com/questions/6442050/python-equivalence-to-inline-functions-or-macros, note how PyPy gets a special mention for its inlining support). You shouldn't worry too much about this; if performance on that scale matters, Python is probably the wrong tool to begin with. (Though you could try PyPy.) – Christoph Burschka May 03 '19 at 11:52
  • Yes, but why is the trend what it is? If the `for` loop + arithmetic ops (`fn1`) take less time to compute than `reduce()` + `mul()` (`fn2`), then that should remain true regardless of the size of the list, right? Furthermore, with more items in the list the gap between `fn1` and `fn2` should widen, as more function calls would require even more time. But I'm observing the opposite: with more items in the list, `fn2` magically starts performing better than `fn1`. – NarūnasK May 03 '19 at 14:14

2 Answers


There could be many reasons for this, starting with CPU branch prediction and compiler/interpreter optimizations. In my view, which form of the code is marginally faster shouldn't matter much as long as you choose a proper algorithm: use the one that suits your needs and reads better, and leave this level of performance to the Python implementation. Faster options often cost more in memory, readability or maintainability. Also, it's not guaranteed that slightly more complex code inside a simple loop won't change performance, precisely because different optimizations may apply.

-- Update --

If you want to increase the performance of Python on simple operations, I would advise you to look at PyPy, Cython or Nim, which are designed to reach C-level performance where Python's object wrappers cost too much.

varela

these are always going to be very close, but for somewhat interesting reasons:

  1. if the product is small (e.g. for the range(1,10)) then the functions are doing very little "useful" work, and everything is going into marshalling machine ints between Python objects so the functions can be invoked

  2. if the product is large (e.g. the range(1,10001) case) the numbers become enormous (i.e. thousands of decimal digits) and the majority of time is spent multiplying very large numbers

for example, with Python 3.7.3 (in Linux 5.0.10):

from functools import reduce
from operator import mul

prod = reduce(mul, range(1, 10001))

prod has ~36k digits and consumes ~16 KiB, which you can check with math.log10(prod) and sys.getsizeof(prod).
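
for instance (a minimal check; the exact byte count depends on the Python build):

import math
import sys
from functools import reduce
from operator import mul

prod = reduce(mul, range(1, 10001))

print(int(math.log10(prod)) + 1)  # number of decimal digits, ~35,660
print(sys.getsizeof(prod))        # size in bytes, roughly 16 KiB on 64-bit CPython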

working with small (i.e. not bignum) products such as:

reduce(mul, [1] * 10001)

is ~50 times faster on my computer than when we need to use bignums as above. also notice that floating point numbers are basically the same speed as small integers, e.g.

reduce(mul, [1.] * 10001)

only takes ~10% more time.
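
a rough way to reproduce this comparison (a sketch; the exact ratios will vary with hardware and CPython version):

from functools import reduce
from operator import mul
from timeit import timeit

cases = {
    "bignum ints": list(range(1, 10001)),  # product grows to ~36k digits
    "small ints": [1] * 10001,             # product stays a machine-size int
    "floats": [1.0] * 10001,               # product stays a hardware float
}

for name, arr in cases.items():
    # time only the reduce(mul, ...) step for each input
    t = timeit(lambda: reduce(mul, arr), number=10)
    print(name, t / 10, "seconds per call")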

your additional code, the list comprehension that makes a second pass over the array, just complicates the comparison, hence I've ignored it here; microbenchmarks like these are awkward enough to get right!

Sam Mason