
Here are two functions that do exactly the same thing (adding up the factorials of a number's digits). The array F contains 0! ... 9!

N = 1_000_000

F = [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
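
(For reference, the same table can be generated rather than hard-coded; a minimal equivalent sketch using the standard library:)

from math import factorial

F = [factorial(d) for d in range(10)]  # same values as the literal list above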

def facdigits1(n):
    total = 0
    for i in map(int, str(n)):
        total += F[i]
    return total

def facdigits2(n):
    return sum((F[i] for i in map(int, str(n))))

total = 0
for i in range(2, N):
    n = facdigits1(i)
    total += n
print(total)

Totally unexpectedly (for me, at least), facdigits2 is about 20% slower than facdigits1. I expected the opposite. Any idea what's going on here?
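
For reference, one way the gap can be reproduced with the standard-library timeit module (a minimal sketch assuming the definitions above; the bounds here are illustrative, not the exact benchmark that produced the 20% figure):

from timeit import timeit

# compare both variants over the same inputs (illustrative bounds,
# not the exact run behind the 20% figure)
for f in (facdigits1, facdigits2):
    t = timeit(lambda: sum(f(i) for i in range(2, 100_000)), number=10)
    print(f.__name__, round(t, 3))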

Peter Leikauf
  • How did you test the speed? – alkasm Jan 18 '20 at 22:02
  • Inside loop for the first 1_000_000 numbers. Same relation, by the way, on my Mac and on a Raspberry 4. Thanks for asking! – Peter Leikauf Jan 18 '20 at 22:04
  • I tried just the following; I suspect generator objects require some sort of internal function call to get the next value: `>>> (F[i] for i in map(int, str(100)))` gives `<generator object <genexpr> at 0x00000000021E2828>` – user2740650 Jan 18 '20 at 22:09
  • @user2740650 Yes, that is the point of using a generator. – iz_ Jan 18 '20 at 22:10
  • Sure -- I'm saying that's probably why it's slower than a loop. – user2740650 Jan 18 '20 at 22:10
  • @user2740650 It really depends - sometimes, loading a list into memory is faster, but other times, the memory savings of a generator can affect the speed quite a bit. – iz_ Jan 18 '20 at 22:12
  • Can you add your full testing code as well as your definition for `F`? – alkasm Jan 18 '20 at 22:16
  • Here it is. And for the sake of completeness: Using a list comprehension instead of a generator is somewhat better, but still worse than the loop :-( – Peter Leikauf Jan 18 '20 at 22:19
  • Edit the post and add it, not in a comment :). BTW, one of the sets of parentheses in your call to `sum()` in `facdigits2` is superfluous, but it shouldn't affect the speed. – alkasm Jan 18 '20 at 22:21
  • Done! I'm excited to get so many comments so fast :-) – Peter Leikauf Jan 18 '20 at 22:26
  • `return sum(map(F.__getitem__, map(int, str(n))))` is a bit faster on my machine, FWIW. – Izaak van Dongen Jan 18 '20 at 22:40
  • @IzaakvanDongen If the question were about optimizing, you could make `F` a dictionary keyed by the digit characters; then you wouldn't have to `map(int, str(n))` and could just iterate over the string, which would be even faster (sketched after this comment thread). But the question isn't asking for the fastest way :) – alkasm Jan 18 '20 at 22:43
  • Frame switching takes time? Just a guess... – Ilia Gilmijarow Jan 18 '20 at 22:44
  • @alkasm I'd guess that it would be significantly faster (for big `n`, at least) to compute the base-10 digits of `n` as integers directly, without using `str`. Anyway, I guess I left that comment because it still feels in the spirit of "why isn't there a fast weird Python one-liner?" – Izaak van Dongen Jan 18 '20 at 22:45
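
For anyone who wants to try the comment variants side by side, here is a minimal sketch; `v_dict` and the `FD` table are hypothetical constructions of what alkasm describes, not code from the thread:

from timeit import timeit
from math import factorial

F = [factorial(d) for d in range(10)]
FD = {str(d): factorial(d) for d in range(10)}  # hypothetical dict keyed by digit characters

def v_loop(n):
    # the original explicit loop
    total = 0
    for i in map(int, str(n)):
        total += F[i]
    return total

def v_getitem(n):
    # Izaak van Dongen's variant from the comments
    return sum(map(F.__getitem__, map(int, str(n))))

def v_dict(n):
    # sketch of alkasm's idea: index by the digit characters, skipping int()
    return sum(map(FD.__getitem__, str(n)))

for f in (v_loop, v_getitem, v_dict):
    print(f.__name__, round(timeit(lambda: f(142857), number=1_000_000), 3))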

2 Answers


Setup overhead seems to be higher. Here are some timings in seconds for summing [1] * n with three methods, repeated 2**25 / n times each (so for every n, and thus every row, the total number of summed numbers is the same, namely 2**25). Note how all three start out slow, with the loop starting fastest. No idea why the loop, after dropping below 2.0, climbs back up to 2.8 (edit: maybe Python's small-integer cache makes sums that stay below 257 faster, while the other two solutions don't seem to suffer from that).

       n   loop   generator  pure
-----------------------------------
       1  11.3239  24.6003  12.5587
       2   6.8343  13.6080   5.6892
       4   4.4053   7.9283   3.0991
       8   3.2007   5.5467   1.8115
      16   2.5485   3.7965   1.1166
      32   2.2361   3.0682   0.7247
      64   2.0101   3.0527   0.5513
     128   2.0220   2.4149   0.4475
     256   1.8081   2.3848   0.3966
     512   2.0703   2.3418   0.4229
    1024   2.2276   2.2966   0.3706
    2048   2.2751   2.2856   0.3614
    4096   2.2761   2.3269   0.3533
    8192   2.3085   2.3623   0.3671
   16384   2.3395   2.2970   0.3552
   32768   2.4804   2.3583   0.4097
   65536   2.5998   2.3151   0.3572
  131072   2.7952   2.2654   0.3954
  262144   2.7445   2.4830   0.3599
  524288   2.7740   2.3462   0.3565
 1048576   2.7449   2.2845   0.3592
 2097152   2.8187   2.2246   0.3708
 4194304   2.8307   2.2691   0.3572
 8388608   2.8036   2.2731   0.3690
16777216   2.7704   2.3388   0.3649
33554432   2.8328   2.3018   0.3545

You have n ≤ 6, so my test agrees that the generator is slower there. In my case it's more extreme, presumably because I'm not doing any other work like converting numbers to strings, looking up list indices, and summing large numbers.

Code:

from timeit import timeit

def sum_loop(numbers):
    # explicit for-loop accumulation
    total = 0
    for number in numbers:
        total += number
    return total

def sum_generator(numbers):
    # sum over a (redundant) generator expression
    return sum(number for number in numbers)

def sum_pure(numbers):
    # sum directly over the list
    return sum(numbers)

emax = 25
for e in range(emax + 1):
    n = 2**e                # list length
    number = 2**(emax - e)  # repetitions, so n * number == 2**25 in every row
    numbers = [1] * n
    print('%8d' % n, end='')
    for f in sum_loop, sum_generator, sum_pure:
        t = timeit(lambda: f(numbers), number=number)
        print('  %7.4f' % t, end='')
    print()
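
As a side check of the small-integer-cache guess above, one could keep the total number of additions constant while varying the size of the final total (a hedged extra experiment, not part of the benchmark that produced the table; CPython caches small ints up to 256):

from timeit import timeit

def sum_loop(numbers):
    total = 0
    for number in numbers:
        total += number
    return total

# 2**22 additions per measurement either way; with n = 256 every
# intermediate total hits CPython's small-integer cache, with
# n = 4096 most additions must allocate a fresh int object
for n in (256, 4096):
    numbers = [1] * n
    reps = 2**22 // n
    print(n, round(timeit(lambda: sum_loop(numbers), number=reps), 4))
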
Kelly Bundy

The problem comes down to the slightly slower performance of the sum function over generators. Scrolling through some answers, I came to the conclusion that generators do not store all elements in memory in one go. Here are my ways of defining the summation:

def s1(size):
    # explicit loop accumulation
    total = 0
    for i in range(size):
        total += i
    return total

def s2(size):
    # sum over a generator expression
    return sum(i for i in range(size))

def s3(size):
    # sum directly over the range object
    return sum(range(size))

def s4(size):
    # sum over a generator function wrapping range
    def gen(size):
        for i in range(size):
            yield i
    return sum(gen(size))

def s5(size):
    # sum over a hand-rolled while-loop generator
    def gen(size):
        i = 0
        while i < size:
            yield i
            i += 1
    return sum(gen(size))

from time import time
import numpy as np

time_avg = []
size = 10000000
for i in range(10):  # average over 10 rounds
    times = []
    for s in [s1, s2, s3, s4, s5]:
        t = time()
        s(size)
        times.append(round(time() - t, 3))
    time_avg.append(times)

# mean time per function across the 10 rounds
print(np.mean(np.array(time_avg), axis=0))

Output

Times for s1, s2, s3, s4, s5 are:

[0.9624 1.3263 0.6373 1.2426 1.8455]
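
The gap between s3 and the generator-based versions is consistent with the frame-switching guess from the comments on the question: every element of a generator expression resumes a Python frame before sum ever sees it, while sum(range(size)) stays in C the whole time. A minimal sketch of just that difference:

from timeit import timeit

r = range(10**6)
# sum() consumes the range entirely in C
print('range  ', round(timeit(lambda: sum(r), number=10), 3))
# same elements, but each one resumes the generator's Python frame
print('genexpr', round(timeit(lambda: sum(i for i in r), number=10), 3))
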
mathfux
  • Pretty disappointing, though. I mean, by comparison s2 would be considered much more Pythonic than s1, right? As an amateur, I like the thought that in s2 the summation is done in native code instead of a Python loop – and it is slower? Come on! – Peter Leikauf Jan 18 '20 at 22:44
  • Dunno how you got your timings. With %timeit, I get 60 µs for s1, s2, and s4, and 20 µs for s3, on a 1000-element list. You should try to vary N. – Mathieu Jan 18 '20 at 22:45
  • I'm not used to %timeit; I just measured the time once using time.time(). I see my results are not quite right. I'm going to update, taking the average of more measurements. – mathfux Jan 18 '20 at 22:51
  • @mathfux Measuring just once won't give you accurate results. timeit gives you the mean and std over multiple runs (10,000 to 1,000,000 depending on the function to time). – Mathieu Jan 18 '20 at 23:24
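
For reference, a minimal sketch of timing the five functions with timeit.repeat, addressing the measure-once concern raised in these comments (assumes s1 ... s5 from the answer above are in scope):

from timeit import repeat

size = 1_000_000
for s in (s1, s2, s3, s4, s5):
    # take the best of 5 runs; the minimum filters out scheduler noise
    best = min(repeat(lambda: s(size), number=1, repeat=5))
    print(s.__name__, '%.3f s' % best)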