16

I was benchmarking some Python code when I noticed something strange. I used the following function to measure how long it took to iterate through an empty for loop:

import time

def f(n):
    t1 = time.time()
    for i in range(n):
        pass
    print(time.time() - t1)

f(10**6) prints about 0.035, f(10**7) about 0.35, f(10**8) about 3.5, and f(10**9) about 35. But f(10**10)? Well over 2000. That's certainly unexpected. Why would it take over 60 times as long to iterate through 10 times as many elements? What is it about Python's for loops that causes this? Is this Python-specific, or does it occur in other languages too?
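For what it's worth, a slightly more robust variant of the benchmark above (a sketch, not part of the original question) would use `time.perf_counter`, which is monotonic and higher-resolution than `time.time` and therefore better suited to interval timing:

```python
import time

def f(n):
    # perf_counter is a monotonic, high-resolution clock, so it is less
    # affected by system clock adjustments than time.time().
    t1 = time.perf_counter()
    for i in range(n):
        pass
    return time.perf_counter() - t1

# Elapsed time for an empty loop over 10**5 iterations:
print(f(10**5))
```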

user3002473

2 Answers

20

When you get above 10^9 you move out of 32-bit integer range. Python 3 then transparently moves you onto arbitrary-precision integers, which are much slower to allocate and use.

In general, working with such big numbers is one of the areas where Python 3 is a lot slower than Python 2 (which at least had fast 64-bit integers on many systems). On the plus side, it makes Python easier to use, with fewer overflow-type errors.
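One way to see the arbitrary-precision representation at work (a sketch added for illustration, not from the original answer): in CPython 3 every `int` is a variable-width object, and `sys.getsizeof` shows the in-memory footprint growing with the magnitude of the value.

```python
import sys

# CPython 3 stores every int as an arbitrary-precision "long" internally.
# The object size grows as the value needs more internal digits, which is
# one way to observe the transition described above.
for v in (10**9, 10**10, 10**100):
    print(v.bit_length(), "bits ->", sys.getsizeof(v), "bytes")
```

The exact byte counts are implementation- and platform-specific, but the growth from `10**9` to `10**100` should be visible on any CPython build.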

Thomas Ahle
  • It started with PEP 237: https://www.python.org/dev/peps/pep-0237/ which made longs arbitrary precision. Later Python 3 got rid of the separate int type, which slowed things down. During the 3.x releases I believe they have been working to introduce more optimizations for smaller integers. – Thomas Ahle Feb 22 '15 at 21:33
  • The actual difference in run time seems to be completely platform-specific. – Padraic Cunningham Feb 23 '15 at 13:24
  • @PadraicCunningham I've been trying to read the source code for the long implementation, to see exactly what optimisations are made when. There is certainly the choice of whether a digit should be 15 or 30 bits, but after that it became a lot of work deciphering it all. – Thomas Ahle Feb 23 '15 at 14:01
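The digit size mentioned in the comment above can be checked without reading the source: `sys.int_info` (available since Python 3.1) reports the internal digit width of the current build. A small sketch:

```python
import sys

# CPython stores big ints as arrays of "digits" in base 2**15 or 2**30,
# chosen at build time; sys.int_info reports which one this build uses.
print(sys.int_info)
print("bits per digit:", sys.int_info.bits_per_digit)
```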
5

Some accurate timings using timeit show that the times actually increase roughly in line with the input size, so your timings seem to be quite a ways off:

In [2]: for n in [10**6, 10**7, 10**8, 10**9, 10**10]:
   ...:     %timeit f(n)
   ...:
10 loops, best of 3: 22.8 ms per loop
1 loops, best of 3: 226 ms per loop # roughly ten times previous
1 loops, best of 3: 2.26 s per loop # roughly ten times previous
1 loops, best of 3: 23.3 s per loop # roughly ten times previous
1 loops, best of 3: 4min 18s per loop # roughly ten times previous

Using xrange and Python 2 we see roughly the same ratios. Obviously Python 2 is much faster overall, since Python 3's int is what used to be Python 2's long:

In [5]: for n in [10**6, 10**7, 10**8, 10**9, 10**10]:
   ...:     %timeit f(n)
   ...:
100 loops, best of 3: 11.3 ms per loop
10 loops, best of 3: 113 ms per loop
1 loops, best of 3: 1.13 s per loop
1 loops, best of 3: 11.4 s per loop
1 loops, best of 3: 1min 56s per loop

The actual difference in run time seems to be more related to the size of the Windows long than to Python 3 itself. The difference is marginal on Unix, which handles longs very differently from Windows, so this is a platform-specific issue at least as much as a Python one.
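The platform difference suggested above is easy to check: the C `long` type is 32 bits on 64-bit Windows (LLP64) but 64 bits on most 64-bit Unix systems (LP64). A sketch using `ctypes` to report the width on the current platform:

```python
import ctypes

# On 64-bit Windows (LLP64) this prints 32; on most 64-bit Unix
# systems (LP64) it prints 64. Python 2's fast int was backed by a
# C long, which is why the platform matters here.
print(ctypes.sizeof(ctypes.c_long) * 8, "bit C long")
```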

Padraic Cunningham
  • I just ran another test, and I got the following results: `10**6` - 0.07520970538367432 seconds, `10**7` - 0.36399144430744984 seconds, `10**8` - 3.492611703306957 seconds, `10**9` - 34.71252936832384 seconds, `10**10` - 2190.0514266665177 seconds. Could it be a difference in what system we're running it on? – user3002473 Feb 23 '15 at 00:07
  • @user3002473. What are you running it on? – Padraic Cunningham Feb 23 '15 at 00:12
  • @PadraicCunningham Windows 7 64-bit, Intel Core i5 3.20 GHz processor with 4.00 GB of RAM installed. – user3002473 Feb 23 '15 at 00:18
  • @user3002473. I am on Ubuntu 14.04 with similar specs. I cannot think of any obvious reason it is running so slow. Have you tried it using python 2 with xrange ? – Padraic Cunningham Feb 23 '15 at 00:23
  • @PadraicCunningham I just tried running the same test in Python 2.7.2 with `xrange`, and it's giving me an `OverflowError` for `xrange(10**10)`, but up until then the times were `0.0447` seconds, `0.168` seconds, `1.65` seconds, and `15.6` seconds. – user3002473 Feb 23 '15 at 00:35
  • @user3002473, you are bound by the size of the Windows long, http://stackoverflow.com/questions/384502/what-is-the-bit-size-of-long-on-64-bit-windows so that is why you get the `OverflowError`. It may be a factor in why you get such poor performance in comparison to my Linux machine. I will do a bit more digging around tomorrow, as the integer-size explanation does not tell the whole story; if it were Python-specific I should see a similar performance hit. – Padraic Cunningham Feb 23 '15 at 00:47