
We all know that the common way of executing a statement a certain number of times in Python is to use a for loop.

The general way of doing this is:

# I am assuming the iterated values are redundant.
# Only the number of executions matters.
for _ in range(count):
    pass

I believe nobody will argue that the code above is the common implementation; however, there is another option: exploiting the speed of Python list creation by multiplying a list of references.

# Uncommon way.
for _ in [0] * count:
    pass
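As a side note on why `[0] * count` is cheap to build: list multiplication copies references, not objects, so every slot points at the same object. A quick check (variable names are illustrative):

```python
lst = [0] * 5

# List multiplication copies references, not values:
# every element is the very same int object.
assert all(x is lst[0] for x in lst)
assert len(lst) == 5
```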

There is also the old `while` way.

i = 0
while i < count:
    i += 1

I tested the execution times of these approaches. Here is the code.

import timeit

repeat = 10
total = 10

setup = """
count = 100000
"""

test1 = """
for _ in range(count):
    pass
"""

test2 = """
for _ in [0] * count:
    pass
"""

test3 = """
i = 0
while i < count:
    i += 1
"""

print(min(timeit.Timer(test1, setup=setup).repeat(repeat, total)))
print(min(timeit.Timer(test2, setup=setup).repeat(repeat, total)))
print(min(timeit.Timer(test3, setup=setup).repeat(repeat, total)))

# Results
0.02238852552017738
0.011760978361696095
0.06971727824807639

I would not raise the subject if the difference were small, but as can be seen, the second method is roughly twice as fast. Why doesn't Python encourage such usage if it is so much more efficient? Is there a better way?

The test is done with Windows 10 and Python 3.6.

Following @Tim Peters' suggestion,

...
setup = """
import itertools
count = 100000
"""

test4 = """
for _ in itertools.repeat(None, count):
    pass
"""
print(min(timeit.Timer(test1, setup=setup).repeat(repeat, total)))
print(min(timeit.Timer(test2, setup=setup).repeat(repeat, total)))
print(min(timeit.Timer(test3, setup=setup).repeat(repeat, total)))
print(min(timeit.Timer(test4, setup=setup).repeat(repeat, total)))

# Gives
0.02306803115612352
0.013021619340942758
0.06400113461638746
0.008105080015739174

Which offers a much better way, and this pretty much answers my question.

Why is this faster than `range`, since both are generators? Is it because the value never changes?

Max Paython
  • One more to try: `for _ in itertools.repeat(None, count)`. – Tim Peters Oct 29 '17 at 02:32
  • A major problem with the second method is that it allocates storage for the entire throw-away list. – Tom Karzes Oct 29 '17 at 02:34
  • But in practical code the body of the loop will be more complex and dominate the overall timing. If the iteration variable is unimportant, you are just spinning wheels. – hpaulj Oct 29 '17 at 07:17
  • @hpaulj That is true most of the time; however, sometimes this behavior is required and can save a lot of time if executed in a crucial part of the code. – Max Paython Oct 29 '17 at 09:28
  • @MaxPaython Could you give an example? I find it hard to believe :) (not that I'm an expert though :) ) – Ant Oct 29 '17 at 13:14
  • @Ant Well, imagine you are recording a video: the way to obtain a certain number of seconds of video frames is a loop that executes FPS * seconds times. The microsecond difference may not matter in that application, but for another example, when solving algorithm questions on sites such as HackerRank, there are cases where a small change in the code makes a huge difference in running speed, and some questions are better solved this way. Of course, I am not saying you can't solve them in any other way :) – Max Paython Oct 29 '17 at 15:37
  • @MaxPaython Thank you for your answer, but I am still confused. For the first example, I would think that what dominates the execution is obtaining/saving/checking the video frames you're trying to obtain. Or, if you can start directly from a certain frame, there is no point in running an empty loop. As for the other example: whatever algorithm you benchmark will be dominated by what you do inside the loop, not by how you increment the counter in the for loop. – Ant Oct 29 '17 at 15:42
  • @Ant Thank you for asking. If you run this code in a nested loop, and the inner loop is executed millions of times and requires such behavior, the difference could be useful to the programmer. – Max Paython Oct 29 '17 at 15:49
  • @MaxPaython Assuming you mean coding meaningless loops for use in timing code to ensure frames are delivered with minimal jitter, I would presume the real, useful solution would consist of using interrupts, timeouts, &c. - not spinning a manual loop and hoping you get it right and that the optimiser doesn't kill it altogether. And even if you could find some valid reason to do the latter - which, in this case, I doubt - you would use a close-to-the-metal language like C/C++, or even assembly, rather than trying to bend a high-level language like Python to be something it's not. – underscore_d Oct 29 '17 at 20:31
  • @underscore_d What you say is correct, but my question was born of curiosity rather than for direct application usage. – Max Paython Oct 29 '17 at 20:55
  • Surely Python is all about readability and hence programmer efficiency, not runtime speed? – Mark Lawrence Jan 04 '18 at 08:11
  • You may also use `[None] * count` instead of `[0] * count`, which is also pretty fast. Raymond Hettinger pointed out that Guido himself used this technique in the `timeit` module as a fallback for when `itertools` is not available. Source: https://stackoverflow.com/a/9098860/1323778 – Fabiano Mar 07 '19 at 06:12

4 Answers


Using

for _ in itertools.repeat(None, count):
    # do something

is the non-obvious way of getting the best of all worlds: tiny constant space requirement, and no new objects created per iteration. Under the covers, the C code for repeat uses a native C integer type (not a Python integer object!) to keep track of the count remaining.
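For reference, a minimal demonstration of `repeat`'s semantics: it yields the same object `count` times and is then exhausted.

```python
import itertools

it = itertools.repeat(None, 3)
assert list(it) == [None, None, None]  # yields None exactly 3 times
assert list(it) == []                  # the iterator is exhausted after one pass
```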

For that reason, the count needs to fit in the platform C ssize_t type, which is generally at most 2**31 - 1 on a 32-bit box, and here on a 64-bit box:

>>> itertools.repeat(None, 2**63)
Traceback (most recent call last):
    ...
OverflowError: Python int too large to convert to C ssize_t

>>> itertools.repeat(None, 2**63-1)
repeat(None, 9223372036854775807)

Which is plenty big for my loops ;-)

Tim Peters
  • Thanks again! If I were to search the source code of these implementations, where can I find them (this and similar standard library functions)? – Max Paython Oct 29 '17 at 02:54
  • That's quite a learning curve! The source for itertools is at https://github.com/python/cpython/blob/master/Modules/itertoolsmodule.c, and the implementation of `repeat` spans several distinct functions near `repeat_new`. How do I know this? Because I've played with Python's source code for 25 years ;-) – Tim Peters Oct 29 '17 at 02:58
  • @Paddy3118, it's a mixed bag, really. The _meaning_ of `repeat(None, 100)` is quite clear: it delivers `None` 100 times. That the loop doesn't care _which_ value(s) it delivers is a bit clearer that way than via, e.g., a `range(100)` spelling. But because it's a novel (rarely seen) spelling, that hurts easy readability. Nevertheless, it's obvious the second time you see it, so not a big deal. Overall, on that alone, it's only slightly less Pythonic, to me, than `range(100)`. Python has no direct way to say _just_ "do this loop N times", so there is no "truly Pythonic" way to do it. – Tim Peters Apr 08 '19 at 15:46
  • On the other hand, that `repeat()` doesn't create any new Python int objects under the covers is an undocumented CPython implementation detail, and relying on _that_ "for speed" is as un-Pythonic as it gets. – Tim Peters Apr 08 '19 at 15:47

The first method (in Python 3) creates a range object, which can iterate through the range of values. (It's like a generator object but you can iterate through it several times.) It doesn't take up much memory because it doesn't contain the entire range of values, just a current and a maximum value, where it keeps increasing by the step size (default 1) until it hits or passes the maximum.
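A small illustration of how this differs from a generator: a `range` object can be walked more than once, while a generator is consumed.

```python
r = range(3)
assert list(r) == [0, 1, 2]
assert list(r) == [0, 1, 2]  # a range can be re-iterated

g = (i for i in range(3))    # a generator, by contrast, is consumed
assert list(g) == [0, 1, 2]
assert list(g) == []
```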

Compare the size of `range(0, 1000)` to the size of `list(range(0, 1000))`: the former is very memory efficient; it takes only 48 bytes regardless of the range's length, whereas the list's size increases linearly with the number of elements.
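This can be checked with `sys.getsizeof` (exact byte counts vary between CPython versions and builds):

```python
import sys

r = range(0, 1000)
print(sys.getsizeof(r))        # small constant size (~48 bytes on 64-bit CPython)
print(sys.getsizeof(list(r)))  # grows linearly with the number of elements
assert sys.getsizeof(r) < sys.getsizeof(list(r))
```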

The second method, although faster, takes up the memory I was talking about above. (Also, it seems that although `0` takes up 24 bytes and `None` takes 16, lists of 10,000 of each have the same size. Interesting. Probably because the list stores pointers.)

Interestingly enough, [0] * 10000 is smaller than list(range(10000)) by about 10000, which kind of makes sense because in the first one, everything is the same primitive value so it can be optimized.

The third one is also nice because it doesn't require another object (whereas calling `range` does), though since it's about six times slower, it's not worth it.

The last one might be the fastest just because itertools is cool that way :P I think it uses some C-library optimizations, if I remember correctly.

hyper-neutrino
  • `range` returns a [`range` object](https://docs.python.org/3/library/stdtypes.html#typesseq-range) in Python 3, not a generator. One specific quality that demonstrates this is that you can iterate over it multiple times, whereas generators are consumed (and therefore empty) once iterated over. – jpmc26 Nov 02 '17 at 01:36

This answer provides a loop construct for convenience. For additional background about looping with itertools.repeat look up Tim Peters' answer above, Alex Martelli's answer here and Raymond Hettinger's answer here.

# loop.py

"""
Faster for-looping in CPython for cases where intermediate integers
from `range(x)` are not needed.

Example Usage:
--------------

from loop import loop

for _ in loop(10000):
    do_something()

# or:

results = [calc_value() for _ in loop(10000)]
"""

from itertools import repeat
from functools import partial

loop = partial(repeat, None)
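A quick sanity check of the helper (inlined here for brevity rather than imported from `loop.py`):

```python
from itertools import repeat
from functools import partial

loop = partial(repeat, None)

n = 0
for _ in loop(5):
    n += 1
assert n == 5  # the body ran exactly 5 times
```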
Darkonaut
  • Wouldn't it be even better to make `loop` a decorator, so you can do `@loop(10000)` on `do_something()`? – Tweakimp Apr 09 '19 at 00:32
  • @Tweakimp I'm not sure I understand how you think that should work. Sure, you could decorate a _specific_ function to be called 10,000 times, but you would lose flexibility in calling that function, and it would only work for _that_ decorated function. – Darkonaut Apr 09 '19 at 01:04
  • I want the function `do_something` to be called 10000 times, and the decorator might change `do_something` to `for _ in loop(10000): do_something()`. – Tweakimp Apr 09 '19 at 02:17
  • @Tweakimp If you always want to call it 10k times, it certainly makes sense, and using a decorator for that is not unusual. But that's a very specific use case which extends the basic idea here of just finding a "better" for-loop. – Darkonaut Apr 09 '19 at 02:32

The first two methods need to allocate memory blocks for each iteration, while the third one just makes a step per iteration.

`range` is a slow function, and I use it only when I have to run small code that doesn't require speed, for example, `range(0, 50)`. I don't think you can compare the three methods; they are totally different.

According to a comment below, the first claim is only valid for Python 2.7; in Python 3, `range` works like `xrange` and doesn't allocate a block for each iteration. I tested it, and he is right.

Mr. bug
  • Incorrect. In Python 3, `range` produces an iterator. It is equivalent to Python 2's `xrange`. Only the second method has the memory problem. – Tom Karzes Oct 29 '17 at 02:39
  • @TomKarzes Still incorrect (though more correct). It produces a [`range` object](https://docs.python.org/3/library/stdtypes.html#typesseq-range). A range object is not an iterator or generator; it can be iterated over multiple times without being consumed. – jpmc26 Nov 02 '17 at 01:37