
In Python 3, I find that if I replace a

for i in range(n):

statement with

while i < n:

I get significant runtime gains. My loop body itself is not meaty; it only does a couple of basic arithmetic operations.

Any pointers as to why I am seeing this behavior?

Edit: n is in the tens of thousands (10K, 12K, etc.). The timing I observe is 0.19s for 12K and 0.12s for 10K with the `for` loop, whereas with the `while` loop I see 0.11s for 12K and 0.08s for 10K. Here is my program:

target = 0
i = 1
#for i in range(1, n+1):
while i < n+1:
    target += i * (2 ** (i - 1)) + (i * (i + 1))//2
    i += 1

return target % (10 ** 9 + 7)
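
(For reference, a rough sketch of how the two variants could be timed in isolation, wrapping the snippet above in functions that take `n` so the `return` is valid, and using the standard-library `timeit` module; the function names `target_for` and `target_while` are just illustrative, and absolute numbers will vary with machine and Python version.)

# Illustrative benchmark sketch; target_for / target_while are hypothetical
# wrappers around the snippet above, not names from the original post.
import timeit

MOD = 10 ** 9 + 7

def target_for(n):
    target = 0
    for i in range(1, n + 1):
        target += i * (2 ** (i - 1)) + (i * (i + 1)) // 2
    return target % MOD

def target_while(n):
    target = 0
    i = 1
    while i < n + 1:
        target += i * (2 ** (i - 1)) + (i * (i + 1)) // 2
        i += 1
    return target % MOD

if __name__ == "__main__":
    for name, func in (("for", target_for), ("while", target_while)):
        # Best of 5 repeats of 10 calls each, reported per call.
        best = min(timeit.repeat(lambda: func(10000), number=10, repeat=5))
        print(f"{name:5s}: {best / 10:.4f} s per call")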
badmad
  • How big is `n`? `range` has some fixed overhead to create, and might lose if `n` is small. – ShadowRanger Dec 24 '18 at 21:59
  • Please provide a [mcve]. Generally, looping directly over a `range` iterable will be faster than manually incrementing some `i` with `+=`. It depends on the scope of your problem and what exactly you are doing with `i`, I suppose. – juanpa.arrivillaga Dec 24 '18 at 22:02

1 Answer


range involves a small amount of fixed overhead (looking up range, first in the globals and then in the built-ins, plus the cost of generic function call dispatch and of allocating/initializing the range object); if n is sufficiently small, that overhead won't be made up for by the reduced per-loop cost:

In [1]: %%timeit -r5 n = 3
   ...: for i in range(n):
   ...:     pass
   ...:
365 ns ± 15.1 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

In [2]: %%timeit -r5 n = 3
   ...: i = 0
   ...: while i < n:
   ...:     i += 1
   ...:
252 ns ± 16.9 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

But when n gets to even a moderate size, the reduced per-item overhead pays off:

In [3]: %%timeit -r5 n = 10
   ...: for i in range(n):
   ...:     pass
   ...:
461 ns ± 18.1 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

In [4]: %%timeit -r5 n = 10
   ...: i = 0
   ...: while i < n:
   ...:     i += 1
   ...:
788 ns ± 73.6 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

range involves higher fixed costs but lower per-item costs; that's all.
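
(To see where the fixed overhead and the per-iteration work come from, one rough way is to compare the bytecode of the two loop shapes with the standard-library `dis` module. A minimal sketch, with the caveat that the exact opcodes vary across CPython versions:)

import dis

def loop_for(n):
    for i in range(n):
        pass

def loop_while(n):
    i = 0
    while i < n:
        i += 1

# loop_for pays a one-time global lookup of range, a call, and GET_ITER,
# then runs only an iterate/store/jump sequence per pass; loop_while
# re-executes the load/compare/branch and load/add/store sequence on
# every single pass.
dis.dis(loop_for)
dis.dis(loop_while)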

ShadowRanger
  • `itertools.repeat(None, n)` may beat both `range` and `while` (a small sketch of that idiom follows these comments). See https://stackoverflow.com/a/9059268/1453822 – DeepSpace Dec 24 '18 at 22:07
  • @DeepSpace: Yup. `range` (after it passes the small int cache boundaries) has to make actual objects, and has to look them up in the cache even before then; `repeat` doesn't, which saves a little time. The savings aren't completely trivial (for the `n = 10` case, the timing measurement for me is 319 ns ± 39.6 ns), but they're not as extreme as in `range` vs. manual `while` loops, and the need for an import (which itself costs over a microsecond, and therefore eats the savings from the first few uses at least) makes it less of an automatic choice (especially if `i` might be used for something). – ShadowRanger Dec 24 '18 at 22:14
  • premature optimization at its best :) – DeepSpace Dec 24 '18 at 22:15
  • @DeepSpace: I can't fault you; it may be the root of all evil, but it's also the root of all fun, at least for folks with my mentality. :-) – ShadowRanger Dec 24 '18 at 22:18
  • @ShadowRanger I edited the question to include the range of n and the program itself. For 10K and 12K iterations, I am observing a smaller runtime for the `while` loop. – badmad Dec 24 '18 at 22:29
  • @ShadowRanger About the fixed-cost overhead, what is the idea behind those object lookups? Why are they needed? In essence, why is it designed like that? – badmad Dec 24 '18 at 22:31
  • @badmad: Did you leave in the `i = 1` and `i += 1` with the `for` loop version? Because then you'd be paying most of the cost of the `while` *and* the cost of the `for`, which is clearly going to be worse than just one or the other. – ShadowRanger Dec 24 '18 at 22:38
  • @badmad: Python is designed to be highly dynamic, at the expense of performance. If you want to make your own `range` replacement, you can, and it just works. It's not worth reserving a bunch of keywords to avoid (typically) tiny amounts of work to do the pair of `dict` lookups involved in loading built-ins. – ShadowRanger Dec 24 '18 at 22:39
  • @ShadowRanger Yes, `i = 1` and `i += 1` are not required for the `for` loop. I didn't understand your explanation about the cost being worse; can you elaborate? – badmad Dec 26 '18 at 08:55
  • @badmad: If you leave in `i += 1` with the `for` loop, you're paying the cost of loading, incrementing, and storing `i` that only the `while` loop requires. Why do it? The `for` loop is already covering the creation of incrementally increasing `i`s, and now you're paying it twice. – ShadowRanger Dec 26 '18 at 10:30
  • @badmad: On checking your actual code in microbenchmarks (I wrapped it up in functions that take `n` as a parameter) and timing it with `ipython3` `%timeit` magic, I can't reproduce the sort of timing discrepancies you're describing. The work done in your loop is *vastly* more expensive than whatever loop structure you use; for `target_while(10000)` I get 94.6 ms per loop, for `target_for(10000)` it's 94 ms even, and for `target_for_inc(10000)` (unnecessarily incrementing `i` but otherwise using `for i in range(i, n + 1)`), it's 94.2 ms. All tests on CPython 3.6.4, x86-64 build for Linux. – ShadowRanger Dec 26 '18 at 14:34
  • And none of those timings are all that stable, even with the `%timeit` magic running 5 sets of 10 calls each (the variance exceeds the real difference; I suspect the timings looking good for `for` here were sheer coincidence). Point is, you're doing enough work that it shouldn't matter what loop structure you use; using `range` is better for style, readability and succinctness, but it's not going to matter much performance-wise since the loop is doing real work. – ShadowRanger Dec 26 '18 at 14:38
  • The only place I'd conceivably expect to see any meaningful difference would be on Python 2, where `range` makes an actual `list` of the values up front, which for `n = 10000` would impose a pretty substantial increase in peak memory usage; on Python 2 you should be using `xrange` (which is lazy, like Python 3's `range`). That said, on testing, `range` vs. `xrange` makes almost no difference for your sample code (all other approaches were 102-103 ms, `xrange` dropped it to 101-102 ms, but that's pretty insignificant). – ShadowRanger Dec 26 '18 at 14:39
  • @ShadowRanger Thanks for the great explanation, I appreciate it. Just to be clear, I am *not* adding `i += 1` with the `for` loop. Having said that, I am beginning to believe that my tests are not representative and that I just happened to get those numbers. Thanks again! – badmad Dec 27 '18 at 11:25
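
(For completeness, a minimal sketch of the `itertools.repeat(None, n)` idiom mentioned in the comments above, assuming the loop index itself is not needed; `repeat` yields the same object `n` times, so no new integer objects have to be produced per iteration.)

import itertools

def spin_repeat(n):
    # repeat(None, n) yields None exactly n times without creating new objects.
    for _ in itertools.repeat(None, n):
        pass

def spin_range(n):
    # range(n) has to produce (or fetch from the small-int cache) each integer.
    for _ in range(n):
        pass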