2

Let us consider the following code snippet in C++ to print the fist 10 positive integers :

for (int i = 1; i<11;i++)

{

  cout<< i ;

}

Will this be faster or slower than sequentially printing each integer one by one as follow :

x =1;

cout<< x;

x++;

cout<< x;

And so on ..

Is there any reason as to why it should be faster or slower ? Does it vary from one language to another ?

Paul Rooney
  • 20,879
  • 9
  • 40
  • 61
Agnivesh Singh
  • 507
  • 3
  • 12
  • compare the number of operations. It's same. – lukai Feb 20 '17 at 04:08
  • 4
    Any reason why looping should be faster? Yes. Caching. – phonetagger Feb 20 '17 at 04:08
  • @lukai I don't think number of operation is the same: There is no comparison happening in the non-loop version. (Though such approach is almost useless in real life) – Adrian Shum Feb 20 '17 at 04:11
  • Try it with the [goldbold compiler](https://godbolt.org/g/gSWUBw). Although I'd say there does seem to be a moot element to this question, since you'd only really care about super micro optimization like this if you were doing a lot of iterations. For the second approach a lot of iterations is impractical. – Paul Rooney Feb 20 '17 at 04:19
  • The second example would be faster because there is no overhead from the `for` loop, such as branching and comparing. The difference may be negligible. The best solution is to **profile** or measure. – Thomas Matthews Feb 20 '17 at 04:35
  • 1
    It's I/O bound. You're not going to notice or probably even measure the computational time difference. – user207421 Feb 20 '17 at 04:47
  • 1
    Note that the second approach violates the DRY principle ( https://en.wikipedia.org/wiki/Don't_repeat_yourself ) and will make the program difficult to maintain. Combined with the fact that the compiler is already smart enough to do the same trick "behind the scenes" at compile time, if it's likely to be helpful, there is really no reason to do it manually. – Jeremy Friesner Feb 20 '17 at 05:47

2 Answers2

3

This question is similar to this one; I've copied an excerpt of my answer to that question below... (the numbers are different; 11 vs. 50; the analysis is the same)

What you're considering doing is a manual form of loop unrolling. Loop unrolling is an optimization that compilers sometimes use for reducing the overhead involved in a loop. Compilers can do it only if the number of iterations of the loop can be known at compile time (i.e. the number of iterations is a constant, even if the constant involves computation based on other constants). In some cases, the compiler may determine that it is worthwhile to unroll the loop, but often it won't unroll it completely. For instance, in your example, the compiler may determine that it would be a speed advantage to unroll the loop from 50 iterations out to only 10 iterations with 5 copies of the loop body. The loop variable would still be there, but instead of doing 50 comparisons of the loop counter, now the code only has to do the comparison 10 times. It's a tradeoff, because the 5 copies of the loop body eat up 5 times as much space in the cache, which means that loading those extra copies of the same instructions forces the cache to evict (throw out) that many instructions that are already in the cache and which you might have wanted to stay in the cache. Also, loading those 4 extra copies of the loop body instructions from main memory takes much, much longer than simply grabbing the already-loaded instructions from the cache in the case where the loop isn't unrolled at all.

So all in all, it's often more advantageous to just use only one copy of the loop body and go ahead and leave the loop logic in place. (I.e. don't do any loop unrolling at all.)

Community
  • 1
  • 1
phonetagger
  • 7,701
  • 3
  • 31
  • 55
1

In loop, the actual machine level instruction would be the same, and therefore the same address. In explicit statements, the instructions will have different addresses. So it is possible that for loops, the CPU's instruction cache will provide performance boost that might not happen for the latter case.

For really small range (10) the difference will most likely be negligible. For significant length of the loop it could show up more clearly.

Stack crashed
  • 210
  • 3
  • 10