-2

I am working on optimization of some code and came across this, could someone tell me why this piece of code is more 'optimized'

for (i = 0; i < 1000; i+=2){
    float var = numberOfEggs*arrayX[i] + arrayY[i];
    arrayY[i+1] =  var;
    arrayY[i+2] = numberOfEggs*arrayX[i+1] + var;
}

than this version?

for(long i = 0; i < 1000 ; ++i)
       arrayY[i+1] = numberOfEggs*arrayX[i] + arrayY[i];

any help is appreciated thank you!

Megan Darcy
  • 530
  • 5
  • 15
  • 3
    Can you elaborate a little bit more when you say "more optimized"? – AndyG Aug 23 '21 at 14:21
  • Are you sure the loop control statements are the same in both cases? At a quick glance, it looks like the optimized version may be using `i+=2` in place of `i++` for the loop increment. – Adrian Mole Aug 23 '21 at 14:24
  • 3
    Did you intend to include the line that modifies `arrayY[i+2]`? That line causes your two _versions_ to do different things. – Drew Dormann Aug 23 '21 at 14:24
  • @AndyG i guess in terms of like the memory usage/cache coherence? Or like is it better in terms of speed? or not really much of a difference when a 'var' variable is used instead of having it in one line? – Megan Darcy Aug 23 '21 at 14:29
  • @AdrianMole sorry I made a mistake when typing, have modified the code. thanks for pointing it out :) – Megan Darcy Aug 23 '21 at 14:29
  • 1
    You need to ask the person making that claim why they are making it. – molbdnilo Aug 23 '21 at 14:33
  • If the former runs faster than the second, then it is considered more optimised because it's faster. If it doesn't run faster, then the premise is faulty because it wouldn't be considered more optimised in that case. (I suppose, sometimes you might care about something other than speed, such as code size, but if that's the case then you should mention it in the question) – eerorika Aug 23 '21 at 14:35
  • 3
    @MeganDarcy: Are there any concrete measurements that back up one claim or the other? Is the resulting assembly even different? – AndyG Aug 23 '21 at 14:49

1 Answers1

3

The first example is performing two assignments per iteration. You can tell by the increment statement.

This is called loop unrolling. By performing two assignments per iteration, you are removing half of the branches.

Most processors don't like branch instructions. The processor needs to determine whether or not to reload the instruction cache (branch prediction). There are at least two branches per iteration. The first is for the comparison, the second is to loop back to the comparison.

To experiment, try using 4 assignments per iteration, and profile.

Thomas Matthews
  • 56,849
  • 17
  • 98
  • 154
  • Branch prediction isn't about reloading the instruction *cache*, it's about instructions already fetched into the *pipeline*. http://www.lighterra.com/papers/modernmicroprocessors/ / [What exactly happens when a skylake CPU mispredicts a branch?](https://stackoverflow.com/q/50984007). (Easily-predicted branches are not a big problem, but fewer branches does mean fewer total instructions, and front-end throughput for very short loops may suffer some if the number of uops isn't a multiple of the pipeline width.) – Peter Cordes Sep 20 '21 at 08:13