3

I have the following piece of simple code that is potentially going to be executed many hundreds of millions of times;

for (int i = 0; i < 8; i++)
    if (((p[i].X >= x) && (p[i].X <= x + d))
        &&((p[i].Y >= y) && (p[i].Y <= y + d))
        &&((p[i].Z >= z) && (p[i].Z <= z + d)))
      return 1;

Will the optimizer in the Visual C++ 2010 compiler unroll this loop for me, or am I better off to do it manually? I've looked at other similar questions but don't see any specific results. I

Community
  • 1
  • 1
SmacL
  • 22,555
  • 12
  • 95
  • 149
  • 1
    If you're ever not sure and the loop isn't too incredibly long, there isn't much harm in unrolling it yourself -- assuming of course it is a bottleneck and you want to squeeze more performance while possibly lowering readability. Though I would imagine VS2010 would unroll the loop. – Rapptz Mar 07 '13 at 09:05
  • 1
    do you know the exact hardware platforms that this code will run on? Then I would suggest simply to unroll it manually and measure what happens instead of making guesses of what the compiler does (based on his guesses :-)). – Philipp Mar 07 '13 at 09:08
  • I'd hope VS2010 would do this for me ok, but might try profiling both versions. Maybe look at the assembler output. – SmacL Mar 07 '13 at 09:08
  • 2
    Ask instead what is the higher level meaning of this. (you've got 8 3d-coordinates and are interested if any of them is aligned to some d-sized cube.) What is the probability of this event? Can it be vectorized? – Aki Suihkonen Mar 07 '13 at 10:53
  • @AkiSuihkonen, yes the higher level problem is part of the computation of an intersection between an arbitrary frustum with an axis aligned cube, which also contains many other optimizations before and after the above code. I hadn't considered vectorization, but will certainly investigate it. Thanks for the feedback. – SmacL Mar 07 '13 at 10:59
  • One significant optimization before loop unrolling would be to use `if (fabs(p[i].X - box_center) < half_d)` – Aki Suihkonen Mar 07 '13 at 12:01
  • Take a look at the code it generates with the `/FAsc` option. – Michael Burr Mar 07 '13 at 09:06
  • @AkiSuihkonen: That's actually not as efficient as `unsigned(p[i].X) - unsigned(x) < unsigned(d)`. Over- and underflow are well-defined for unsigned types, so a difference of `-1` ends up as `UINT_MAX-1`, which is a lot larger than `unsigned(d)`. The actual casts are free on modern hardware. – MSalters Mar 07 '13 at 12:48
  • @ShaneMacLaughlin: For a point to lie within a cube, you need to test `(x in range) && (y in range) && (z in range)`. You now test `(x in range) || (y in range) || (z in range)`. – MSalters Mar 07 '13 at 12:50
  • @MSalters -- depends on whether 'd' is integer or not. – Aki Suihkonen Mar 07 '13 at 13:09
  • @MSalters, quite correct. Fixed – SmacL Mar 07 '13 at 15:12

1 Answers1

1

The real question is, what do you gain from unrolling ?

Unrolling shaves off one branch (if i >= 8 stop) for every "unroll".

Your loop body contains 6 branches already (if * 1, || * 2, && * 3); so is there much to gain in unrolling it ?

It might be interesting to see how the code is optimized; but I am quite unsure whether unrolling should be your primary focus, I'd be more worried about how the complex condition is handled!

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • The condition is to check if any of a set of 8 points point lie within an axis aligned cube in Cartesian space. In unrolling, I was wondering whether replacement of [i] with [n], where n is constant, would also remove an indexing operation. – SmacL Mar 07 '13 at 10:56
  • @ShaneMacLaughlin: then the equation to be optimized should first be fixed :) – Aki Suihkonen Mar 07 '13 at 13:15