0

I am looking to optimize a loop using loop unswitching and SIMD so that I can speedup execution time.

for (b_idx = 0; b_idx < e_idx; b_idx++) {
    if (fxp < 0) {
        fxp += LUT[b_idx];
        x += ytmp;
        y -= xtmp;
    } else {
        fxp -= LUT[b_idx];
        x -= ytmp;
        y += xtmp;
    }
    xtmp = x >> (b_idx + 1);
    ytmp = y >> (b_idx + 1);
}
bolov
  • 72,283
  • 15
  • 145
  • 224
  • Loop unswitching works when the condition has a constant value throughout the loop. But here, since `fxp` is modified inside the loop, the truth value of `fxp < 0` may change during the loop. So I don't see how to unswitch it here. Unless you know something about the values in the `LUT` array that prove this can't happen? – Nate Eldredge Aug 28 '21 at 04:52
  • That's correct observation !! Any suggestion here ?? –  Aug 28 '21 at 05:14
  • 1
    How fxp,x,y,xtmp,ytmp are initialized? Is e_irx fixed? Like 32 – tstanisl Aug 28 '21 at 08:01

2 Answers2

0

The inner loop contains a if condition that needs branch prediction. Branch misprediction can introduce big slow downs. It is most certainly the bottleneck of the algorithm.

In this case, you can easily remove the condition by noting that branches only differ by one sign.

for (b_idx = 0; b_idx < e_idx; b_idx++) {
    int sgn = (fxp < 0) * 2 - 1; // +1 or -1
    fxp += sgn * LUT[b_idx];
    x += sgn * ytmp;
    y -= sgn * xtmp;

    xtmp = x >> (b_idx + 1);
    ytmp = y >> (b_idx + 1);
}

But as the algorithm is incremental, loop b_idx+1 depends on loop b_idx, that loop cannot be parallelized.

prapin
  • 6,395
  • 5
  • 26
  • 44
  • 'int sgn = (fxp < 0) * 2 - 1;' // +1 or -1 , can be converted to 'int sgn = (fxp < 0) ? 1 : 1;' –  Aug 29 '21 at 05:52
  • 1
    @Coder Maybe, but the purpose of my refactor was to avoid any conditional that can be quite costly. A ternary operator is like a hidden `if`, and will probably be slower than a pure arithmetic expression. – prapin Aug 29 '21 at 17:37
  • I think I have not understood the factor '* 2 - 1' clearly. Could you please explain how it is translated into '+1' or '-1' ? Can do constant variable for * 2 -1 as replacement ? Thank you –  Aug 30 '21 at 03:35
  • If `fxp` < 0, `sgn = (true) * 2 - 1 = 1`, otherwise, `sgn = (false) * 2 - 1 = -1`, since `true` is convertible to `1` and `false` to `0`. – prapin Aug 30 '21 at 07:32
-1

You can consider moving if condition line out of the loop to avoid many times determination.

Shuduo
  • 727
  • 5
  • 14
  • Can you please share snippet what do you mean here by moving the if condition outside the size ? –  Aug 28 '21 at 04:11
  • if (fxp < 0) { for (b_idx = 0; b_idx < e_idx; b_idx++) { fxp += LUT[b_idx]; x += ytmp; y -= xtmp; } } else { for (b_idx = 0; b_idx < e_idx; b_idx++) { fxp -= LUT[b_idx]; x -= ytmp; y += xtmp; } } xtmp = x >> (b_idx + 1); ytmp = y >> (b_idx + 1); –  Aug 28 '21 at 04:12
  • yes, use redundancy code to save CPU cycles – Shuduo Aug 28 '21 at 04:23
  • That doesn't seem right because `fxp` is modified through the loop, and could change from positive to negative and back within the loop. – Nate Eldredge Aug 28 '21 at 04:50