How to reduce the amount of checks of similar code blocks?

Question

I do not seem able to improve the following code:

    if(state[SDL_SCANCODE_M]){
        if(_ball.velocity.x < 0)
            _ball.velocity.x -= 20;
        else
            _ball.velocity.x += 20;

        if(_ball.velocity.y < 0)
            _ball.velocity.y -= 20;
        else
            _ball.velocity.y += 20;
    }

    if(state[SDL_SCANCODE_L]){
        if(_ball.velocity.x < 0)
            _ball.velocity.x += 20;
        else
            _ball.velocity.x -= 20;

        if(_ball.velocity.y < 0)
            _ball.velocity.y += 20;
        else
            _ball.velocity.y -= 20;
    }

The two blocks are nearly identical, but the operations are reversed.

Is there a best practice or technique for improving this kind of situation?

FYI: [Is there a standard sign function (signum, sgn) in C/C++?](https://stackoverflow.com/q/1903954/7478597) — Scheff's Cat, Mar 24 '22 at 19:59
Abstraction, man. it almost always comes down to abstraction. — user4581301, Mar 24 '22 at 20:05

score 3 · Accepted Answer · answered Mar 24 '22 at 20:01

You can precompute a sign factor so that the direction of the update flips depending on the condition.

Here is an example:

    int xCoef = (_ball.velocity.x < 0) ? -1 : 1;
    int yCoef = (_ball.velocity.y < 0) ? -1 : 1;

    if(state[SDL_SCANCODE_M]){
        _ball.velocity.x += xCoef * 20;
        _ball.velocity.y += yCoef * 20;
    }

    if(state[SDL_SCANCODE_L]){
        _ball.velocity.x -= xCoef * 20;
        _ball.velocity.y -= yCoef * 20;
    }

A more compact (branch-less) version that is a bit less clear (but probably more efficient) is the following:

    int xCoef = (_ball.velocity.x < 0) ? -1 : 1;
    int yCoef = (_ball.velocity.y < 0) ? -1 : 1;
    int mPressed = state[SDL_SCANCODE_M] ? 1 : 0;
    int lPressed = state[SDL_SCANCODE_L] ? 1 : 0;
    _ball.velocity.x += (mPressed - lPressed) * xCoef * 20;
    _ball.velocity.y += (mPressed - lPressed) * yCoef * 20;

Note that if state is a boolean array, you do not even need to use a ternary since booleans can be directly converted to 0-1 integer values.
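
The note about booleans can be sketched as follows. This is a minimal, self-contained illustration rather than the original program: `Vec2`, `signCoef`, `applyBoost`, and `kStep` are hypothetical names, the component type is assumed to be `int`, and the key states are passed in as plain `bool` values instead of being read from the `state` array.

```cpp
// Hypothetical stand-in for _ball.velocity; the real type is not shown in the question.
struct Vec2 {
    int x;
    int y;
};

constexpr int kStep = 20;  // assumed name for the constant 20 used above

// Sign coefficient: -1 for negative values, +1 otherwise
// (zero counts as positive, matching the else branch of the original code).
constexpr int signCoef(int v) { return v < 0 ? -1 : 1; }

// mPressed / lPressed stand in for state[SDL_SCANCODE_M] / state[SDL_SCANCODE_L].
// A bool converts to 0 or 1, so the subtraction yields -1, 0, or +1.
constexpr Vec2 applyBoost(Vec2 v, bool mPressed, bool lPressed) {
    const int direction = int(mPressed) - int(lPressed);
    return Vec2{v.x + direction * signCoef(v.x) * kStep,
                v.y + direction * signCoef(v.y) * kStep};
}
```

With only M pressed, `applyBoost(Vec2{5, -5}, true, false)` gives `{25, -25}`; with both keys pressed the two contributions cancel and the velocity is unchanged.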

Jérôme Richard


I wouldn't call the compact version "branch-less". The ternary operator may still perform a jump. And I doubt that could be more efficient than a plain `if`. – Costantino Grana Mar 24 '22 at 20:17
Both GCC and Clang generate assembly code without any conditional jumps: https://godbolt.org/z/o3hedcs3f . One can easily rewrite the ternaries so that the source contains no conditions at all, but compilers can do this boring job for us ;) . The code generated by Clang is pretty efficient. At least, it is far more efficient than conditional jumps, because it avoids potentially expensive *branch mispredictions* and exposes more instruction-level parallelism. – Jérôme Richard Mar 24 '22 at 20:40
You are totally right. I came up with my own version of "branch-less", but of course I had to resort to multiplication for the coefficient as well: `int xCoef = 1 - 2*(_ball.velocity.x < 0);`. Result? The compiler was better than me, because I failed to consider `cmov` instructions. – Costantino Grana Mar 25 '22 at 06:59
Thank you for your answers! Really helpful, I will use that. One question about the link you posted from godbolt.org - with that you can compare which code is more efficient / runs faster? – Eideann Mar 25 '22 at 12:07
@Eideann Well, not directly, but mostly. You can see the assembly generated from C/C++ code (and many other languages) by a chosen compiler. You can do the same locally, but this website makes it easy to share the result. If you know the cost of each instruction and have a pretty good knowledge of how processors work, you can estimate the performance of the assembly code. If you do not, then the best option is simply to run the code. – Jérôme Richard Mar 25 '22 at 12:20
@Eideann For example, the `lea`, `mov`, `test`, `cmp`, and `add` instructions are very cheap: they generally cost 1 cycle, and the processor can run them in parallel as far as the data dependencies between instructions allow. `cmovns` is a bit more expensive but still cheap, and `imul` has a few cycles of latency. `idiv` or `sqrt` are very expensive (dozens of cycles of latency). On Skylake processors, AFAIR, a conditional jump costs 1~2 cycles if predicted correctly and about 14 cycles otherwise (similar on AMD). You can find this kind of information at https://www.agner.org/optimize/instruction_tables.pdf . – Jérôme Richard Mar 25 '22 at 12:32
For more information about ILP, consider reading: https://en.wikipedia.org/wiki/Instruction-level_parallelism – Jérôme Richard Mar 25 '22 at 12:35
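
The arithmetic form of the coefficient mentioned in the comments can be compared with the ternary form in a small sketch. `signTernary` and `signArithmetic` are hypothetical helper names, and `int` is an assumed component type. Which form produces better machine code depends on the compiler and target; the thread above suggests the ternary already compiles to jump-free code with GCC and Clang.

```cpp
// Ternary form used in the answer: -1 for negative values, +1 otherwise.
constexpr int signTernary(int v) { return (v < 0) ? -1 : 1; }

// Arithmetic form from the comments: (v < 0) converts to 0 or 1,
// so the result is 1 - 0 = 1 or 1 - 2 = -1.
constexpr int signArithmetic(int v) { return 1 - 2 * (v < 0); }

// Compile-time checks that both forms agree, including at zero.
static_assert(signTernary(-7) == signArithmetic(-7), "negative input");
static_assert(signTernary(0) == signArithmetic(0), "zero input");
static_assert(signTernary(42) == signArithmetic(42), "positive input");
```

Both helpers treat zero as positive, which matches the `else` branch in the question's code.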

score 2 · Answer 2 · answered Mar 24 '22 at 20:01

This is quite easy:

    // A lambda or a function; it does not matter which
    auto upgradeVelocity = [](auto& velocity, auto deltaIfGreaterZero, auto deltaIfLessZero){
        velocity += velocity < 0 ? deltaIfLessZero : deltaIfGreaterZero;
    };

    // M: move each component 20 further away from zero
    if(state[SDL_SCANCODE_M]){
        upgradeVelocity(_ball.velocity.x, 20, -20);
        upgradeVelocity(_ball.velocity.y, 20, -20);
    }

    // L: move each component 20 towards (and possibly past) zero
    if(state[SDL_SCANCODE_L]){
        upgradeVelocity(_ball.velocity.x, -20, 20);
        upgradeVelocity(_ball.velocity.y, -20, 20);
    }

As you can see, the L state and M state are reciprocal, so you can do something like this:

    auto delta = (state[SDL_SCANCODE_M] ? 20 : 0) - (state[SDL_SCANCODE_L] ? 20 : 0);
    upgradeVelocity(_ball.velocity.x, delta, -delta);
    upgradeVelocity(_ball.velocity.y, delta, -delta);

This code may not be as clear as the previous one, but it is shorter.

Thanks! I am still a bit new to Lambdas, but I will save this snippet and will come back to it once I have played with them a bit. Appreciated! One question, what do you mean with "the L state and M state is reciprocal"? — Eideann, Mar 25 '22 at 12:09
It means, your system could be in state M or in state L; being in state L and state M at the same time doesn't make sense — Gleb Bezborodov, Mar 25 '22 at 19:05
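
For readers who are still new to lambdas, the same helper can be written as an ordinary function template, as the code comment in the answer suggests. This is only a sketch: the parameter order follows the answer's lambda, `speedUp` is a hypothetical wrapper added for illustration, and `float` is an assumed component type.

```cpp
// Same logic as the lambda, written as a function template.
// T is the velocity component type (int, float, ...), D the delta type.
template <typename T, typename D>
void upgradeVelocity(T& velocity, D deltaIfGreaterZero, D deltaIfLessZero) {
    // Zero is treated like a positive value, as in the question's else branch.
    velocity += (velocity < 0) ? deltaIfLessZero : deltaIfGreaterZero;
}

// Hypothetical wrapper showing the M-key behaviour from the question:
// the component moves 20 further away from zero.
inline void speedUp(float& component) {
    upgradeVelocity(component, 20, -20);
}
```

Calling `speedUp` on a component of `5` leaves it at `25`, and on `-5` leaves it at `-25`, which is what the question's M branch does.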
