1

I can't seem to improve the following code:

    if(state[SDL_SCANCODE_M]){
        if(_ball.velocity.x < 0)
            _ball.velocity.x -= 20;
        else
            _ball.velocity.x += 20;

        if(_ball.velocity.y < 0)
            _ball.velocity.y -= 20;
        else
            _ball.velocity.y += 20;
    }

    if(state[SDL_SCANCODE_L]){
        if(_ball.velocity.x < 0)
            _ball.velocity.x += 20;
        else
            _ball.velocity.x -= 20;

        if(_ball.velocity.y < 0)
            _ball.velocity.y += 20;
        else
            _ball.velocity.y -= 20;
    }

The two blocks are nearly identical, but the operations are opposite.

Is there a best practice or technique for improving this kind of situation?

genpfault
Eideann

2 Answers

3

You can precompute a sign factor for each axis, so that the direction of the update follows the sign of the velocity without repeating the conditions.

Here is an example:

int xCoef = (_ball.velocity.x < 0) ? -1 : 1;
int yCoef = (_ball.velocity.y < 0) ? -1 : 1;

if(state[SDL_SCANCODE_M]){
    _ball.velocity.x += xCoef * 20;
    _ball.velocity.y += yCoef * 20;
}

if(state[SDL_SCANCODE_L]){
    _ball.velocity.x -= xCoef * 20;
    _ball.velocity.y -= yCoef * 20;
}
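The same idea can be packaged as a small helper so that both key handlers share one code path. This is only a sketch: `Vec2`, `signCoef` and `adjustSpeed` are hypothetical names, and the velocity is assumed to be a pair of floats.

```cpp
// Hypothetical stand-in for the ball's velocity type; not part of SDL.
struct Vec2 { float x; float y; };

// Returns -1 for negative values and +1 otherwise. Zero counts as positive,
// matching the `< 0` test in the original code.
inline int signCoef(float v) { return (v < 0) ? -1 : 1; }

// Moves each component away from zero (direction = +1) or toward zero
// (direction = -1) by `step`. Like the original code, a component smaller
// than `step` in magnitude will cross zero when direction is -1.
inline void adjustSpeed(Vec2& velocity, int direction, float step = 20.0f) {
    velocity.x += direction * signCoef(velocity.x) * step;
    velocity.y += direction * signCoef(velocity.y) * step;
}
```

With this helper, the M branch would call `adjustSpeed(_ball.velocity, +1)` and the L branch `adjustSpeed(_ball.velocity, -1)`.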

A more compact (branch-less) version that is a bit less clear (but probably more efficient) is the following:

int xCoef = (_ball.velocity.x < 0) ? -1 : 1;
int yCoef = (_ball.velocity.y < 0) ? -1 : 1;
int mPressed = state[SDL_SCANCODE_M] ? 1 : 0;
int lPressed = state[SDL_SCANCODE_L] ? 1 : 0;
_ball.velocity.x += (mPressed - lPressed) * xCoef * 20;
_ball.velocity.y += (mPressed - lPressed) * yCoef * 20;

Note that if `state` is a boolean array, you do not even need the ternaries, since booleans convert implicitly to the integer values 0 and 1.
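As a rough illustration of that conversion, here is a sketch that does not depend on SDL. It assumes the key state is stored as bytes (SDL2's `SDL_GetKeyboardState` returns a `const Uint8*`), and `keyDirection` is a hypothetical helper name.

```cpp
#include <cstdint>

// Returns +1 if only `grow` is held, -1 if only `shrink` is held, and 0 if
// both or neither are held. `!= 0` normalizes any non-zero byte to true,
// and a bool converts to the int 0 or 1.
inline int keyDirection(std::uint8_t grow, std::uint8_t shrink) {
    return static_cast<int>(grow != 0) - static_cast<int>(shrink != 0);
}
```

The result plays the role of `(mPressed - lPressed)` above, for example `keyDirection(state[SDL_SCANCODE_M], state[SDL_SCANCODE_L])`.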

Jérôme Richard
  • 1
    I wouldn't call the compact version "branch-less". The ternary operator may still perform a jump. And I doubt that could be more efficient than a plain `if`. – Costantino Grana Mar 24 '22 at 20:17
  • 3
    Both GCC and Clang generate assembly without any conditional jumps: https://godbolt.org/z/o3hedcs3f . One could easily rewrite the ternaries by hand to avoid conditions, but compilers can do this boring job for us ;) . The code generated by Clang is pretty efficient. At least, it is far more efficient than conditional jumps, because of the possible expensive *branch mispredictions* and the better instruction-level parallelism. – Jérôme Richard Mar 24 '22 at 20:40
  • 1
    You are totally right. I came up with my own version of "branch-less", but of course I had to resort to multiplication as well, for `int xCoef = 1 - 2*(_ball.velocity.x < 0);`. Result? The compiler was better than me, because I failed to consider `cmov` instructions. – Costantino Grana Mar 25 '22 at 06:59
  • Thank you for your answers! Really helpful, I will use that. One question about the link you posted from godbolt.org - with that you can compare which code is more efficient / runs faster? – Eideann Mar 25 '22 at 12:07
  • 1
    @Eideann Well, not directly, but mostly. You can see the assembly generated from C/C++ code (and many other languages) by a chosen compiler. You can do the same locally, but this website makes it easy to share the result. If you know the cost of each instruction and have a pretty good knowledge of how processors work, you can estimate the performance of an assembly code. If you do not, then the best option is simply to run the code. – Jérôme Richard Mar 25 '22 at 12:20
  • 1
    @Eideann For example, the `lea`, `mov`, `test`, `cmp` and `add` instructions are very cheap: they generally cost 1 cycle, and the processor can run them in parallel as far as the data dependencies between instructions allow. `cmovns` is a bit more expensive but still cheap, and `imul` has a few cycles of latency. `idiv` or `sqrt` are very expensive (dozens of cycles of latency). On Skylake processors, AFAIR, a conditional jump costs 1~2 cycles if predicted correctly and about 14 cycles otherwise (similar on AMD). You can find this kind of information at https://www.agner.org/optimize/instruction_tables.pdf . – Jérôme Richard Mar 25 '22 at 12:32
  • For more infos about ILP, consider reading: https://en.wikipedia.org/wiki/Instruction-level_parallelism – Jérôme Richard Mar 25 '22 at 12:35
2

This is quite easy:

// A lambda or a function works equally well.
// The first delta applies when velocity < 0, the second when velocity >= 0.
auto upgradeVelocity = [](auto& velocity, auto deltaIfLessZero, auto deltaIfGreaterZero){
    velocity += velocity < 0 ? deltaIfLessZero : deltaIfGreaterZero;
};

if(state[SDL_SCANCODE_M]){
    upgradeVelocity(_ball.velocity.x, -20, 20);
    upgradeVelocity(_ball.velocity.y, -20, 20);
}

if(state[SDL_SCANCODE_L]){
    upgradeVelocity(_ball.velocity.x, 20, -20);
    upgradeVelocity(_ball.velocity.y, 20, -20);
}

As you can see, the L state and the M state are opposites, so you can do something like this:

auto delta = (state[SDL_SCANCODE_L] ? 20 : 0) - (state[SDL_SCANCODE_M] ? 20 : 0); 
upgradeVelocity(_ball.velocity.x, delta, -delta);
upgradeVelocity(_ball.velocity.y, delta, -delta);

This code may not be as clear as the previous one, but it is shorter.
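A related option, not taken from either answer here, is `std::copysign` from `<cmath>`, which returns its first argument with the sign of the second. The sketch below assumes the velocity components are floats, and `adjustComponent` is a hypothetical helper name. One difference from the original code: a velocity of `-0.0f` is treated as negative here, whereas `-0.0f < 0` is false.

```cpp
#include <cmath>

// Hypothetical helper: moves `v` away from zero when `direction` is +1 and
// toward zero when it is -1, by `step`.
inline float adjustComponent(float v, int direction, float step = 20.0f) {
    // std::copysign(step, v) yields +step or -step depending on the sign of v.
    return v + direction * std::copysign(step, v);
}
```

It could be used as `_ball.velocity.x = adjustComponent(_ball.velocity.x, +1);` for M and with `-1` for L.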

  • Thanks! I am still a bit new to Lambdas, but I will save this snippet and will come back to it once I have played with them a bit. Appreciated! One question, what do you mean with "the L state and M state is reciprocal"? – Eideann Mar 25 '22 at 12:09
  • 1
    It means, your system could be in state M or in state L; being in state L and state M at the same time doesn't make sense – Gleb Bezborodov Mar 25 '22 at 19:05