2

I am looking for a fast (branch-less) code to compute the following:

if ( k > 0 )
    x += i;
else if ( k < 0 )
    x -= i;
else if ( k == 0 ) 
    // do nothing.

where k, x, and i are of int type.

It is fine for the solution to use intrinsics.

cigien
  • 57,834
  • 11
  • 73
  • 112
user2052436
  • 4,321
  • 1
  • 25
  • 46
  • 2
    You've busted the compiler generating inefficient code for this? – user4581301 Nov 03 '20 at 22:42
  • How do you know it's not branchless? – KamilCuk Nov 03 '20 at 22:50
  • 3
    user2052436, Approaches using `x += signof(k)*i;` suffer from potential UB when `i == INT_MIN, k < 0`. Do you care about this corner case? – chux - Reinstate Monica Nov 03 '20 at 23:24
  • 1
    The question and all the currently posted answers are all wild speculation. Discussing this without showing the generated machine code for some CPU is quite pointless. – Lundin Nov 04 '20 at 09:30
  • @Lundin the nature of a micro-optimization question like this is that there's no single best answer. The only thing you can do is collect a number of different approaches and benchmark them in your own final configuration. There's already an assumption built into the question that branchless will be faster, but that's not a given either. – Mark Ransom Nov 04 '20 at 16:15

9 Answers9

4

This answer shows a fast, branchless way of computing the sign of a value. Assuming that's correct, you can write:

x += i * ((0 < k) - (k < 0))
cigien
  • 57,834
  • 11
  • 73
  • 112
  • 2
    This very much depends on the compiler having a branchless way to evaluate `0 < k` and `k < 0`. – Mark Ransom Nov 03 '20 at 22:54
  • @MarkRansom Couldn't say for sure. The linked answer claims it's branchless, and I'm just building upon that. Is the phrasing better now? – cigien Nov 03 '20 at 22:55
2

Try this:

int sign = !!k - !!(k & INT_MIN) * 2;
x += i * sign;

This assumes two's complement representation of integers, resulting in INT_MIN having only the high-order bit set.

dbush
  • 205,898
  • 23
  • 218
  • 273
1

Just compile your code with good optimisation settings, get a disassembly of your code, and look at the result. Then change your code until the disassembly shows what you want. There's a chance that you already get branchless code. There's a better chance if you write

if ( k > 0 )
    x += i;
if ( k < 0 )
    x -= i;

because the else in between makes it harder for the compiler to avoid branches.

gnasher729
  • 51,477
  • 5
  • 75
  • 98
1
x = x + i*(k>0) - i*(k<0);

But, honestly, given the many optimization possibilities of modern compilers, I wouldn't trade clarity of code for aleggedly faster code. For the x86 platform, for example, MMX and SSE units can do this operation with no branches.

cigien
  • 57,834
  • 11
  • 73
  • 112
mcleod_ideafix
  • 11,128
  • 2
  • 24
  • 32
0

You seem to be expecting code that is "branchless" to be measurably and importantly "faster?" With maximum optimization settings turned on, exactly what instruction sequence does your compiler now produce? Have you then profiled that code to confirm that it is not fast enough?

Mike Robinson
  • 8,490
  • 5
  • 28
  • 41
0

Try looking at the assembly language for this:

if (k != 0)
{
    const int multiplier = (k > 0) ? 1 : -1;
    x += (multiplier) * i;
}

Some processors may have conditional instructions that are only executed depending on the condition code (like ARM).

Look at the assembly code printed by your compiler.

Edit 1: performance concerns
A lot of modern processors have enough space in their instruction cache to that they can handle the above code efficiently. The comparison should be quick for the processor's branch decision making circuitry.

Profile or Benchmark before making any judgements about code having to be one line.

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
Thomas Matthews
  • 56,849
  • 17
  • 98
  • 154
0

Just for kicks, I present this:

using opType = void(*)(int&, int);
const opType ops[3] =
{
    [](int&  , int  ){ },
    [](int& x, int i){ x += i; },
    [](int& x, int i){ x -= i; }
};

auto f = ops[int(k > 0) + (2 * int(k < 0))];
f(x, i);

Live Demo

Obviously, the other solutions using solely mathematics are going to be more efficient :) But this give you some options if you want to do more complex operations based on what k is.

Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • 1
    Now I'm itching to see the assembly for this :) – cigien Nov 03 '20 at 23:09
  • Oh, I wasn't actually asking for the link, but thanks :) Hmm, even at O3 there appear to be jumps that don't get optimized out. Not that I'm any good at reading assembly. – cigien Nov 03 '20 at 23:21
0

Except for INT_MIN the following should work in C++11

x += (std::signbit(-k)-std::signbit(k))*i;

And the following should work in all cases, including INT_MIN - if long is longer than int:

x += (std::signbit(-(long)k)-std::signbit(k))*i;

Added: Internally std::signbit is in many compilers converted to a simple shift-operation, which should be branch-less and fast.

Hans Olsson
  • 11,123
  • 15
  • 38
-2

This is a branchless solution, but the time will vary based on your environment. It shouldn't take too much more than 0.005 seconds, so I think that won't be a problem.

k>0?x+=i:k<0?x-=i:0
Lakshya Raj
  • 1,669
  • 3
  • 10
  • 34