11

This part of the CryENGINE SDK headers caught my attention:

branchmask.h

#ifndef __BRANCHLESS_MASK__
#define __BRANCHLESS_MASK__

///////////////////////////////////////////
// helper functions for branch elimination
//
// msb/lsb - most/less significant byte
//
// mask - 0xFFFFFFFF
// nz   - not zero
// zr   - is zero

ILINE const uint32 nz2msb(const uint32 x)
{
    return -(int32)x | x;
}

ILINE const uint32 msb2mask(const uint32 x)
{
    return (int32)(x) >> 31;
}

ILINE const uint32 nz2one(const uint32 x)
{
    return nz2msb(x) >> 31; // int((bool)x);
}

ILINE const uint32 nz2mask(const uint32 x)
{
    return (int32)msb2mask(nz2msb(x)); // -(int32)(bool)x;
}


ILINE const uint32 iselmask(const uint32 mask, uint32 x, const uint32 y)// select integer with mask (0xFFFFFFFF or 0x0 only!!!)
{
    return (x & mask) | (y & ~mask);
}


ILINE const uint32 mask_nz_nz(const uint32 x, const uint32 y)// mask if( x != 0 && y != 0)
{
    return msb2mask(nz2msb(x) & nz2msb(y));
}

ILINE const uint32 mask_nz_zr(const uint32 x, const uint32 y)// mask if( x != 0 && y == 0)
{
    return msb2mask(nz2msb(x) & ~nz2msb(y));
}


ILINE const uint32 mask_zr_zr(const uint32 x, const uint32 y)// mask if( x == 0 && y == 0)
{
    return ~nz2mask(x | y);
}

#endif//__BRANCHLESS_MASK__

Could someone throw a short explanation how exactly are these functions intended to be used to reduce branches? ILINE I suppose is predefined force inline or something like that. I searched Google about it, but all I found were copies of the CryENGINE headers uploaded in different sites, but no discussions about this specific one.

Bart
  • 19,692
  • 7
  • 68
  • 77
ulak blade
  • 2,515
  • 5
  • 37
  • 81

1 Answers1

11

These functions return bit-masks that can be and'd with results in other calculations, in order to perform operations without conditionals, and thus without introducing branches.

For example:

  • nz2mask returns 0 if the argument is 0, and 0xffffffff otherwise.
  • msb2mask returns 0 if the top bit of the argument is 0, and 0xffffffff if it is 1.

So if you have code like (with x86 instructions for reference):

if(a != 0) x += y;
    //  test        ebx,ebx  
    //  je          skip  
    //  add         dword ptr [x],eax  
    // skip:

You can replace it with:

x += y & (nz2mask(a));
    //  mov     ecx,ebx  
    //  neg     ecx  
    //  or      ecx,ebx  
    //  sar     ecx,1Fh  
    //  and     ecx,eax  
    //  add     ecx,dword ptr [x]  

It produces more instructions (at least on x86), but it avoids a branch.

Then there are additional functions like iselmask() which allow the selection of either input based on the mask provided, so you could replace:

x = (a != 0) ? r1 : r2;

with

x = iselmask(nz2mask(a), r1, r2);

Again, these functions should inline and compile down to relatively efficient assembler, trading off a bit of extra maths for no branching.

JasonD
  • 16,464
  • 2
  • 29
  • 44
  • 1
    upvoted. In the first example, we can see that there's a test (the `if` instruction), which at machine code level is translated to a conditional branch. In the replacement instruction, no more test is found, and the machine code won't contain a branch. – didierc Dec 15 '12 at 18:35
  • thanks for the answers,I guess I'll be replacing my if elses with this now :D – ulak blade Dec 16 '12 at 08:58
  • Definitely profile the results before blindly doing this kind of stuff - it's not always a win. You have to know that branching is costing you performance before trying to eliminate it. – JasonD Dec 16 '12 at 09:00
  • I did a small test,the result is: Enter execution count 10000 execution speed with branching was:2926 execution speed without branching was:2929 Press any key to continue . . . however all the program did in the if'else brackets was just do a huge amount of int operations,maybe I should try something more complex? EDIT: with 100k executions branching is almost 1000 msec faster o_O – ulak blade Dec 16 '12 at 10:01