3

is there a way to compute:

/** clipped(a - b) **/
unsigned char clipped_substract(unsigned char a, unsigned char b)
{
    return a > b ? a - b : 0;
}

using some binary operations instead of a test?

Nick Skywalker
  • 1,027
  • 2
  • 10
  • 26
  • No there is no such thing. What's wrong with your function? – Jabberwocky Dec 13 '19 at 11:31
  • If you use a cpu architecture that supports the mmx command extension set, you can perform an unsigned saturated subtraction to achieve this. Other architectures might have similar instructions. – Ctx Dec 13 '19 at 11:32
  • @Jabberwocky I would like to optimize computation and tests can lower performances. – Nick Skywalker Dec 13 '19 at 11:35
  • 4
    @Abitbol If you care about performance, you profile. You don't go by "tests can lower performance so I won't use tests". That's wrong, dangerous and will lead to bad code with bad performance. Again, profile or don't say you care about performance. – Max Langhof Dec 13 '19 at 12:03
  • You should probably take a look at the generated code. If you're using a decent compiler, this code is simple enough for it to get optimized quite well. – th33lf Dec 13 '19 at 12:15

3 Answers3

8

The original function compiles to a conditional move and is not subject to the performance concern you have. You are attempting to perform premature optimization here, and it will do absolutely no good for you. Write readable code, then profile it, then identify and optimize performance bottlenecks.

You say "tests can lower performances". If you want to do micro-optimizations like this, you should aim to understand the reasons underlying these rules of thumb, not apply them as unconditional truth.

What lowers performance about "tests" are (mispredicted) control flow branches. Modern CPUs heavily use instruction pipelining, and control flow branches mean it's unclear which instruction comes next. The common approach is that the CPU guesses (using branch prediction hardware/algorithms) which way the branch will go. If it gets it wrong, it has to flush the entire pipeline and refill it, which wastes cycles.

Well, the code in question has no such issue. It compiles to a conditional move:

https://godbolt.org/z/upkgok

clipped_substract(unsigned char, unsigned char):
    mov     eax, edi
    mov     edx, 0
    sub     eax, esi
    cmp     dil, sil
    cmovbe  eax, edx
    ret

You can see that there is no control flow branch here. See Why is a conditional move not vulnerable for Branch Prediction Failure? for why that is not subject to the above performance concern.

I repeat: Write readable code, then profile it, then identify and optimize performance bottlenecks. What you attempted here was to fix a performance problem code that didn't even exist in the first place (not to mention you didn't prove that the performance of this code is even relevant to your program's performance).

Max Langhof
  • 23,383
  • 5
  • 39
  • 72
1

Using clang 11.0, compiler optimizations enabled

unsigned char clipped_substract(unsigned char a, unsigned char b)
{   
    long c = b - a;
    return ((unsigned long)c >> (sizeof(c)*8-1)) * c;
}

Although time taken by any of the alternatives is neglagibly different, but when performing 10000000 iterations, other ways take around 0.300secs on my machine and this solution takes around 0.200secs.

But this is only valid if you use a type whose size is greater than char, for example here I took long. I don't remember exact definitions of char or long but I know long is wider on my machine compared to char. So adjust it accordingly.

Basically, I just store the result of subtraction in a larger width variable. If the result is negative, most significant bit will be 1 else 0.

Mihir Luthra
  • 6,059
  • 3
  • 14
  • 39
  • 3
    No, never bit shift negative types. Needed bug fix: `((unsigned long)c >> (sizeof(c)*8-1)) * c;`. Otherwise you can't know if you get arithmetic or logical shift. – Lundin Dec 13 '19 at 14:03
  • @Lundin Actually shifting negative types can be quite useful if there's a need to retain sign bit. As there's indeed no guarantee for arithmetic shift, I usually protect with `#if -1 >> 1 != -1 #error no arithmetic shift #endif` or similar... – Aconcagua Dec 13 '19 at 15:43
  • @Aconcagua: this preprocessor test is not sufficient as the preprocessor may use different arithmetic than the target system. – chqrlie Dec 14 '19 at 15:39
0
/** clipped(a - b) **/
unsigned char clipped_substract(unsigned char a, unsigned char b)
{
    return a - (a+b) / 2 + getAbs(a-b) / 2;
}

unsigned int getAbs(int n)
{
    int const mask = n >> (sizeof(int) * 8 - 1);

    return ((n + mask) ^ mask);
}
  • 5
    Is this really better? `abs` is likely to have a test, in that case this is less performant. – Nick Skywalker Dec 13 '19 at 11:42
  • 2
    @Abitbol `abs` is easily implemented as clearing the sign bit of the integer. See https://godbolt.org/z/Qkb6Fk (here it uses sign extension instead of having the sign bit mask constant in the code). – Max Langhof Dec 13 '19 at 12:01
  • `abs` *can* be calculated [without branching](https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs). More efficient??? Won't be going to analyse... – Aconcagua Dec 13 '19 at 12:09
  • 1
    It's not abs() that is the problem. This code has more operations than the single subtraction and is bound to be less performant than the original code. We should also remember that all modern processors have branch predictors. – th33lf Dec 13 '19 at 12:11
  • @Abitbol I updated the answer with replacing the abs function. – Alex Vorobiov Dec 13 '19 at 12:15
  • True that I can't be asking for more efficient without going to analyse, I am just asking for an alternative candidate. – Nick Skywalker Dec 13 '19 at 12:15
  • For the sake of curiosity, is a similar trick with addition overflow? – Nick Skywalker Dec 13 '19 at 12:17
  • @Abitbol `a + b = a - (-b)` – obviously, you'd then just have to swap `+b` and `-b` in above algorithm... – Aconcagua Dec 13 '19 at 12:18
  • @Aconcagua I would be very careful with unary minus on `unsigned`... – Max Langhof Dec 13 '19 at 12:19
  • @MaxLanghof [Why?](https://stackoverflow.com/questions/8026694/c-unary-minus-operator-behavior-with-unsigned-operands) – Aconcagua Dec 13 '19 at 12:22
  • 2
    I don't think this is any better: Look at the [generated assembly code](https://www.godbolt.org/z/TkEpoH). This version uses more than twice the number of instructions than the original version. I know the number of instructions is not the only thing that counts, but in this case I thnink it's pretty obvious. – Jabberwocky Dec 13 '19 at 12:39