
I found two questions here about fast and/or elegant clamping of numbers in C/C++. The C one focuses on (potentially platform-specific) optimization:

Fastest way to clamp a real (fixed/floating point) value?

The C++ one is also related to C:

Most efficient/elegant way to clip a number?

However, neither of them mentions whether it is about an in-place operation, which seems like a crucial piece of information to me. This question is about an in-place operation. The difference is that if the number is already within the bounds, no write operation needs to take place.
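To make the distinction concrete, here is a minimal sketch. The helper names `clamp_in_place` and `clamp_value` are hypothetical and only serve as illustration; they are not taken from either linked question. Note that "no write" describes the source code only: whether the compiled code branches or stores unconditionally is up to the compiler.

```c
#include <assert.h>

/* Hypothetical helper: clamps *x into [lo, hi] in place.
   In the source, an assignment happens only when *x is out of range. */
static inline void clamp_in_place(float *x, float lo, float hi)
{
    assert(lo <= hi);           /* precondition assumed by this sketch */
    if (*x < lo) *x = lo;
    if (*x > hi) *x = hi;
}

/* Hypothetical helper: returns the clamped value.
   The caller always assigns the result, even if x was already in range. */
static inline float clamp_value(float x, float lo, float hi)
{
    assert(lo <= hi);           /* precondition assumed by this sketch */
    return x < lo ? lo : (x > hi ? hi : x);
}
```

With these, `clamp_in_place(&v, 0.0f, 10.0f);` corresponds to the in-place case, while `v = clamp_value(v, 0.0f, 10.0f);` corresponds to the value-returning case.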

When browsing through answers about clamping in general, I see that most people simply recommend the ternary variant, no matter what. Why is this the case? Neither in terms of speed nor in terms of readability is this clear to me, in particular for the in-place case. I would like to understand whether there is a best practice and, if so, what the justification is.

In my example code, there are three variants. The enabled variant (1) is how I usually implement it. Variant (2) uses an `else if`, and variant (3) uses the ternary operator.

According to measurements on an i5 CPU with gcc and -O3, with TYPE being float or double, (1) is the fastest, while (2) and (3) are clearly slower and about as fast as each other. For int, all variants achieve approximately the same speed.

I tagged this question explicitly as C and not C/C++ because in the latter case, besides it being a duplicate, people would answer with std::clamp or Boost, which is not the point of this question. This is about the best practice for one's own platform-independent, assembly-free implementation, both in terms of speed and readability.

Side note, even though it concerns C++: std::clamp was always on par with the slowest variant. It is clear, however, that such a function cannot be optimized for the specific case of an in-place operation, and inlining doesn't help here as it doesn't change the instructions used.

#include <stdio.h>
#include <stdlib.h>

#define MY_MAX(a, b) (((a) > (b)) ? (a) : (b))
#define MY_MIN(a, b) (((a) < (b)) ? (a) : (b))

typedef float TYPE;

int main(void)
{
    const TYPE a = (TYPE) (rand() % 100);
    const TYPE b = a + (TYPE) (1 + rand() % 100);

    double sum = 0.0;

    for (unsigned int i = 0; i < 100000000; i++) {
        TYPE x = (TYPE) (rand() % 100);

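#if 1   /* variant (1): two independent ifs */
        if (x < a) x = a;
        if (x > b) x = b;
#elif 0 /* variant (2): if / else if */
        if (x < a) x = a;
        else if (x > b) x = b;
#elif 0 /* variant (3): ternary operator via the min/max macros */
        x = MY_MAX(a, MY_MIN(b, x));
#endif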

        sum += x;
    }

    printf("%f\n", sum);

    return 0;
}
Pedro
  • Whether to generate code that writes conditionally or unconditionally is up to the compiler. It's an implementation detail that's way below the level of the language itself, and can't be answered in a universal way. Either version of your source code could be compiled as either conditional or unconditional. On many machines, an unconditional write could very well be faster than a conditional branch. – Nate Eldredge Mar 08 '21 at 23:27
  • Macros like your `MY_MAX` are usually best avoided because of the risk of accidentally calling them with an argument that has side effects. But in general I think this is really down to compiler-specific optimizations. Anyway, whether there's a "write" is kind of moot here because `x` is likely to stay in a register anyhow; there's not a clear way to distinguish whether you're writing x or computing an intermediate result. – Nate Eldredge Mar 08 '21 at 23:36
  • @NateEldredge I have to admit, regarding the write, the example is not well-chosen. But even more surprisingly, it seems good enough to make a difference between variant 1 and the others. Regarding speed, now having tried this code on three considerably different Intel CPUs (i3, i5, i9, ranging from 2012 to 2019), variant 1 always was clearly the fastest. So at least I would judge that variant 3 is not necessarily the best choice. – Pedro Mar 09 '21 at 07:20
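The side-effect risk mentioned in the comments can be shown with a small sketch. Everything except `MY_MAX` is hypothetical and only there for illustration: `next_value` stands in for any argument expression with a side effect, and `max_int` is one possible function-based replacement for the macro.

```c
#define MY_MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Hypothetical counter and function with a side effect. */
static int calls = 0;
static int next_value(void) { calls++; return 7; }

/* Hypothetical function-based alternative: each argument is evaluated once. */
static inline int max_int(int a, int b) { return a > b ? a : b; }

/* Returns how often next_value() ran when passed through the macro. */
static int calls_via_macro(void)
{
    calls = 0;
    int r = MY_MAX(next_value(), 3);   /* expands next_value() twice */
    (void) r;
    return calls;
}

/* Returns how often next_value() ran when passed through the function. */
static int calls_via_function(void)
{
    calls = 0;
    int r = max_int(next_value(), 3);  /* next_value() is evaluated once */
    (void) r;
    return calls;
}
```

In the benchmark from the question the macro arguments are plain variables, so this hazard does not affect the measurements; it only matters once an argument has side effects.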
