C++17 std::clamp is a template function that makes sure the input value is not less than the given minimum and not greater than the given maximum. If the input value is in range, it is returned; otherwise the minimum or the maximum is returned, respectively.
The goal is to optimize it, assuming the following:
- The type parameter is a 32-bit or 64-bit integer
- The input value is far more likely to be in range already than out of range, so it is likely to be returned unchanged
- The input value is likely to have been computed shortly before; the minimum and maximum are likely to be known in advance
- Let's ignore references, which may complicate optimization, but in practice are not useful for integer values
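A minimal sketch of these assumptions, for reference: a clamp that takes and returns by value and only accepts 32-bit and 64-bit integers. The name clamp_by_value is made up for illustration and is not a standard library function.

```cpp
#include <type_traits>

// Hypothetical helper (not part of the standard library): clamp by value,
// restricted to 32-bit and 64-bit integer types as assumed above.
template <typename T>
constexpr T clamp_by_value(T v, T minv, T maxv)
{
    static_assert(std::is_integral_v<T> && (sizeof(T) == 4 || sizeof(T) == 8),
                  "this sketch only covers 32-bit and 64-bit integers");
    // Lower bound is checked first, then the upper bound; otherwise v is in range.
    return (v < minv) ? minv : (maxv < v) ? maxv : v;
}
```

It is constexpr only so that the behavior can be checked at compile time; that should not matter for the code generation question.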
For both the standard implementation and a naive implementation, the assembly generated by gcc and clang does not seem to favor the in-range assumption above. Both of these:
#include <algorithm>

int clamp1(int v, int minv, int maxv)
{
    return std::clamp(v, minv, maxv);
}

int clamp2(int v, int minv, int maxv)
{
    if (maxv < v)
    {
        return maxv;
    }
    if (v < minv)
    {
        return minv;
    }
    return v;
}
compile into two cmov instructions (https://godbolt.org/z/oedd9Yfro):
    mov eax, esi
    cmp edi, esi
    cmovge eax, edi
    cmp edx, edi
    cmovl eax, edx
Trying to tell the compilers to favor the in-range case with __builtin_expect (gcc seems to ignore C++20 [[likely]]):
int clamp3(int v, int minv, int maxv)
{
    if (__builtin_expect(maxv < v, 0))
    {
        return maxv;
    }
    if (__builtin_expect(v < minv, 0))
    {
        return minv;
    }
    return v;
}
The results for gcc and clang are now different (https://godbolt.org/z/s4vedo1br). gcc still avoids branches entirely, using two cmov instructions. clang has one branch instead of the expected two (annotations mine):
clamp3(int, int, int):
    mov eax, edx        ; result = maxv
    cmp edi, esi        ; v, minv
    cmovge esi, edi     ; if (v >= minv) minv = v
    cmp edx, edi        ; maxv, v
    jl .LBB0_2          ; if (maxv < v) goto LBB0_2
    mov eax, esi        ; result = minv (after it was updated from v if no clamping)
.LBB0_2:
    ret
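For completeness, the C++20 attribute form mentioned above would look roughly like the following. This is only a sketch: the name clamp4 just continues the numbering, the attribute used is [[unlikely]] on the clamping paths (the mirror image of [[likely]] on the fall-through), and nothing here guarantees that a compiler changes its code generation because of it.

```cpp
// Same logic as clamp3, with C++20 [[unlikely]] instead of __builtin_expect.
// Intended for -std=c++20; the attribute is a hint that compilers may ignore.
// constexpr is added only so the behavior can be checked at compile time.
constexpr int clamp4(int v, int minv, int maxv)
{
    if (maxv < v) [[unlikely]]
    {
        return maxv;  // rare: above the maximum
    }
    if (v < minv) [[unlikely]]
    {
        return minv;  // rare: below the minimum
    }
    return v;         // expected: already in range
}
```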
Questions:
- Are there significant disadvantages to conditional jumps that are expected to go the same way each time, which would explain why gcc avoids them?
- Is the clang version with one conditional jump better than it would have been with two jumps?
Not using cmov for predictable branches is suggested in the Intel® 64 and IA-32 Architectures Optimization Reference Manual, June 2021 version, page 3-5:
Assembly/Compiler Coding Rule 2. (M impact, ML generality) Use the SETCC and CMOV instructions to eliminate unpredictable conditional branches where possible. Do not do this for predictable branches. Do not use these instructions to eliminate all unpredictable conditional branches (because using these instructions will incur execution overhead due to the requirement for executing both paths of a conditional branch). In addition, converting a conditional branch to SETCC or CMOV trades off control flow dependence for data dependence and restricts the capability of the out-of-order engine. When tuning, note that all Intel 64 and IA-32 processors usually have very high branch prediction rates. Consistently mispredicted branches are generally rare. Use these instructions only if the increase in computation time is less than the expected cost of a mispredicted branch.
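As a point of comparison for the second question, here is a sketch of a variant with a single range check on the expected path and the clamping moved to the rare path. It is not something either compiler produced above; the name clamp_one_check is made up, and it assumes minv <= maxv (which std::clamp also requires). Whether gcc or clang actually keep it as one branch would have to be checked in the generated assembly.

```cpp
// Hypothetical variant: one in-range test on the hot path, clamping on the cold path.
// Assumes minv <= maxv. constexpr only so it can be checked at compile time.
constexpr int clamp_one_check(int v, int minv, int maxv)
{
    // Unsigned subtraction wraps around, so offset <= width holds
    // exactly when minv <= v <= maxv, with no signed overflow involved.
    const unsigned offset = static_cast<unsigned>(v) - static_cast<unsigned>(minv);
    const unsigned width  = static_cast<unsigned>(maxv) - static_cast<unsigned>(minv);
    if (__builtin_expect(offset <= width, 1))
    {
        return v;                     // expected case: already in range
    }
    return (v < minv) ? minv : maxv;  // rare case: return the violated bound
}
```

When minv and maxv are compile-time constants, width is a constant too, so the expected path is a subtraction, a comparison, and one conditional jump, at least in principle.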