27

A very common pattern in programming is to cap a value at a maximum after some kind of update. What I'd like to know, is if there's a difference between the following two pieces of code, and if one should be preferred:

value += increment;
value = std::min(value, valueMax);

vs

value += increment;

if (value > valueMax)
    value = valueMax;

My thinking is that this comes down to whether CPUs have instructions for taking two values and producing the minumum. If so, the call to std::min should result in this instruction and avoid an unnecessary branch. If not, the second version avoids an unnecessary assignment when value <= valueMax.

I'm not very good with this kind of thing, but I'm sure there's old-school assembly hackers who would know this. To them I ask: which is better?

Jens Björnhager
  • 5,632
  • 3
  • 27
  • 47
voltrevo
  • 9,870
  • 3
  • 28
  • 33
  • 2
    Try both and look at the assembly... – Mysticial Mar 21 '13 at 05:37
  • I'd say the first version will always perform *at least* as well as the second version, so there's no reason not to use it. The first version might *also* be faster, although there's no guarantees about that. – Cody Gray - on strike Mar 21 '13 at 05:39
  • Like Mysticial implied, it depends on the implementation of std::min (http://en.cppreference.com/w/cpp/algorithm/min). – DavidA Mar 21 '13 at 05:39
  • @CodyGray, can you explain how first version will be faster. The first version will always have an else branch (i.e. more code size) without any compiler optimization. – A. K. Mar 21 '13 at 05:42
  • There is indeed an instruction for the minimum of two words (`PMINSW`), but it's a SSE instruction. Who knows what (if any) compilers actually optimize to that. If you're determined to do it in one instruction and know for sure that it's done in one instruction, you'll need to drop down to assembly. – Corbin Mar 21 '13 at 05:43
  • I think this question is a bit dicey, because "better" is an imprecise term. In this case, it sounds like you mean "faster". But I wouldn't try to outsmart your compiler with premature optimization unless performance is absolutely critical -- which isn't the case 99% of the time. If it is, do as @Mysticial says and the answer will be clear (for your particular compiler/architecture). – jmk Mar 21 '13 at 05:44
  • @Mystical: I guess I hadn't really considered the possibility that the assembly might be the same. So, assuming they were different, I wouldn't be able to tell what was *really* different with my extremely limited knowledge of assembly. – voltrevo Mar 21 '13 at 05:50
  • @Aditya [A branch is not required](http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax). Granted, this is not a sensible implementation unless you know that the target architecture is *extremely* sensitive to branches, but it's possible that you're working with a version of the library that has been thus optimized. – Cody Gray - on strike Mar 21 '13 at 05:52
  • @Mozza314 In which case, the best you could do is to (try to) post the assembly (even if you can't read it). Or put together a benchmark to see if you can see a difference. In your case, I'd say anything can happen. Not all compilers are equally smart/dumb about optimizing branches. – Mysticial Mar 21 '13 at 05:55
  • @CodyGray, In that version where branch is not required, again, the code size is more than the second version. I believe that second version will always at least as fast as the first version and not the other way round. – A. K. Mar 21 '13 at 05:59
  • @Aditya On an architecture where branches are extremely expensive and to be avoided at all costs, the additional code size is not relevant. It's still faster because it avoids branching. On other architectures where branches are less expensive, you're right, the code size is the relevant factor. The point is that the standard library function can be selectively optimized for the target architecture, where as the one you write yourself is not. – Cody Gray - on strike Mar 21 '13 at 16:50

4 Answers4

17

Modern compilers are smart enough to generate the same code in both cases. For example, 32-bit GCC generates:

addl    %esi, %edi
cmpl    %edx, %edi
movl    %edi, %eax
cmovgl  %edx, %eax

64-bit Clang:

%1 = add nsw i32 %increment, %value
%2 = icmp sgt i32 %1, %valueMax
%value = select i1 %2, i32 %valueMax, i32 %1
Maxim Razin
  • 9,114
  • 7
  • 34
  • 33
6

On VC10 on Release for the following code we have the following assembly:

int main(int argc, char *argv[])
{ 
  int dummyValue = 0, valueMax = 3000, value = valueMax + 1;

  cin >> valueMax;
  cin >> value;

  dummyValue = std::min(value, valueMax);

  cout << dummyValue;
  cin >> valueMax;
  cin >> value;

  if (value > valueMax)
    dummyValue = valueMax;

  cout << dummyValue;
  return 0;
}

Generated:

  24:   dummyValue = std::min(value, valueMax);
00E112AF  mov         eax,dword ptr [valueMax]  
00E112B2  cmp         eax,dword ptr [value]  
00E112B5  lea         edx,[value]  
00E112B8  lea         ecx,[valueMax]  
00E112BB  cmovge      ecx,edx     // <-- this is our conditional assignment
00E112BE  mov         esi,dword ptr [ecx]  

and

if (value > valueMax)
  dummyValue = valueMax
00E112ED  mov         eax,dword ptr [valueMax]  
00E112F0  cmp         dword ptr [value],eax  
00E112F3  mov         ecx,dword ptr ds:[0E13038h]  
00E112F9  cmovg       esi,eax  

So both cases optimized to either cmovge or cmovg instructions.

I would still go with std::min because it shows intent better than an if statement. It's optimized away and it's more readable.

Ed Rowlett-Barbu
  • 1,611
  • 10
  • 27
  • When you provide constant values to variables for simple algorithms like min (which is inlined most of the time). The compiler can figure out, at compile time, the minimum value. THe loop will not be executed so you are getting `0`. Try generating values passed to min at run-time and you will get proper measurement. – A. K. Mar 21 '13 at 06:04
  • In any case, your test is kinda broken in several ways. 1) It does indeed do nothing. So the compiler is free to optimize out both those loops. That can be fixed by using the result of the loops. 2) The entire first loop would be optimized to `dummyValue = 3001`. And the entire second loop would be optimized to `value = 3001`. – Mysticial Mar 21 '13 at 06:04
0

The answer depends on the type of value. The code could be effectively optimized if all operations are fully transparent to the code optimizer, which would be the case if value is a plain integer. But your code would also compile if value is a std::string, and then the second version might be potentially faster since the assignment is conditional.

smossen
  • 887
  • 6
  • 11
-2

This should generally be much faster than branching:

int max(int x, int y) 
    { 
        return x ^ ((x ^ y) & -(x < y));  
    } 
};
Feras
  • 2,114
  • 3
  • 20
  • 42
  • 4
    -1 Modern compilers regularly replace simple branches with CMOV (conditional move) instructions, so while clever, this is rather complex, limited to integers, and the performance improvement is questionable. – Tobias Ribizel Mar 14 '20 at 06:51