I am reading a book, [xchg rax, rax]. The following is snippet 0x03 from the book, and I can't make sense of it.

sub      rdx,rax
sbb      rcx,rcx
and      rcx,rdx
add      rax,rcx

I have been working on this for a week now (a couple of minutes every day). I have studied several things while trying to solve it: signed representation of numbers, how subtraction works, and the role of CF after a subtraction. According to this answer and the list given here, I don't see the point of checking CF after a subtraction, except perhaps in cases of overflow.

In what situations is checking the carry flag useful after a subtraction?

Peter Cordes
Aneesh Dogra
  • Overflows :) Of course the code shown does not check CF as such. It uses CF to chain two 64 bit subtracts together to form a 128 bit subtraction (hence the name "carry"). – Jester Sep 24 '15 at 22:33
  • Duplicate of [Best assembly or compilation for minimum of three values](http://stackoverflow.com/questions/18306615/best-assembly-or-compilation-for-minimum-of-three-values). [This answer](http://stackoverflow.com/a/18319810/902497) gives the above code fragment as the way to do a two-value min. – Raymond Chen Sep 24 '15 at 23:37
  • there are various cases that CF is set even if there was no overflow, for example [after BZHI](http://www.felixcloutier.com/x86/BZHI.html), [COMIS\[SD\]](http://www.felixcloutier.com/x86/COMISS.html), or a shift/rotate – phuclv Oct 26 '18 at 02:58

1 Answer

Actually, that code is a clever branchless way to do rax = min(rax, rdx), with both values treated as unsigned.

sub rdx, rax ; rdx = rdx - rax; CF set if rdx < rax
sbb rcx, rcx ; rcx = all 1 bits if CF was set, 0 otherwise
and rcx, rdx ; rcx = rdx - rax if CF was set, 0 otherwise
add rax, rcx ; rax = rax + (rdx - rax) = rdx if CF was set, unchanged otherwise
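
To make the walk-through above easier to check, here is a rough C model of the four instructions. It is only a sketch: the function names `min_branchless` and `check_min_branchless` are made up for illustration, and CF is modelled as the unsigned comparison `rdx < rax`, which is what `sub rdx, rax` leaves in the carry flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models sub/sbb/and/add on 64-bit unsigned values. */
uint64_t min_branchless(uint64_t rax, uint64_t rdx)
{
    uint64_t cf = (rdx < rax);   /* CF after `sub rdx, rax`: set on unsigned borrow */
    rdx = rdx - rax;             /* sub rdx, rax  (wraps modulo 2^64)               */
    uint64_t rcx = 0 - cf;       /* sbb rcx, rcx  -> all ones if CF was set, else 0 */
    rcx &= rdx;                  /* and rcx, rdx  -> (rdx - rax) or 0               */
    return rax + rcx;            /* add rax, rcx  -> rdx if CF was set, else rax    */
}

/* Small usage example; the values are arbitrary. */
void check_min_branchless(void)
{
    assert(min_branchless(3, 7) == 3);
    assert(min_branchless(7, 3) == 3);
    /* The comparison is unsigned: (uint64_t)-1 is the largest value. */
    assert(min_branchless((uint64_t)-1, 1) == 1);
}
```

The wrap-around in the last step is what makes this work: when CF was set, `rdx - rax` is "negative" modulo 2^64, and adding it back to rax lands exactly on the original rdx.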

A more readable branching version is:

cmp rdx, rax
jnc done ; if rdx - rax produced no carry, rax is smaller or equal
mov rax, rdx ; otherwise rdx is the smaller one
done:

It's still just using CF for overflow checking, in the unsigned sense: CF is the borrow out of rdx - rax.
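
For the question of when CF is worth looking at after a subtraction, it may help to see it next to OF. The C sketch below models both flags for a 64-bit `sub a, b`; the function names are hypothetical, and the OF formula is the usual sign-bit test rather than anything taken from the book.

```c
#include <assert.h>
#include <stdint.h>

/* CF after `sub a, b`: set when an unsigned borrow occurs, i.e. a < b as unsigned. */
int carry_after_sub(uint64_t a, uint64_t b)
{
    return a < b;
}

/* OF after `sub a, b`: set when the signed result does not fit in 64 bits.
 * That happens only if a and b have different signs and the result's sign
 * differs from a's sign. */
int overflow_after_sub(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;   /* wraps modulo 2^64, like the hardware */
    return (int)(((a ^ b) & (a ^ r)) >> 63);
}

/* Small usage example; the values are arbitrary. */
void check_flags_after_sub(void)
{
    /* 1 - 2: unsigned borrow, but a perfectly valid signed result (-1). */
    assert(carry_after_sub(1, 2) == 1);
    assert(overflow_after_sub(1, 2) == 0);

    /* INT64_MIN - 1: no unsigned borrow, but the signed result overflows. */
    assert(carry_after_sub((uint64_t)INT64_MIN, 1) == 0);
    assert(overflow_after_sub((uint64_t)INT64_MIN, 1) == 1);
}
```

So CF after `sub`/`cmp` answers "is the left operand below the right one, as unsigned numbers?", which is exactly the condition the min code needs.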

Jester
  • There's no advantage to the sub/sbb/and/add version over cmp/cmov, is there? The sequence can't execute until both rdx and rax are ready, same as cmov, but it takes more uops. `adc` and `sbb` are each as expensive as `cmov` (2 uops on Intel, so there's the same complex-decode bottleneck when not running from the uop cache). Only AMD CPUs recognize `sbb` as depending only on flags, but that doesn't matter here since rdx and flags are both produced by the same instruction. So anyway, **this should perform the same as `cmp/cmov` plus 2 other instructions on the critical path**. – Peter Cordes Sep 25 '15 at 00:04