4

I don't understand SBC and RSC ARM instructions

I know that both deal with the carry flag (C)

Adding the carry into the result (ADC) makes sense to me, like:

ADC r1, r2, r3   @ r1 = r2 + r3 + Carry 

But subtracting/reverse subtracting with the carry... I can't understand what is happening :(

Can you guys give me an example using SBC and RSC?

phuclv
Spacey
  • `SBC` is the same logic as `ADC`, it just propagates the carry. `RSC` then is the same as `SBC` except it swaps the operands. – Jester Dec 21 '16 at 00:00
  • Thanks for the reply... but I still don't get... – Spacey Dec 21 '16 at 00:02
  • The carry will be 0 or 1... then it'll be subtracted from the registers? – Spacey Dec 21 '16 at 00:03
  • Yes, but note that for subtraction the carry flag sense is reversed (`0`=borrow `1`=no borrow). – Jester Dec 21 '16 at 00:06
  • Treat it like you would a large-number add: add/adc/adc/... adds a large number, sub/sbc/sbc/... subtracts one. rsc just swaps the two numbers to be subtracted. – Michael Dorgan Dec 21 '16 at 00:53

2 Answers

9

Given two's complement, subtraction can just be transformed into addition:

z = y - x
  = y + (-x)
  = y + ~x + 1

which makes it easier to consider how the carry flag is set in that situation, i.e. by subs:

   z = 0 - 0
     = 0 + ffffffff + 1
 C:z = 1:00000000        // no borrow, C = 1

   z = 0 - 1
     = 0 + fffffffe + 1
 C:z = 0:ffffffff        // borrow, C = 0
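
The two cases above can be checked with a small model. This is only a sketch in Python, not ARM code: the helper name `subs32` is made up for illustration, and it models just the 32-bit result and the C flag of `subs`, nothing else.

```python
MASK32 = 0xFFFFFFFF

def subs32(y, x):
    """Hypothetical model of a 32-bit `subs`: returns (C, z) for z = y - x.

    Computes y + ~x + 1 and takes the carry out of bit 31 as C.
    """
    total = (y & MASK32) + (~x & MASK32) + 1
    return total >> 32, total & MASK32

print(subs32(0, 0))  # (1, 0)           no borrow, C = 1
print(subs32(0, 1))  # (0, 4294967295)  borrow, C = 0
```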

Hence after a subtraction the C flag holds nB ("not borrow"), so sbc, "subtract with carry", really means "subtract with not borrow". In other words:

z = y + ~x + C           // i.e. adc with the second operand inverted
  = y - x - 1 + C        // since ~x = -x - 1
  = y - x - (1 - C)      // 1 - C is the borrow, i.e. "not C" as a single bit
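
As a rough illustration of the `sbc` identity above (an `adc` with the second operand inverted), here is a Python sketch of how `subs` followed by `sbc` chains a borrow through a 64-bit subtraction, with `rsc` as the operand-swapped form. The function names `adc32`, `sbc32`, `rsc32` and `sub64` are invented for this example, and only the result and the C flag are modelled.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def adc32(a, b, c):
    """Model of `adcs`: returns (carry_out, a + b + c) on 32 bits."""
    total = (a & MASK32) + (b & MASK32) + c
    return total >> 32, total & MASK32

def sbc32(y, x, c):
    """Model of `sbcs`: y - x - (1 - c), done as adc with x inverted."""
    return adc32(y, ~x & MASK32, c)

def rsc32(y, x, c):
    """Model of `rscs`: same as sbc with the operands swapped."""
    return sbc32(x, y, c)

def sub64(a, b):
    """64-bit a - b built from 32-bit halves, like `subs lo` then `sbc hi`."""
    a &= MASK64
    b &= MASK64
    c, lo = sbc32(a & MASK32, b & MASK32, 1)  # subs behaves like sbc with C = 1
    _, hi = sbc32(a >> 32, b >> 32, c)        # high half consumes the "not borrow"
    return (hi << 32) | lo

print(hex(sub64(0x100000000, 1)))  # 0xffffffff
```

In the printed case the low half computes 0 - 1 and leaves C = 0 (a borrow), so the high half computes 1 - 0 - 1 = 0.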
Notlikethat
  • huh, I didn't know ARM's carry flag was opposite of x86's for subtraction. x86 sets CF for `0 - 1`, and [SBB does `dest -= (SRC+CF)`](http://felixcloutier.com/x86/SBB.html). So I guess you have to be careful if porting something that uses `sbb` to get CF-based behaviour after setting the carry flag from something else. – Peter Cordes Dec 21 '16 at 23:25
0

I have come across this little oddity myself. I think it is important to note that Thumb only has 2 operand fields for SBCS instructions, so any logic using 3 fields cannot be carried out in 1 instruction.

I had an inner-loop for which the C flag would be appropriately set for an ADCS but not for an SBCS.

Thumb was designed to produce smaller compiled C, so I can only presume that there is some logic behind this, but speed is not addressed AT ALL.

Along with the RORS Rd,Rs instruction format, the SBCS has me baffled. On the plus side, the fact that you can set up the bottom 8 registers and jump to a specified address makes it possible to produce very fast switch statements.

Sean
  • *speed is not addressed AT ALL.* If code-fetch is a significant bottleneck (e.g. from slow flash), thumb can be faster. Otherwise it's not because it sometimes takes more instructions to do the same work. There's a reason AArch64 doesn't (yet) have a Thumb mode; low-performance systems can continue to use 32-bit ARM with Thumb2. – Peter Cordes Jul 27 '18 at 14:31
  • You can always use a `mov` to emulate a 3-operand instruction on a 2-operand ISA like thumb, especially with `rsb` / `rsc` vs. `sub`/`sbc` making it possible to replace either operand of a non-commutative operation. – Peter Cordes Jul 27 '18 at 14:35
  • I know Thumb was designed based on analysis of compiled C to produce smaller code. I appreciate its benefits. I am just attempting to grasp the thought process behind the ADCS & SBCS instructions using the C flag in a different manner. If I could ask one more question: the MULS instruction sets the Z & N flags and leaves the V & C flags untouched (before v5 it was unpredictable). Is it possible to use the bottom 32 bits of the result as-is? Was that change for a speed reason? Did the MULS change have a shortcut in mind? All I can manage is 4 x 16x16-bit partials, i.e. 17 cycles in-line. Am I dumb? – Sean Jul 28 '18 at 20:00
  • I'm not as familiar with ARM as x86. But don't Thumb ADCS and SBCS work exactly the same as in ARM mode, just with the restriction that the destination has to be the first source operand? So IDK what difference you're asking about. Different from how x86's CF flag works (it's a borrow for x86 `sub/sbb`, opposite of ARM's "no-borrow" meaning)? That I don't know; I learned x86 before ARM and x86's borrow semantics seem more intuitive. – Peter Cordes Jul 28 '18 at 20:10
  • Not totally sure what you're asking about with MULS either, but the low N bits of a multiply don't depend on any higher bits of either input. e.g. you can use a 32-bit multiply to do 16x16 => 16 bits without masking off the high bits of the inputs. [Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?](https://stackoverflow.com/q/34377711) – Peter Cordes Jul 28 '18 at 20:13
  • Sorry for not being clear. The M0 only has the MULS instruction, i.e. 32-bit x 32-bit --> bottom 32 bits. The N & Z flags are set depending on the result but C & V aren't changed. In Tv4 and earlier the C & V are unpredictable. I'm trying to work out if there is design logic behind retaining C & V. I need 32-bit x 32-bit --> 64-bit and I'm using partials, i.e. four 16-bit x 16-bit multiplies with the products added. I cannot use the bottom 32 bits, which is really annoying. It kind of feels like the designers had something in mind but it hasn't been made clear to users. I'm most likely wrong. – Sean Jul 30 '18 at 10:53
  • And you wish it set C or V in a useful way so you could take an early out when the first 32x32 => 32-bit multiply doesn't overflow? IDK how many extra transistors it would take to correctly set C or V in a HW multiply unit that never needs to produce a correct upper half. It sounds like a valid choice to not produce useful C and V outputs, so leaving them unmodified instead of potentially polluting them with undefined garbage is strictly more useful. – Peter Cordes Jul 30 '18 at 15:00
  • Oh, I just remembered: Intel has a whitepaper on how x86's `mulx` (multiply without setting flags) allows more efficient add / add-with-carry usage for a BigInteger multiply (Table 1 in https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf), as well as other advantages due to flexible choices of destination registers. If Cortex-M0 doesn't have `mul`, *only* `muls`, then clobbering C would maybe require an `adc` with `0` first because you couldn't keep a value in C across a multiply. – Peter Cordes Jul 30 '18 at 15:05
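
The partial-product approach discussed in the comments above can be sketched as follows. This is a Python model only, under stated assumptions: `mul32_lo` stands in for a multiply that keeps just the low 32 bits (flags are not modelled), both function names are invented here, and the final additions use Python's unbounded integers rather than the add/adc sequence real 32-bit code would need.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mul32_lo(a, b):
    """Hypothetical model of a 32x32 multiply that returns only the low 32 bits."""
    return (a * b) & MASK32

def mul32x32_to_64(a, b):
    """32x32 -> 64-bit product from four 16x16 partial products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    # Each 16x16 product is at most 0xFFFE0001, so the low-32 multiply is exact.
    ll = mul32_lo(a_lo, b_lo)
    lh = mul32_lo(a_lo, b_hi)
    hl = mul32_lo(a_hi, b_lo)
    hh = mul32_lo(a_hi, b_hi)
    # Weight the partials by their bit position and sum them.
    return ll + ((lh + hl) << 16) + (hh << 32)

print(hex(mul32x32_to_64(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe00000001
```

The same model also shows the point about low bits: the low 16 bits of `mul32_lo(a, b)` depend only on the low 16 bits of `a` and `b`.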