Assembly ADC (Add with carry) to C++

Question

There is an x86 assembly instruction ADC. I've found this means "Add with carry". What does this mean/do? How would one implement the behavior of this instruction in C++?

INFO:
Compiled on Windows. I'm using a 32-bit Windows Installation. My processor is Core 2 Duo from Intel.

@Chubsdad: I've done my best to add some information. I hope it is enough. — Martijn Courteaux, Nov 11 '10 at 11:26
Which compiler? To access the carry flag, you have to embed assembler code in your C++ code. How you do this depends on the compiler that you are using. — TonyK, Nov 11 '10 at 11:34

score 34 · Accepted Answer · answered Nov 11 '10 at 11:26

34

ADC is the same as ADD but adds an extra 1 if processor's carry flag is set.

answered Nov 11 '10 at 11:26

Simone

11,655
1
30
43

Alright! Thank you. But now, I have to know IF the flag is set. Is this possible in C++? – Martijn Courteaux Nov 11 '10 at 11:28
1

Not with standard C++, you have to use an "asm" code block. I don't remember the exact syntax, but you'll lose code portability. – Simone Nov 11 '10 at 11:34
3

The idea of ADC is not to know the carry flag, but to do an ADD before ADC, so the carry will be set when the ADD overflows – stefaanv Nov 11 '10 at 11:47
@Martijn, if you want to know the carry flag status you may do something like this: pushfd; pop eax; now carry flag is at bit 0 of eax. – Simone Nov 11 '10 at 11:56
1

Downvoted for not really answering the C++ aspect of the question. Also, that asm sequence kinda sucks compare to `setc al` (and `movzx eax, al` if desired). `pushf` is a 3-uop instruction on Intel SnB-family CPUs. The push/pop store-forwarding round-trip adds ~5 cycles of latency to the dependency chain involving CF. – Peter Cordes Jun 10 '16 at 01:18

score 10 · Answer 2 · edited Dec 09 '16 at 20:39

From here (broken) or here

However, Intel processor has a special instruction called adc. This command behaves similarly as the add command. The only extra thing is that it also add the value carry flag along. So, this may be very handy to add large integers. Suppose you'd like to add a 32-bit integers with 16-bit registers. How can we do that? Well, let's say that the first integer is held on the register pair DX:AX, and the second one is on BX:CX. This is how:
add  ax, cx
adc  dx, bx
Ah, so first, the lower 16-bit is added by add ax, cx. Then the higher 16-bit is added using adc instead of add. It is because: if there are overflows, the carry bit is automatically added in the higher 16-bit. So, no cumbersome checking. This method can be extended to 64 bits and so on... Note that: If the 32-bit integer addition overflows too at the higher 16-bit, the result will not be correct and the carry flag is set, e.g. Adding 5 billion to 5 billion.

Everything from here on, remember that it falls pretty much into the zone of implementation defined behavior.

Here's a small sample that works for VS 2010 (32-bit, WinXp)

Caveat: $7.4/1- "The asm declaration is conditionally-supported; its meaning is implementation-defined. [ Note: Typically it is used to pass information through the implementation to an assembler. —end note ]"

int main(){
   bool carry = false;
   int x = 0xffffffff + 0xffffffff;
   __asm {
      jc setcarry
setcarry:
      mov carry, 1
   }
}

I can't block quote the 'Caveat....' portion in my response. Sometimes this formating just doesn't behave right. — Chubsdad, Nov 11 '10 at 11:41
0xffffffff is either -1 or UINT_MAX, which is being stored in an int. Perhaps 'x' should be an unsigned int, or the summands should be INT_MAX (0x7fffffff). If we take the summands to be the same type as the result (ie, signed integer), then OVERFLOW flag is not set - the result is -2 (0xfffffffe). — jww, Jun 21 '11 at 02:15
That code is ridiculous; **you can't depend on `CF` being set or not from a C statement outside the `asm` block**. It might happen to work in debug mode, but that's not going to be useful with optimization enabled. Also, use `setc carry` to set carry to 0 or 1, according to `CF`. — Peter Cordes, Jun 10 '16 at 01:03

Peter Cordes · Answer 3 · 2020-04-23T20:04:16.240

The C++ language doesn't have any concept of a carry flag, so making an intrinsic function wrapper around the ADC instruction is clunky. However, Intel did it anyway: unsigned char _addcarry_u32 (unsigned char c_in, unsigned a, unsigned b, unsigned * out);. Last I checked, gcc did a poor job with this (saving the carry result into an integer register, instead of leaving it in CF), but hopefully Intel's own compiler does better.

See also the x86 tag wiki for assembly documentation.

The compiler will use ADC for you when adding integers wider than a single register, e.g. adding int64_t in 32bit code, or __int128_t in 64bit code.

#include <stdint.h>
#ifdef __x86_64__
__int128_t add128(__int128_t a, __int128_t b) { return a+b; }
#endif
    # clang 3.8 -O3  for x86-64, SystemV ABI.
    # __int128_t args passed in 2 regs each, and returned in rdx:rax
    add     rdi, rdx
    adc     rsi, rcx
    mov     rax, rdi
    mov     rdx, rsi
    ret

asm output from the Godbolt compiler explorer. clang's -fverbose-asm isn't very vebose, but gcc 5.3 / 6.1 wastes two mov instructions so it's less readable.

You can sometimes hand-hold compilers into emitting an adc or otherwise using the carry-out of add using the idiom uint64_t sum = a+b; / carry = sum < a;. But extending this to get a carry-out from an adc instead of add is not possible with current compilers; c+d+carry_in can wrap all the way around, and compilers don't manage to optimize the multiple checks for carry out on each + in c+d+carry if you do it safely.

Clang `_ExtInt`

There is one way I'm aware of to get a chain of add/adc/.../adc: Clang's new _ExtInt(width) feature that provides fixed-bit-width types of any size up to 16,777,215 bits (blog post). It was added to clang's development version on April 21, 2020, so it's not yet in any released version.

This will hopefully show up in ISO C and/or C++ at some point; The N2472 proposal is apparently being "being actively considered by the ISO WG14 C Language Committee"

typedef _ExtInt(256) wide_int;

wide_int add ( wide_int a, wide_int b) {
    return a+b;
}

compiles as follows with clang trunk -O2 for x86-64 (Godbolt):

add(int _ExtInt<256>, int _ExtInt<256>):
        add     rsi, r9
        adc     rdx, qword ptr [rsp + 8]
        adc     rcx, qword ptr [rsp + 16]
        mov     rax, rdi                        # return the retval pointer
        adc     r8, qword ptr [rsp + 24]        # chain of ADD / 3x ADC!

        mov     qword ptr [rdi + 8], rdx        # store results to mem
        mov     qword ptr [rdi], rsi
        mov     qword ptr [rdi + 16], rcx
        mov     qword ptr [rdi + 24], r8
        ret

Apparently _ExtInt is passed by value in integer registers until the calling convention runs out of registers. (At least in this early version; Perhaps x86-64 SysV should class it as "memory" when it's wider than 2 or maybe 3 registers, like structs larger than 16 bytes. Although moreso than structs, having it in registers is likely to be useful. Just put other args first so they're not displaced.)

The first _ExtInt arg is in R8:RCX:RDX:RSI, and the second has its low qword in R9, with the rest in memory.

A pointer to the return-value object is passed as a hidden first arg in RDI; x86-64 System V only ever returns in up to 2 integer registers (RDX:RAX) and this doesn't change that.

It's *almost* 4 years now, I still can't make the compilers (gcc, clang) work to generate `add`, `adc` instruction chain. Do you think it's possible now? — madhur4127, Apr 23 '20 at 15:32
@madhur4127: For just two instructions, yes that's possible from pure C with the `sum=a+b;` / `carry = sum — Peter Cordes, Apr 23 '20 at 18:39
@madhur4127: update: clang `_ExtInt` can let you use a fixed-width integer type of any width up to 16,777,215 bits. (http://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html). https://godbolt.org/z/bsDCvh shows that `+` on `_ExtInt(256)` compiles to `add` / `adc` / `adc` / `adc`. — Peter Cordes, Apr 23 '20 at 19:46
The current version of ExtInt just unrolls everything, I see 4000+ lines of assembly with 2048*2048 multiplication without any loops. At (1<<20) bits, compiler explorer killed the process because of timeout. — madhur4127, Apr 24 '20 at 05:09
@madhur4127: lol, that's amusing, thanks for checking on that. So not currently practical for very large integers, especially for multiply. — Peter Cordes, Apr 24 '20 at 05:13

score 9 · Answer 4 · answered Nov 11 '10 at 12:01

9

The ADC behaviour can be simulated in both C and C++. The following example adds two numbers (stored as arrays of unsigned as they are too large to fit into a single unsigned).

unsigned first[10];
unsigned second[10];
unsigned result[11];

....   /* first and second get defined */

unsigned carry = 0;
for (i = 0; i < 10; i++) {
    result[i] = first[i] + second[i] + carry;
    carry = (first[i] > result[i]);
}
result[10] = carry;

Hope this helps.

answered Nov 11 '10 at 12:01

Sparky

13,505
4
26
27

5

This code will fail to set the carry if `second==~0U && carry==1`. e.g.: with 32bits unsigned that would be `second[i]==0xFFFFFFFF && carry==1`. In this case `first[i] == result[i]` even though an overflow (carry) has happened. – Stephane Hockenhull Oct 04 '15 at 21:58
1

Working code is `unsigned tmp = second[i] + carry; result[i] = first[i] + tmp; carry = (first[i] > result[i]) | (second[i] > tmp);` – Stephane Hockenhull Oct 04 '15 at 22:56
1

And the fact this is maddenly slow is why it's written in assembly right now. – Joshua Dec 10 '16 at 17:33

Řrřola · Answer 5 · 2023-03-16T17:14:37.623

In x86-64, the ADD instruction adds two 64-bit integers: add rax, rbx does rax = rax + rbx.
It also sets the carry flag to 1 when there was unsigned overflow (= when the result didn't fit in 64 bits), otherwise it sets the carry flag to 0.

In C++, you can simulate ADD like this:

uint64_t a, b;
bool carry;

a += b;
carry = (a < b);  // a+b can't be smaller than b: there must have been an overflow

The ADC instruction is like ADD, but adds the carry flag to the result:
adc rax, rbx does rax = rax + rbx + carry_flag.
It also sets the carry flag if there was unsigned overflow.

In C++:

uint64_t tmp = b + carry;
a += tmp;
carry = (tmp < carry) + (a < tmp);  // only one overflow can happen

The ADD and ADC instructions can be used to add big integers (with n "digits").
Use ADD for the least significant digits, then use ADC (n – 1) times to add the rest of the digits.
This is the “schoolbook addition algorithm”.

For example, adding 256-bit big integers with four 64-bit "digits":

mov rax, [rsi]        ; load the least significant source digit
mov rbx, [rsi + 8]    ; ...
mov rcx, [rsi + 16]
mov rdx, [rsi + 24]
add [rdi], rax        ; add it to the least significant destination digit
adc [rdi + 8], rbx    ; ... propagate carry up
adc [rdi + 16], rcx
adc [rdi + 24], rdx

Recent versions of the clang compiler can recognize big integer addition and use ADD/ADC to implement it.

constexpr uint64_t n = 4;
uint64_t dst[n], src[n];

// Add src to dst.
uint64_t carry = 0;
for (int i = 0; i < n; i++) {
  uint64_t tmp = src[i] + carry;
  dst[i] += tmp;
  carry = (tmp < carry) + (dst[i] < tmp);
}

Very cool, finally a way to write this in pure ISO C++ that compiles to a chain of `adc` instructions with both carry-in and carry-out (with Clang 9 and later; clang 8 and earlier miss the peephole optimization and materialize `carry` as a 0/1 with `setc` aka `setb`.) — Peter Cordes, Mar 16 '23 at 18:43

score 3 · Answer 6 · answered Jan 20 '11 at 03:42

3

There is a bug in this. Try this input:

unsigned first[10] =  {0x00000001};
unsigned second[10] = {0xffffffff, 0xffffffff};

The result should be {0, 0, 1, ...} but the result is {0, 0, 0, ...}

Changing this line:

carry = (first[i] > result[i]);

to this:

if (carry)
    carry = (first[i] >= result[i]);
else
    carry = (first[i] > result[i]);

fixes it.

answered Jan 20 '11 at 03:42

Oshkosher

49
1
2

2

`carry=(carry&&first[i]>=result[i])||(!carry&&first[i]>result[i])` avoids branching and does the same thing, if anyone is interested. – Hassedev Jun 15 '13 at 11:08
4

Actually `||` and `&&` causes branching since they only evaluate the right side as necessary. There are more branching in the one-liner than with the easy-to-read if() statement. – Stephane Hockenhull Oct 04 '15 at 22:06

Loge · Answer 7 · 2023-08-15T17:50:09.040

1

This is my fastest Code:

template <class Ty>
constexpr bool add_carry_ux(bool carry_in, Ty src1, Ty src2, Ty* sum_out)
{
    const Ty sum = src1 + src2;
    const bool carry_out = (sum < src1) | ((sum == ~static_cast<Ty>(0)) & carry_in);

    *sum_out = sum + carry_in;

    return carry_out;
}

ASM:

add_carry_ux(bool, unsigned long, unsigned long, unsigned long*):
        add     rsi, rdx
        movzx   eax, dil
        setc    dl
        add     rax, rsi
        cmp     rsi, -1
        mov     QWORD PTR [rcx], rax
        sete    al
        movzx   edx, dl
        and     eax, edi
        or      eax, edx
        ret

edited Aug 15 '23 at 17:50

answered Jul 24 '23 at 01:08

Loge

13
4

1

[Another answer](https://stackoverflow.com/questions/4153852/assembly-adc-add-with-carry-to-c/75759422#75759422) on this question shows code that can compile to a single `adc` with recent clang. Does this do any better when it inlines so the carry_in and carry_out can be in CF? Because this looks like a lot of asm. – Peter Cordes Aug 15 '23 at 17:55

Serg Stetsuk · Answer 8 · 2021-04-15T05:45:35.237

0

int32_t adc(uint32_t first, uint32_t second, uint32_t *carry)
    {
        uint32_t res;
        uint32_t carry_out = 0;
    
        if (!*carry)
        {
            res = first + second;
            *carry = (res < first) && (res < second);
    
            return res;
        }
    
        res = adc(first, second, &carry_out);
        if (*carry)
        {
            res++;
            carry_out |= !res;
        }
        
        *carry = carry_out;
    
        return res;
    }

edited Apr 15 '21 at 05:45

answered Feb 17 '17 at 09:27

Serg Stetsuk

419
4
8

Building a 16-bit add-with-carry out of wider operations kind of defeats the purpose. Just right-shift `result >> 16` to get the high half (carry out), or just *use* that wider type directly and let the compiler implement it with `adc` or whatever is efficient on the target ISA. – Peter Cordes Feb 27 '21 at 11:41
1

Also, if `unsigned int` actually is a 16-bit type like you seem to be assuming, `first + second` already wraps before you assign to `result`. Perhaps you meant `first + (unsigned long)second`? – Peter Cordes Feb 27 '21 at 11:41
1

You can [edit] your answer to replace bad code with good code. But note that the hard part of implementing ADC in pure C is handling carry-in. Your `result < second` is sufficient to detect carry-out, but you shouldn't add it to *this* element, you should return it separately. That's like `add eax, ecx` / `adc eax, 0` which is almost never what you want. – Peter Cordes Apr 15 '21 at 03:37
1

I think this new version fails for cases like `0xffffffff + 0 + carry=1`. The while loop does `0xffffffff + 1 = 0`, producing a carry-out, which makes the while loop run again producing res=1 and carry=0. Then it tailcalls itself to do `1 + 0`, returning 1 and leaving carry=0, when the correct result is `0` with carry=1. Carry-out in either `+` operation in `first + second + *carry` needs to send a carry-out to the final `*carry` output, *not* add back into the return value. Remember, carry-out is bit 33 of a 32+32 => 33-bit addition, but carry-in has a place value of just 1. – Peter Cordes Apr 15 '21 at 04:28

Assembly ADC (Add with carry) to C++

8 Answers8

Clang `_ExtInt`

Linked

Assembly ADC (Add with carry) to C++

8 Answers8

Clang _ExtInt

Linked

Clang `_ExtInt`