Using inline assembler [gcc, intel, c], how to check if the carry flag is set after an operation?
-
Do you want to test this within a block of asm, or do you want to pass the state of the carry flag back to the C code in which your asm is inlined? – Paul R Jun 29 '10 at 10:28
-
Testing within a block of asm is sufficient. Passing it out should not be that hard. – hans Jun 29 '10 at 10:33
-
Related: [Read flag register from C program](https://stackoverflow.com/a/56237860) - you can *output* flags from asm to C (e.g. with GCC6 flag-output syntax: [Using condition flags as GNU C inline asm outputs](https://stackoverflow.com/q/30314907)), but you can't have asm *read* a FLAGS input. A `+` or `<<` operation in C does not have any well-defined carry-out, and might compile to an LEA or something that doesn't touch flags. Or be optimized away. – Peter Cordes Jan 04 '22 at 09:26
5 Answers
sbb %eax,%eax
will store -1 in eax if the carry flag is set, 0 if it is clear. There's no need to pre-clear eax to 0; subtracting eax from itself does that for you. This technique can be very powerful since you can use the result as a bitmask to modify the results of computations in place of using conditional jumps.
You should be aware that it is only valid to test the carry flag if it was set by arithmetic performed INSIDE the inline asm block. You can't test carry of a computation that was performed in C code because there are all sorts of ways the compiler could optimize/reorder things that would clobber the carry flag.
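As a rough sketch of the point above, the addition and the `sbb` can live in the same asm statement, so the compiler cannot clobber CF in between. The helper names `add_carry_mask_u32` and `saturating_add_u32` are made up for illustration, and the `#else` branch is only an assumed fallback for non-x86 targets:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *sum and returns an all-ones mask
   (0xFFFFFFFF) if the addition carried out of 32 bits, or 0 if it did not. */
static uint32_t add_carry_mask_u32(uint32_t a, uint32_t b, uint32_t *sum)
{
    uint32_t s = a;
    uint32_t mask;

    assert(sum != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* The add and the sbb sit in one asm statement, so nothing the
       compiler emits can touch CF between them (AT&T syntax). */
    __asm__ ("addl %2, %0\n\t"
             "sbbl %1, %1"
             : "+r" (s), "=&r" (mask)
             : "r" (b)
             : "cc");
#else
    /* Assumed portable fallback for other targets: same result, no asm. */
    s = a + b;
    mask = (uint32_t)0 - (uint32_t)(s < a);
#endif
    *sum = s;
    return mask;
}

/* Using the mask in place of a conditional jump: saturate on carry. */
static uint32_t saturating_add_u32(uint32_t a, uint32_t b)
{
    uint32_t sum;
    uint32_t mask = add_carry_mask_u32(a, b, &sum);
    return sum | mask;
}
```

OR-ing the sum with the mask leaves it unchanged when there was no carry and forces it to `UINT32_MAX` when there was one, which is the bitmask use the answer describes.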

-
[Using condition flags as GNU C inline asm outputs](https://stackoverflow.com/q/30314907) shows the new GCC6 `"=@ccc" (cf_output)` syntax for declaring CF or other flag-conditions as outputs to the compiler directly, instead of having to materialize a boolean or int in a register inside your asm template. Re: your warning against trying to read FLAGS set by compiler-generated instructions: [Read flag register from C program](https://stackoverflow.com/a/56237860) has more about that. – Peter Cordes Jan 04 '22 at 09:30
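A minimal sketch of the flag-output syntax mentioned in this comment, assuming a compiler that defines `__GCC_ASM_FLAG_OUTPUTS__` on an x86 target. The function name `add_u32_carry_out` is hypothetical, and the `#else` branch is an assumed fallback for compilers or targets without flag outputs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *sum and returns CF (1 on carry-out). */
static int add_u32_carry_out(uint32_t a, uint32_t b, uint32_t *sum)
{
    uint32_t s = a;
    int carry;

    assert(sum != NULL);
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__i386__) || defined(__x86_64__))
    /* "=@ccc" declares CF itself as an output, so no setc or sbb is
       needed inside the template; the compiler can branch on it directly. */
    __asm__ ("addl %[b], %[s]"
             : [s] "+r" (s), "=@ccc" (carry)
             : [b] "r" (b));
#else
    /* Assumed fallback when flag outputs are unavailable. */
    s = a + b;
    carry = (s < a);
#endif
    *sum = s;
    return carry;
}
```

A caller can write `if (add_u32_carry_out(a, b, &r)) { ... }` and leave it to the compiler to decide whether to branch on the flag or materialize it.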
With conditional jumps: `jc` (jump if carry) or `jnc` (jump if not carry).
Or you can store the carry flag:
;; Intel syntax
mov eax, 0
adc eax, 0 ; add with carry

-
`setc al` is the normal way to materialize a FLAGS condition into an integer. (With EAX zeroed ahead of time if you want it to be 32-bit). Or better, with modern compilers, GCC6 has syntax to tell the compiler about FLAGS so you don't have to waste instructions making an integer that compiler-generated code will just `test`. [Using condition flags as GNU C inline asm outputs](https://stackoverflow.com/q/30314907) – Peter Cordes Jan 04 '22 at 09:28
However, x86 has dedicated, fast flag-test instructions named SETcc, where cc is the desired condition code. So you can write:
setc AL //sets AL to 1 if the carry flag is set, otherwise clears it to 0
or
setc byte ptr [edx] //sets the memory byte at address edx according to the carry flag
or even
setc byte ptr [CarryFlagTestByte] //sets the memory variable CarryFlagTestByte according to the carry flag
With SETcc instructions you can test flags such as carry, zero, sign, overflow, or parity; some SETcc instructions test two flags at once (for example, `setbe` checks both CF and ZF).
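Since the question is about GCC inline assembler rather than Delphi, here is a hedged sketch of the same `setc` idea in GNU C. The name `add_u32_setc` is invented for this example, and the non-x86 branch is an assumed fallback:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *sum and returns 1 if CF was set. */
static int add_u32_setc(uint32_t a, uint32_t b, uint32_t *sum)
{
    uint32_t s = a;
    unsigned char carry;

    assert(sum != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "=q" asks for a register with an 8-bit low part (al, bl, cl, dl on
       32-bit x86), because setc only writes a byte (AT&T syntax). */
    __asm__ ("addl %2, %0\n\t"
             "setc %1"
             : "+r" (s), "=q" (carry)
             : "r" (b)
             : "cc");
#else
    /* Assumed portable fallback for other targets. */
    s = a + b;
    carry = (unsigned char)(s < a);
#endif
    *sum = s;
    return carry;
}
```

Returning the byte as an `int` lets the compiler do the zero-extension itself, so the template does not need an extra `xor` or `movzx`.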
EDIT: Added a simple test written in Delphi to dispel any doubt about the term "fast".
procedure TfrmTest.ButtonTestClick(Sender: TObject);
function GetCPUTimeStamp: int64;
asm
rdtsc
end;
var
ii, i: int64;
begin
i := GetCPUTimeStamp;
asm
mov ecx, 1000000
@repeat:
mov al, 0
adc al, 0
mov al, 0
adc al, 0
mov al, 0
adc al, 0
mov al, 0
adc al, 0
loop @repeat
end;
i := GetCPUTimeStamp - i;
ii := GetCPUTimeStamp;
asm
mov ecx, 1000000
@repeat:
setc al
setc al
setc al
setc al
loop @repeat
end;
ii := GetCPUTimeStamp - ii;
caption := IntToStr(i) + ' ' + IntToStr(ii);
end;
The loop (1M iterations) using the setc instruction is more than 5 times faster than the loop using the adc instruction.
EDIT: Added a second test that accumulates the result stored in register AL into register CL, to make it a more realistic case.
procedure TfrmTestOtlContainers.Button1Click(Sender: TObject);
function GetCPUTimeStamp: int64;
asm
rdtsc
end;
var
ii, i: int64;
begin
i := GetCPUTimeStamp;
asm
xor ecx, ecx
mov edx, $AAAAAAAA
shl edx, 1
mov al, 0
adc al, 0
add cl, al
shl edx, 1
mov al, 0
adc al, 0
add cl, al
shl edx, 1
mov al, 0
adc al, 0
add cl, al
shl edx, 1
mov al, 0
adc al, 0
add cl, al
shl edx, 1
mov al, 0
adc al, 0
add cl, al
shl edx, 1
mov al, 0
adc al, 0
add cl, al
shl edx, 1
mov al, 0
adc al, 0
add cl, al
shl edx, 1
mov al, 0
adc al, 0
add cl, al
end;
i := GetCPUTimeStamp - i;
ii := GetCPUTimeStamp;
asm
xor ecx, ecx
mov edx, $AAAAAAAA
shl edx, 1
setc al
add cl, al
shl edx, 1
setc al
add cl, al
shl edx, 1
setc al
add cl, al
shl edx, 1
setc al
add cl, al
shl edx, 1
setc al
add cl, al
shl edx, 1
setc al
add cl, al
shl edx, 1
setc al
add cl, al
shl edx, 1
setc al
add cl, al
end;
ii := GetCPUTimeStamp - ii;
caption := IntToStr(i) + ' ' + IntToStr(ii);
end;
The routine using the SETcc instruction is still about 20% faster.

-
Do you have a citation for calling them fast? I haven't kept up with the latest cpus for the past few generations, but for a long time, these were considered slow legacy opcodes. – R.. GitHub STOP HELPING ICE Jun 29 '10 at 23:25
-
@R.. Yes, SETcc is an old instruction, but it is much faster than ADC or conditional jumps like JC or JNC. – GJ. Jun 30 '10 at 07:51
-
The above tests are meaningless. (1), the results are not even used so any latency will be masked. And (2), the non-SETcc version is using mov/adc rather than sbb which is something like 6x larger in bytes and double the opcode count. If you want to benchmark this you should try both versions with code that actually makes use of the result for a computation. – R.. GitHub STOP HELPING ICE Jun 30 '10 at 09:45
-
@R.. Added a second test! The routine using the SETcc instruction is still about 20% faster. – GJ. Jun 30 '10 at 11:00
-
You're still using mov/adc instead of sbb. Try it with sbb, then subtract the result (which will be 0 or -1, and thus will add 0 or 1 when you subtract it) from the "accumulator" (CL). Also make sure you're using the shortest-form instructions - I don't remember off-hand but I seem to remember sbb potentially being shorter depending on which register or which register size you use. And tighter loops may run faster due to cache issues. – R.. GitHub STOP HELPING ICE Jul 03 '10 at 17:21
-
OK, in that case it is faster. But normally we test the carry flag to return a boolean result in AL when we exit a function. The SETxx instructions were made for that purpose! Remember that the ADC instruction is older than SETxx. – GJ. Jul 03 '10 at 19:38
-
@R.: [Agner Fog's instruction tables](http://agner.org/optimize/) show setcc as single-uop, 2 per clock throughput, with one cycle latency. So they're fast. The only problem is that they only write the low 8 bits of a register, so you need to `xor reg,reg` first, or `movzx eax, al` after. movzx after avoids partial-register slowdowns on some Intel CPUs. xor before avoids false dependencies on whatever last wrote the eax (on non-Intel), and you can pull it out of a loop. `sbb` on Intel SnB-family decodes to 2uops with 2 cycle latency. – Peter Cordes Nov 19 '15 at 11:47
The first function performs unsigned addition and then tests for overflow using the carry flag (CF). The `volatile` qualifiers must remain; otherwise the optimizer will rearrange instructions, which pretty much ensures an incorrect result. I've seen the optimizer change the `jnc` to a `jae` (which is also based on CF).
/* Performs r = a + b, returns 1 if the result is safe (no overflow), 0 otherwise */
int add_u32(uint32_t a, uint32_t b, uint32_t* r)
{
volatile int no_carry = 1;
volatile uint32_t result = a + b;
asm volatile
(
"jnc 1f ;"
"movl $0, %[xc] ;"
"1: ;"
: [xc] "=m" (no_carry)
);
if(r)
*r = result;
return no_carry;
}
The next function is for signed ints. The same use of `volatile` applies. Note that signed integer math jumps on the OF flag via `jno`. I've seen the optimizer change this to a `jnb` (note that `jnb` tests CF, not OF).
/* Performs r = a + b, returns 1 if the result is safe (no overflow), 0 otherwise */
int add_i32(int32_t a, int32_t b, int32_t* r)
{
volatile int no_overflow = 1;
volatile int32_t result = a + b;
asm volatile
(
"jno 1f ;"
"movl $0, %[xo] ;"
"1: ;"
: [xo] "=m" (no_overflow)
);
if(r)
*r = result;
return no_overflow;
}
In the big picture, you might use the functions as follows. In the same big picture, many folks will probably reject the extra work and aesthetic non-beauty until pwn'd by an overflow/wrap/underflow.
int r, a, b;
...
if(!add_i32(a, b, &r))
abort(); // Integer overflow!!!
...
The inline GCC assembly is available in GCC 3.1 and above. See Assembler Instructions with C Expression Operands, or search for 'GCC Extended Assembly'.
Finally, the same in Visual Studio would be as follows (not much difference in code generation), but syntax is much easier since MASM allows you to jump to a C label:
/* Performs r = a + b, returns 1 if the result is safe (no overflow), 0 otherwise */
int add_i32(__int32 a, __int32 b, __int32* r)
{
volatile int no_overflow = 1;
volatile __int32 result = a + b;
__asm
{
jno NO_OVERFLOW;
mov no_overflow, 0;
NO_OVERFLOW:
}
if(r)
*r = result;
return no_overflow;
}
On the bad side, the above MASM code is only applicable to x86 assembly. For x64, MSVC has no inline assembler, so you will have to code it up in assembly (in a separate file) and use MASM64 to compile.

-
You can use the `asm goto` extension to jump to a C label, e.g. `asm volatile goto( "ja %l[clabel]" : : : "memory" : clabel );`, where `clabel` is a C label. – Jens Munk Mar 23 '14 at 18:42
-
Reading the answer by @R.. below seems to invalidate your function: `You should be aware that it is only valid to test the carry flag if it was set by arithmetic performed INSIDE the inline asm block.` Are you sure about this? – DrBeco Jul 24 '15 at 00:51
-
@DrBeco - yeah, for GCC, it depends. GCC will ensure "consecutiveness" of instructions in your block, but it may insert/interleave its own instructions. If GCC instructions do not modify CC, then everything will be OK. Microsoft's inline assembler does not suffer GCC's limitations. – jww Oct 14 '15 at 03:27
-
There is no point in writing this much code with volatiles; the compiler is unable to optimize anything out of it. Better to write plain C code like `r = a + b; carry = (r < a);`, which the compiler can optimize. There are plenty of similar C variants for signed, unsigned, operands of different sizes .... – Pierre Apr 14 '17 at 17:29
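The plain-C variants suggested in this comment could look roughly like the sketch below. It keeps the answer's convention of returning 1 when the result is safe; the names `add_u32_ok` and `add_i32_ok` are made up here, and the signed version checks the range before adding because signed overflow is undefined behavior in C:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned: wrapping is well defined, so add first and compare afterwards.
   Returns 1 if a + b did not wrap, 0 otherwise. */
static int add_u32_ok(uint32_t a, uint32_t b, uint32_t *r)
{
    uint32_t sum = a + b;
    if (r)
        *r = sum;
    return sum >= a;
}

/* Signed: test against the limits *before* adding, so the addition
   itself never overflows. Returns 1 if a + b is representable. */
static int add_i32_ok(int32_t a, int32_t b, int32_t *r)
{
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return 0;
    if (r)
        *r = a + b;
    return 1;
}
```

Neither function needs `volatile` or inline asm, so the optimizer is free to turn the comparison into whatever flag test it prefers.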
This may give an idea or a solution, if it's correct. I struggled with testing for wraparound until I found out about inline assembly. I tested various edge values and it seems to work correctly. The program takes input from the command line, converts it to an integer, and outputs hex and binary values.
gcc version 11.2.1
$> gcc -Wall -std=c99 -O2 -o uilt uilt.c
snippet:
size_t i = 0;
int mul = 10;
uint128_t sum = 0;
int int_array[48] = {0};
// fill arr. with ea. str val in argv[1] str. converted to int vals.
while (i < strlen(argv[1])) {
// chk they are digit chars, if not, skip iter
if (isdigit(argv[1][i]) == 0) {
i++;
continue;
}
int_array[i] = (argv[1][i] - 48);
sum = int_array[i] + (sum * mul);
/* check carry flag */
__asm__ goto("jc %l0"
: /* no outputs */
: /* no inputs */
: /* no clobbers */
: carry);
/* no carry */
goto its_good;
carry:
system("clear");
printf("\n\n\tERROR!!!\
\n\n\t!!!!!!! uilt has ABORTED !!!!!!\
\n\tCmdln arg exceeds 2^127 bit limit\
\n\twhen converted from string to 127\
\n\tbit unsigned __int128.\n\n");
exit(1);
its_good:
i++;
}
some output:
[jim@nitroII uiltDev]$ ./uilt 1
Dec: 1
Hex: 0x0001
Bin: 0x0001
[jim@nitroII uiltDev]$ ./uilt 255
Dec: 255
Hex: 0x00ff
Bin: 0x0000 1111 1111
[jim@nitroII uiltDev]$ ./uilt 18446744073709551616
Dec: 18446744073709551616
Hex: 0x0001 0000 0000 0000 0000
Bin: 0x0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
[jim@nitroII uiltDev]$ ./uilt 340282366920938463463374607431768211455
Dec: 340282366920938463463374607431768211455
Hex: 0x0000 ffff ffff ffff ffff ffff ffff ffff ffff
Bin: 0x0000 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
Dec: 340282366920938463463374607431768211456
ERROR!!!
!!!!!!! uilt has ABORTED !!!!!!
Cmdln arg exceeds 2^127 bit limit
when converted from string to 127
bit unsigned __int128.
-
Unfortunately this is *not* safe in general, it just happened to work. There's no way to tell GCC that CF from the last `+` in the C is an input to the asm statement. If that's the case, it's only by luck. GCC could easily have used LEA, or optimized with SIMD, or some other thing that doesn't involve an x86 `add` instruction. Or it might have run some other instruction before your asm statement, like `add rsp, 16` if there was clean-up from a function call with a lot of args. (Or only a few if 32-bit mode). – Peter Cordes Jan 04 '22 at 08:57
-
Also, your overflow-detection algorithm is broken even when it happens to compile so that you're reading CF from the `+` in `sum = sum*10 + digit`. If the `sum*10` part wraps, it will produce a value far from UINT128_MAX, so most overflowing inputs won't be detected. Your example works because you only overflow in the last digit. Try `340282366920938463463374607431768211465` (changing the 2nd-last digit to 6 instead of the last). It should wrap and produce `9` without detecting overflow. Even doing the addition inside the asm wouldn't help for that; you'd have to check `mul` or `shl`... – Peter Cordes Jan 04 '22 at 09:03
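One way to cover both the multiply and the add, sketched in plain C, is to test against the limit before each step. This example uses `uint64_t` instead of `unsigned __int128` purely to stay portable, rejects non-digit characters instead of skipping them, and the name `parse_u64` is an assumption of the sketch, not code from the answer:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parses an all-digit decimal string into *out.
   Returns 1 on success, 0 on an empty string, a non-digit, or overflow. */
static int parse_u64(const char *s, uint64_t *out)
{
    uint64_t sum = 0;

    assert(s != NULL && out != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; ++s) {
        uint64_t digit;

        if (!isdigit((unsigned char)*s))
            return 0;
        digit = (uint64_t)(*s - '0');
        /* sum * 10 + digit fits exactly when sum <= (MAX - digit) / 10,
           so this one comparison guards both the multiply and the add. */
        if (sum > (UINT64_MAX - digit) / 10)
            return 0;
        sum = sum * 10 + digit;
    }
    *out = sum;
    return 1;
}
```

With this check, an input that overflows in the second-to-last digit is rejected just like one that overflows in the last digit.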
-
I was playing around with optimizing glibc's `read_int` function (which it uses for parsing printf / scanf format strings like `%80s`), since the original code uses quite a few extra instructions inside the loop for error checking. https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/stdio-common/printf-parse.h#L70. My work-in-progress ideas include checking the length after finding the first non-digit: if it was more base-10 digits than INT_MAX, it definitely wrapped. If not, check the leading digit. (I use unsigned no-UB wrapping, but only have to return an int) – Peter Cordes Jan 04 '22 at 09:17
-
Anyway, my work-in-progress: https://godbolt.org/z/hfhz79cEK. The same ideas should be applicable to int128 conversion. – Peter Cordes Jan 04 '22 at 09:18
-
To test if a single unsigned addition wrapped in pure C, you can do `unsigned sum = a+b;` ; `if (sum < a) there_was_carry_out;`. The same `carry = (sum < a)` idiom is useful if writing extended-precision stuff in pure C, except it's hard to get compilers to make efficient asm for that. The `sum < a` idiom often does compile to a `jnc` or `setc` or whatever. (Hopefully even with `unsigned __int128`.) – Peter Cordes Jan 04 '22 at 09:22
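For the extended-precision case mentioned here, a minimal sketch of a two-limb (128-bit) addition using only the `sum < a` idiom might look like this. The struct `u128_limbs` and the function `add_u128_limbs` are hypothetical names introduced for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value stored as two 64-bit limbs. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_limbs;

/* Adds a and b limb by limb, propagating the carry with (sum < operand).
   Returns the carry out of the high limb (1 if the 128-bit sum wrapped). */
static int add_u128_limbs(u128_limbs a, u128_limbs b, u128_limbs *r)
{
    uint64_t lo, hi, hi_with_carry;
    uint64_t carry_lo, carry_hi, carry_in;

    assert(r != NULL);
    lo = a.lo + b.lo;
    carry_lo = (lo < a.lo);               /* carry out of the low limb */

    hi = a.hi + b.hi;
    carry_hi = (hi < a.hi);               /* carry from adding the high limbs */
    hi_with_carry = hi + carry_lo;
    carry_in = (hi_with_carry < hi);      /* carry from adding the low carry */

    r->lo = lo;
    r->hi = hi_with_carry;
    return (int)(carry_hi | carry_in);    /* at most one of the two can be set */
}
```

Whether a given compiler turns this into an `add`/`adc` pair is not guaranteed, which is the efficiency caveat the comment raises.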
-
See also [Read flag register from C program](https://stackoverflow.com/a/56237860) – Peter Cordes Jan 04 '22 at 09:23
-
@peter, Thank you for all the info. I will try to fix it by doing the suggested reading and implementing the if(sum < a). You are absolutely correct, I just changed the last digit or – scurvydog Jan 04 '22 at 14:05
-
@peter, In your comment #3 you write about checking the leading digit. Did you mean 'leading (or most significant) bit'? I'm trying to in-line a bsr to do that. Maybe a bit mask? shr and bit mask? – scurvydog Jan 04 '22 at 15:58
-
I mean most significant *decimal* digit. e.g. a number matching `[1-2]xxxxxxxxx` overflows an `int32_t`, but won't have wrapped a `uint32_t`. Not sure if that's applicable to parsing unsigned integers without an even wider type. Maybe check `<'4'` and you're safe, or `=='4' && sum >= 4000000000` is also non-wrapped, for length = 10 decimal digits. (This happens outside the loop, and only for max-length numbers. Smaller numbers early-out on their length being `<10`. If you save the start pointer, you get the string length in decimal digits for essentially free.) – Peter Cordes Jan 04 '22 at 21:48
-
Except there are TODO comments in my work-in-progress about handling the case of leading zeros, which throws off the length calc since we actually need the most significant non-zero, if it's long. (`00123` is always fine, so the `len < 10` check can still pass it). Otherwise, like memchr except matching on != 0, like you'd do with `_mm_cmpeq_epi8` / `_mm_movemask_epi8` (and loop if all ones) / `~mask` / `bsf` or `tzcnt` to find the first non `'0'` position. – Peter Cordes Jan 04 '22 at 21:49
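The length-plus-leading-digit idea from the last two comments can be sketched for `uint32_t` as below. This is only an illustration under stated assumptions, not the linked work-in-progress: the function name `parse_u32_wrapcheck` is invented, leading zeros are handled with a simple scalar loop instead of SIMD, and parsing stops at the first non-digit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parses leading decimal digits of s into *out,
   letting the accumulator wrap inside the loop and detecting the wrap
   afterwards. Returns 1 on success, 0 on no digits or overflow. */
static int parse_u32_wrapcheck(const char *s, uint32_t *out)
{
    const char *p = s;
    const char *first;
    uint32_t sum = 0;
    size_t len;

    assert(s != NULL && out != NULL);
    while (*p == '0')                 /* leading zeros never overflow */
        ++p;
    first = p;                        /* most significant non-zero digit */
    while (*p >= '0' && *p <= '9') {
        sum = sum * 10u + (uint32_t)(*p - '0');   /* unsigned: wraps, no UB */
        ++p;
    }
    if (p == s)
        return 0;                     /* no digits at all */

    len = (size_t)(p - first);
    if (len > 10)
        return 0;                     /* more digits than UINT32_MAX has */
    if (len == 10) {
        if (*first > '4')
            return 0;                 /* 5xxxxxxxxx and up always wraps */
        if (*first == '4' && sum < 4000000000u)
            return 0;                 /* wrapped: true value was >= 2^32 */
    }
    *out = sum;
    return 1;
}
```

The loop body carries no overflow check at all; the tests run once, after the loop, and only 10-digit inputs reach the leading-digit comparison.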