Subtract and detect underflow, most efficient way? (x86/64 with GCC)

Question

I'm using GCC 4.8.1 to compile C code and I need to detect if underflow occurs in a subtraction on x86/64 architecture. Both are UNSIGNED. I know in assembly is very easy, but I'm wondering if I can do it in C code and have GCC optimize it in a way, cause I can't find it. This is a very used function (or lowlevel, is that the term?) so I need it to be efficient, but GCC seems to be too dumb to recognize this simple operation? I tried so many ways to give it hints in C, but it always uses two registers instead of just a sub and a conditional jump. And to be honest I get annoyed seeing such stupid code written so MANY times (function is called a lot).

My best approach in C seemed to be the following:

if((a-=b)+b < b) {
  // underflow here
}

Basically, subtract b from a, and if result underflows detect it and do some conditional processing (which is unrelated to a's value, for example, it brings an error, etc).

GCC seems too dumb to reduce the above to just a sub and a conditional jump, and believe me I tried so many ways to do it in C code, and tried alot of command line options (-O3 and -Os included of course). What GCC does is something like this (Intel syntax assembly):

mov rax, rcx  ; 'a' is in rcx
sub rcx, rdx  ; 'b' is in rdx
cmp rax, rdx  ; useless comparison since sub already sets flags
jc underflow

Needless to say the above is stupid, when all it needs is this:

sub rcx, rdx
jc underflow

This is so annoying because GCC does understand that sub modifies flags that way, since if I typecast it into a "int" it will generate the exact above except it uses "js" which is jump with sign, instead of carry, which will not work if the unsigned values difference is high enough to have the high bit set. Nevertheless it shows it is aware of the sub instruction affecting those flags.

Now, maybe I should give up on trying to make GCC optimize this properly and do it with inline assembly which I have no problems with. Unfortunately, this requires "asm goto" because I need a conditional JUMP, and asm goto is not very efficient with an output because it's volatile.

I tried something but I have no idea if it is "safe" to use or not. asm goto can't have outputs for some reason. I do not want to make it flush all registers to memory, that would kill the entire point I'm doing this which is efficiency. But if I use empty asm statements with outputs set to the 'a' variable before and after it, will that work and is it safe? Here's my macro:

#define subchk(a,b,g) { typeof(a) _a=a; \
  asm("":"+rm"(_a)::"cc"); \
  asm goto("sub %1,%0;jc %l2"::"r,m,r"(_a),"r,r,m"(b):"cc":g); \
  asm("":"+rm"(_a)::"cc"); }

and using it like this:

subchk(a,b,underflow)
// normal code with no underflow
// ...

underflow:
  // underflow occured here

It's a bit ugly but it works just fine. On my test scenario, it compiles just FINE without volatile overhead (flushing registers to memory) without generating anything bad, and it seems it works ok, however this is just a limited test, I can't possibly test this everywhere I use this function/macro as I said it is used A LOT, so I'd like to know if someone is knowledgeable, is there something unsafe about the above construct?

Particularly, the value of 'a' is NOT NEEDED if underflow occurs, so with that in mind are there any side effects or unsafe stuff that can happen with my inline asm macro? If not I'll use it without problems till they optimize the compiler so I can replace it back after I guess.

Please don't turn this into a debate about premature optimizations or what not, stay on topic of the question, I'm fully aware of that, so thank you.

I think the problem is you assume that compilers always optimize to the "best" way of doing something, the flaw is your assumption not the compiler nor optimizer. — old_timer, Jul 25 '14 at 15:17
The `if ((r = x - y) > x)` pattern is better. Actually, that's one of the answers below. — Brett Hale, Jul 25 '14 at 16:11
It's because I want to subtract b from a first, if I do (a < b) it won't subtract b from a, and again, it will do a comparison (a — kktsuri, Jul 25 '14 at 18:55
Actually, now i'm curious, if the asm goto is safe or not the way I did it, even if it's useless now due to Jester's answer for this particular case. I'd like to know how it works and how or why it is not safe when I did it, if anyone knows; this isn't for a practical usage right now, just my curiosity, I'm sure at some point I'll make use of that knowledge. — kktsuri, Jul 25 '14 at 18:59
[How to detect integer overflow in C/C++?](http://stackoverflow.com/q/199333/995714), [Checking for underflow/overflow in C++?](http://stackoverflow.com/q/2399269/995714) — phuclv, Mar 31 '16 at 06:53

score 4 · Accepted Answer · answered Jul 25 '14 at 15:34

4

I probably miss something obvious, but why isn't this good?

extern void underflow(void) __attribute__((noreturn));
unsigned foo(unsigned a, unsigned b)
{
    unsigned r = a - b;
    if (r > a)
    {
        underflow();
    }
    return r;
}

I have checked, gcc optimizes it to what you want:

foo:
    movl    %edi, %eax
    subl    %esi, %eax
    jb      .L6
    rep
    ret
.L6:
    pushq   %rax
    call    underflow

Of course you can handle underflow however you want, I have just done this to keep the asm simple.

answered Jul 25 '14 at 15:34

Jester

56,577
4
81
125

Thanks so much, it seems GCC is smart enough to optimize only this particular case of "(a-b) > a", but not what I tried with a < b. But yes, this works, thank you. Now I use a macro like this: #define subchk(a,b) ({ typeof(a) _a=a; a-=b; a > _a; }) Then I just use if(subchk(A, B)) { underflow; }. I use macro because I want it to be agnostic to types. For some reason, GCC sometimes in my tests now uses "mov rsi, local_var" and then "sub rsi, B" (the sub I want), and puts the value back to local stack "mov local_var, rsi" and then does the jump. For sure this is a different thing now, though. – kktsuri Jul 25 '14 at 18:47

score 0 · Answer 2 · answered Jul 25 '14 at 15:02

0

How about the following assembly code (you can wrap it into GCC format):

   sub  rcx, rdx  ; assuming operands are in rcx, rdx
   setc al        ; capture carry bit int AL (see Intel "setxx" instructions)
   ; return AL as boolean to compiler

Then you invoke/inline the assembly code, and branch on the resulting boolean.

answered Jul 25 '14 at 15:02

Ira Baxter

93,541
22
172
341

Your solution is good it crossed my mind as well, the only problem is that using inline asm this way makes the compiler not optimize such code (because it can't know), and in this particular case, the underflow detection is done with special code/error handling, which can't be done just by modifying the return value -- unless, I check the return value, and then jump according to it, but in that case it is slower than GCC's "sub/cmp" combo I tried to avoid. Thanks though. – kktsuri Jul 25 '14 at 18:46

score 0 · Answer 3 · answered Jul 25 '14 at 16:13

0

Have you tested whether this is actually faster? Modern x86-microarchitectures use microcode, turning single assembly instructions into sequences of simpler micro-operations. Some of them also do micro-op fusion, in which a sequence of assembly-instructions is turned into a single micro-op. In particular, sequences like test %reg, %reg; jcc target are fused, probably because global processor flags are a bane of performance.
If cmp %reg, %reg; jcc target is mOp-fused, gcc might use that to get faster code. In my experience, gcc is very good at scheduling and similar low-level optimizations.

answered Jul 25 '14 at 16:13

EOF

6,273
2
26
50

I assumed it is faster, because sub is just like cmp except it keeps the result, and I don't think GCC knows something hidden here because, if I do sign check (i.e typecast them to signed, then check if less than 0), it will do just a sub and then a "js" to jump (sign bit = negative), if cmp/sub combination was faster it would've done that instead for sure. Thank you for the insight about micro-ops though, I didn't know about that, I did a brief research now looks very interesting. – kktsuri Jul 25 '14 at 18:44

Subtract and detect underflow, most efficient way? (x86/64 with GCC)

3 Answers3