Dividing with a negative number gives me an overflow in NASM

Question

I'm teaching myself some assembly programming with x86-64 Mac OS. I'm trying to figure out why when it comes to dividing a positive integer with a negative integer gives me an overflow. For example, 5/-2 must return -2. However, in my case, it returns a 2147483371 when I do -554/2 instead of -277... This is what I have in my assembly file:

; compiling using: nasm -f macho64 -o divide.o divide.s
[bits 64]
global _divide
section .text

; int divide(int dividend, int divisor)
_divide:

    xor rdx, rdx        ; making this to 0
    push rbp            ; base stack pointer
    mov rax, rdi        ; dividend
    mov rcx, rsi        ; divisor
    idiv rcx            ; integer division

    add rsp, 8
    ret

In my main.c file, I have this:

#include <stdio.h>
extern int divide(int dividend, int divisor);
int main(void)
{
    printf("divide: %d\n\n", divide(-554,2));
    return (0);
}

OUTPUT: divide: 2147483371

Can someone explain to me what exactly I'm doing wrong?

This is why `cdq` / `cqo` is used to set up for `idiv`, while `xor edx,edx` is used to set up for `div`. Also, why would you `mov rcx,rsi` instead of `idiv rsi`? (Or `idiv esi`, because you told the compiler you were only going to look at the low 32 bits of input registers by making the return type `int`. The compiler doesn't have to zero or sign-extend inputs to 64 bit for you, in either of the major x86-64 calling conventions. [Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?](https://stackoverflow.com/q/36706721)) — Peter Cordes, Aug 07 '18 at 02:01
[When and why do we sign extend and use cdq with mul/div?](https://stackoverflow.com/q/36464879) — Peter Cordes, Aug 07 '18 at 02:19

paxdiablo · Accepted Answer · 2018-08-07T04:58:13.447

9

The 32-bit value -554_signed is equivalent to 4,294,966,742_unsigned and half of that is indeed 2,147,483,371, the answer you're getting. So it looks like a signed/unsigned issue. And, upon examining the x86 docs for idiv, we see:

IDIV r/m64 Signed divide RDX:RAX by r/m64, result stored in:
    RAX <- Quotient,
    RDX <- Remainder.

Note that first line, specifically the "signed divide rdx:rax by" bit. When Intel talk about rdx:rax, they mean the 128-bit value formed from those two 64-bit registers. Assuming that those two 64-bit registers contain the (hex) values:

rax : 01234567 89ABCDEF
rdx : 11112222 FFFFEEEE

then the rdx:rax value will be the 128-bit value:

rdx:rax : 11112222 FFFFEEEE 01234567 89ABCDEF

Now, because you are zeroing out rdx, the combined value is considered positive because the top bit is zero. What you actually need to do is sign-extend rax into rdx:rax, a method that preserves the sign in the extended value. For example, consider the 32-bit -1, sign-extended properly and improperly into a 64-bit value:

         ffffffff     32-bit:                        -1.
ffffffff ffffffff     64-bit proper:                 -1.
00000000 ffffffff     64-bit improper:    4,294,967,295.

To sign-extend properly, the leftmost bits (rdx in your case) should be all one-bits if the rightmost bits (rax for you) form a negative number, all zero-bits otherwise.

Of course, those clever Intel engineers have already thought of just this use case, so you can do it with the cqo convert-quadword-to-octoword instruction, which sign extends correctly. With that in mind, your code for setting eax would become:

    mov   rax, rdi          ; Get dividend and
    cqo                     ;   sign extend to rdx:rax.

However, you may have an extra problem as well. Even though the System V x86-64 ABI specifies that the parameters are passed in the 64-bit registers (rXX), it's possible that passing 32-bit values will actually leave the upper bits containing rubbish (and I think you're allowed to leave rubbish in the upper parts of the return value as well. See this excellent answer for details.

So you should not assume that you will have a sane value in the entire 64-bit register, only in the rightmost 32 bits.

In your case (assuming 32-bit integers), you should be sign extending 32-to-64 rather than 64-to-128, and using a smaller-width division instruction. That would result in something more like:

global _divide
section .text

; int32_t divide(int32_t ediDividend, int32_t esiDivisor)
_divide:
    mov   eax, edi          ; Get 32-bit dividend and
    cdq                     ;   sign extend to 64-bit edx:eax.

    idiv  esi               ; Weave magic here,
                            ;   zeros leftmost rax.

    ret                     ; Return quotient in rax/eax.

This is untested, but should do what you want. I've actually removed the pushing of rbp since I'm pretty certain it's not necessary. It appears not to be corrupted (this function neither changes it, nor calls any other function which could change it), and it appears you never actually restored it properly in your original code anyway.

edited Aug 07 '18 at 04:58

answered Aug 07 '18 at 01:04

paxdiablo

854,327
234
1,573
1,953

Okay, a little bit offtopic with this question, but I have seen the **RDX:RAX** and it feels that I still don't understand what is the meaning behind. Could you help me to understand what is the meaning behind? Trying to learn assembly by my own is harder than I thought than talking with peers about the code haha. – Zeid Tisnes Aug 07 '18 at 01:54
@ZeidTisnes, have added some more detail to the answer to hopefully make that clear. – paxdiablo Aug 07 '18 at 02:09
@pax: You don't need `[bits 64]`. In fact you probably *shouldn't* use that, because it makes it possible to assemble 64-bit machine code into a 32-bit object file by accident, instead of getting an error on `push rdx`. (Which you don't need, BTW. The x86-64 System V calling convention (like Windows) has RDX as a call-clobbered register exactly because you have to use it as a scratch reg for some common instructions.) – Peter Cordes Aug 07 '18 at 02:24
The `[bits 64]` is just telling the compiler that is a 64-bit machine. Yes, it is not needed, but the purpose of it is for readability for others – Zeid Tisnes Aug 07 '18 at 02:27
Also, `int` is 32-bit in the calling convention you're using, so like I commented on the question, it's not safe to assume your inputs are correctly sign-extended to 64-bit. @ZeidTisnes: any decent optimizing compiler should emit this: `mov eax, edi` / `cdq` / `idiv esi` / `ret`. If your code doesn't do what you want / expect, write it in C and see what a compiler does. It can be very useful. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). – Peter Cordes Aug 07 '18 at 02:27
@ZeidTisnes: That's what comments are for. Unless you want to assemble the function into a flat binary, `bits 64` is unhelpful and potentially harmful. `nasm -fmacho64` implies 64-bit, and omitting `bits 64` will save you from accidentally using `nasm -fmacho` and getting code that decodes with a bunch of `inc` / `dec` instructions that were supposed to be REX prefixes when linked into a 32-bit executable and called in 32-bit mode. – Peter Cordes Aug 07 '18 at 02:29
@Pax: the non-terrible way to sign-extend into RDX:RAX with cqo is `mov rdx,rax` / `sar rdx,63` to broadcast the sign bit to all positions. That seems clearer than a conditional `dec` to me, but either is valid. (I would have used `cmovs` or `sets` / `dec`, though, but those are probably less clear) – Peter Cordes Aug 07 '18 at 02:33
I notice that in the `divide1_jmp`: is there, but its nowhere being used. Where are you supposed to use it? – Zeid Tisnes Aug 07 '18 at 02:38
@Zeid, it's being used as the target of the `jns` instruction to skip setting `rdx` to `-1` if `rax` is non-negative. But, since that was the naive way you should prefer the `cqo` method. – paxdiablo Aug 07 '18 at 03:51
So, I tried to use it and still no luck with the `divide1_jmp` unfortunately. What I did try was the link that @Peter Cordes shared about https://stackoverflow.com/questions/36464879/when-and-why-do-we-sign-extend-and-use-cdq-with-mul-div. I added `movsx rax, edi` and seems to work fine when I divide a big number by 2, 4, 8, 16, 32, 64.. etc. However, if I change my `2` into `3` or any number that is not `2^x` (x being the power), it will show me that overflow once again. It seems to be closer than I previously had. – Zeid Tisnes Aug 07 '18 at 04:02
@ZeidTisnes, don't use the naive method. In fact, I'll get rid of it since the `cqo` is better. – paxdiablo Aug 07 '18 at 04:04
@ZeidTisnes: did you finish reading my answer? I tried to explain everything about type width and what happens with a signed 32-bit integer in a 64-bit register. Are you using `cqo` *before* `movsx rax, edi`? Or are you still zeroing RDX? Then 64-bit signed division by a power of 2 could actually work, since the compiler will only use the low 32 bits of the result. A power of 2 divisor is nearly the same as an arithmetic right shift, but a non-power-of-2 divisor makes all the lower bits of the result depend on all the high bits, and not in a way that happens to "work" for your bugs. – Peter Cordes Aug 07 '18 at 04:11
1

Okay, I *think* i've fixed all issues raised in the comments, (a) getting rid of naive method (b) handling junk in upper bits (c) removal of bits64 (d) non-preservation od rdx (e) use esi directly. Let me know if there are any more. Cheers. – paxdiablo Aug 07 '18 at 04:28
@PeterCordes I posted my comment right before you answered haha. Reading your explanation helped me to visualize more the concept. I was a little bit upset with my approach that I was aiming. I aimed to `movsx rax, edi` before `cqo`. Heading to your answer to stay on topic. – Zeid Tisnes Aug 07 '18 at 04:36
1

@paxdiablo: nice cleanup. `and rax, 0xffffffff` is totally redundant, though. When 32-bit `idiv` writes `eax` and `edx`, that [implicitly zeros the upper 32 bits of RAX and RDX](https://stackoverflow.com/questions/11177137/why-do-x86-64-instructions-on-32-bit-registers-zero-the-upper-part-of-the-full-6). The most efficient way to explicitly zero-extend EAX into RAX is `mov eax,eax`. (You'll often see `mov ecx,edi` at the start of a function with a `uint32_t` used as an array index.) Also, `and rax, 0xffffffff` is not encodeable: it doesn't fit in a sign-extended imm32. – Peter Cordes Aug 07 '18 at 04:41
Compiler-generated code unfortunately sometimes uses `mov edi,edi` when it could have used a different register; [mov-elimination on Intel CPUs](https://stackoverflow.com/questions/44169342/can-x86s-mov-really-be-free-why-cant-i-reproduce-this-at-all) only works with *different* registers. – Peter Cordes Aug 07 '18 at 04:43
Okay, @Peter, removed the `and` as well - any more changes and my answer's going to be word-for-word a copy of yours :-) Upvoted yours since you appear to be far more au fait with x86-64 than I. – paxdiablo Aug 07 '18 at 05:00
Thanks :) I recently got my x86-64 gold badge; at this point still the only SO user with one. When I was commenting earlier, I hadn't thought through the implications of the `int` vs. `long` mismatch. But once I did, I realized that was just as important as the sign vs. zero extension and worth another answer. I like the table in your answer of 32 bit vs. 64-bit proper/improper. Between our two answers, I think we have it covered. :) – Peter Cordes Aug 07 '18 at 05:19

Peter Cordes · Answer 2 · 2018-08-07T05:22:26.793

Your code is also broken for negative divisors: divide(5,-2) will give zero. This is purely explained by calling-convention. Your zero-extension instead of sign-extension bug (see @paxdiablo's answer) only matters for negative dividends.

You told the compiler your function takes int args, and int is a 32-bit type in the x86-64 System V calling convention.

You're assuming your inputs are sign-extended to 64-bit, but the calling convention doesn't require that, so the compiler won't waste code-size on 10-byte mov r64, imm64 when it can use 5-byte mov r32, imm32 instead.

For more details, see these Q&As. (the 2nd is basically a duplicate of the first):

Thus your compiler will emit code like this for your main:

mov    edi, 5      ; RDI = 0x0000000000000002
mov    esi, -2     ; RSI = 0x00000000FFFFFFFE
call   _divide

I checked on the Godbolt compiler explorer, and that's what gcc and clang really do¹, even for un-optimized code.

For divide(5,-2), your code will result in

RDX=0, RAX=5. i.e. dividend = 0x0000000000000000:0000000000000005, which is correct. (zero- and sign-extension are the same operation for non-negative inputs).
divisor = 0x00000000FFFFFFFE = +4294967294, which is large and positive.

64-bit idiv calculates 5 / 4294967294 producing quotient=RAX=0, remainder=RDX=5.

If you only fixed the type-width / operand-size mismatch bug, you'd still have problems with negative dividends like @paxdiablo's answer explains. But both fixes are necessary for divide(-554,2) to actually work.

So how should you have written it?

You could change the prototype to int64_t or long (which is 64-bit in x86-64 System V), and use cqo to set up for signed division. (When and why do we sign extend and use cdq with mul/div?)

Or you could sign-extend your 32-bit inputs to 64-bit, with movsxd rax, edi / movsxd rcx, esi. But that would be silly. Just use 32-bit operand-size since that's what you told the compiler to pass.

That's good because 64-bit division is much slower than 32-bit division. (https://agner.org/optimize/, and C++ code for testing the Collatz conjecture faster than hand-written assembly - why?).

This is what I'd do:

global _divide
; inputs: int32_t dividend in EDI, int32_t divisor in ESI
; output: int32_t quotient in EAX,  int32_t remainder in EDX
;  (C callers won't be able to access the remainder, unfortunately)
_divide:
    mov     eax, edi
    cdq                    ; sign-extend the dividend into edx:eax

    idiv    esi            ; no need to copy to ecx/rcx first
    ret

No need to push RBP; we're not calling any other functions so realigning the stack doesn't matter, and we're not modifying RBP for use as a frame pointer.

We're allowed to clobber RDX without saving/restoring it: it's a call-clobbered register in x86-64 System V, and Windows x64. (Same as in most 32-bit calling conventions). This makes sense because it's used implicitly by some common instructions like idiv.

That's what gcc and clang do emit (with optimization enabled of course) if you write it in C.

int divide(int dividend, int divisor) {
    return dividend / divisor;
}

(See the Godbolt link above, where I included it with __attribute__((noinline)) so I could still see main actually setting up function args. I could have just named it something else instead.)

As usual, looking at compiler output to see the difference between your code and what the compiler did can clue you in to something you did wrong. (Or give you a better starting point for optimization. In this case the compilers don't have any missed optimizations, though.) See How to remove "noise" from GCC/clang assembly output?.

You can change the types to long (which is 64-bit in x86-64 System V, unlike in Windows x64) if you want to see code-gen for 64-bit integers. And also see how the caller changes, e.g.

    mov     edi, 5
    mov     rsi, -2
    call    _divide

Footnote 1: Interestingly clang -O3's asm output has mov esi, -2, but clang -O0 writes it as mov edi, 4294967294.

Those both assemble to the same instruction, of course, zeroing the upper 32 bits of RDI, because that's how AMD designed AMD64, rather than for example implicitly sign-extending into the full register, which would have been a valid design choice but probably not quite as cheap as zero-extending.

And BTW, Godbolt has compilers targeting Linux, but that's the same calling convention. The only difference is that OS X decorates function names with a leading _ but Linux doesn't.

The Godbolt Compiler is actually good and I just looking at it and see how it works. It looks really interesting to dig through it and eager to learn more. Thanks for your help and @pax. I will come back for more assembly questions in the future — Zeid Tisnes, Aug 07 '18 at 04:48
@ZeidTisnes: Definitely go watch Matt Godbolt's CppCon2017 talk (link in [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116)). Great intro for asm beginners who know C to looking at compiler output, and to x86 asm. See also other links in https://stackoverflow.com/tags/x86/info. But yeah, Matt's compiler-explorer is a fantastic tool; I use it all the time while writing SO answers, for checking if stuff compiles efficiently with different compilers. — Peter Cordes, Aug 07 '18 at 04:51

Dividing with a negative number gives me an overflow in NASM

2 Answers2

So how should you have written it?

Linked