Overcoming the x86 idiv #DE exception

Question

Re: x86 assembly language -

I have three 32-bit signed numbers: n1, n2, and n3.

I want to imul n1 by n2 to get a 64-bit signed result.

I then want to idiv that 64-bit result by n3.

The problem is that if the 64-bit signed result is large enough and/or if n3 is small enough, an overflow will result and idiv will throw a #DE exception.

If idiv simply set a flag on overflow instead of raising #DE, I could check to confirm that ((n1 * n2) / n3) * n3 + ((n1 * n2) mod n3) = (n1 * n2). If not, overflow would have occurred and I could proceed accordingly.

But #DE does not play nice with others. When it's raised, it reports "Program has stopped working" and then kicks you out.

So I either need to find some way of pre-checking whether an idiv will cause an overflow before I do the division, or I need to do the equivalent of a try ... catch in assembly language.

I've searched the internet (including here) and find very little on this in general; and nothing that is particularly useful.

I've tried inlining the code inside a c++ try ... catch to no avail - it still reports "Program has stopped working" and then kicks you out.

For example, with the two supporting files:

// TC.h
// Version 1.0.0
// MDJ 2016/05/06

extern "C" int q;
extern "C" int m;

and

// TC.s
// Version 1.0.0
// MDJ 2016/05/06

    .globl _q
    .globl _m

    .bss
    .align 4
_q:
    .space 4

_m:
    .space 4

this file runs to completion and produces the correct results:

// TryCatch.cpp
// Version 1.0.0
// MDJ 2016/05/06

#include <iostream>
#include "TC.h"

using namespace std;

int main() {

    cout << endl;

    try {

        // AT&T syntax
        asm(
            "movl       $34,    %eax\n\t"
            "movl       $48,    %edx\n\t"
            "imull  %edx\n\t"
            "movl       $13,    %ecx\n\t"
            "idivl  %ecx\n\t"
            "movl       %eax,   _q\n\t"
            "movl       %edx,   _m\n\t"
        );
    }
    catch(int e) {
        cout << "Caught." << endl;
    }

    cout << "Reached the End." << endl;
    cout << "q = " << q << endl;
    cout << "m = " << m << endl;
    cout << endl;

    return 0;
}

But, if I change n1, n2, and n3 like this:

// TryCatch.cpp
// Version 1.0.0
// MDJ 2016/05/06

#include <iostream>
#include "TC.h"

using namespace std;

int main() {

    cout << endl;

    try {

        // AT&T syntax
        asm(
            "movl       $234567890, %eax\n\t"
            "movl       $123456789, %edx\n\t"
            "imull  %edx\n\t"
            "movl       $3, %ecx\n\t"
            "idivl  %ecx\n\t"
            "movl       %eax,   _q\n\t"
            "movl       %edx,   _m\n\t"
        );
    }
    catch(int e) {
        cout << "Caught." << endl;
    }

    cout << "Reached the End." << endl;
    cout << "q = " << q << endl;
    cout << "m = " << m << endl;
    cout << endl;

    return 0;
}

the "catch" doesn't catch the overflow and the system instead reports "Program has stopped working" and then kicks you out.

Any help would be appreciated.

Using _GCC_ on Windows in a Cygwin? MinGW? MSYS2? or some other Windows environment? — Michael Petch, May 07 '16 at 06:18
You know those `asm` statements aren't safe, right? They clobber registers without telling the compiler about it with operand constraints. If you already know this, a small comment on the asm statement would be good to save future readers from feeling the need to point this out to you. (Or from copying the code and actually using anything like it). — Peter Cordes, May 07 '16 at 06:47
See also another question about [`catch`ing CPU exceptions](http://stackoverflow.com/questions/29736593/c-try-catch-block-doesnt-catch-hardware-exception). Normal C++ exceptions are initiated by some C++ code deciding to `throw`, not by a hardware exception. It won't always Just Work. — Peter Cordes, May 07 '16 at 06:49
On Windows you would have to use the operating system's Structured Exception Handling (SEH) facility, which GCC doesn't support. You could use Microsoft's C/C++ compiler and the `__try` and `__except` keywords to catch the SEH exception. On Linux you would need to use a signal handler. — Ross Ridge, May 07 '16 at 06:53
I assume you've tested and found that gcc won't optimize the `(a*b) / c` as well as you'd like? I assume it emits code to do a full 64b/64b division, rather than 64b / 32b, right? (Because it can't or doesn't even try to prove that a 64b/32b division would be safe). In 64bit code, you can of course just use 128b / 64b => 64b division. That is slower, so if the `#DE` is expected to essentially never happen in normal circumstances, having a 32bit operand-size fast-path with a very slow catch-a-hardware-exception fallback could be a win. — Peter Cordes, May 07 '16 at 06:54
This provides some g++ code that may do what you are attempting to do by implementing some SEH classes and macros: http://www.programmingunlimited.net/siteexec/content.cgi?page=mingw-seh . As well, heed Peter's advice about the dangers of using inline _ASM_ without at *least* listing clobbers using extended assembler templates. — Michael Petch, May 07 '16 at 07:44
If you are on Windows, perhaps what you need is [MulDiv](https://msdn.microsoft.com/en-us/library/windows/desktop/aa383718%28v=vs.85%29.aspx). — David Wohlferd, May 07 '16 at 07:47
Thank you @MichaelPetch . I'm running MinGW64 under Windows 7 Pro 64-bit. I also intend to eventually port my code to Debian on another Intel/AMD machine, that one 32-bit; and to Raspbian on an ARMv8 Raspberry Pi 3. — mdavidjohnson, May 07 '16 at 12:13
Thank you @PeterCordes . Yes, for the benefit of other readers, this inline assembly is just a brief bit of code to illustrate the problem. The actual code I'm developing is in a separate .s file which I link in via the GCC linker. There, I always push any registers I use at the head of the function, and pop them back at the foot of the function. — mdavidjohnson, May 07 '16 at 12:19
Also, thank you all for your additional suggestions and links - I will look into them - this will take a while. — mdavidjohnson, May 07 '16 at 12:23
Thank you @DavidWohlferd for pointing me to the muldiv function. It's quite interesting; perhaps I'll be able to hack it to get what I need. As written, muldiv produces a rounded quotient. But what I need is an exact symmetric integer quotient and a remainder (or a floored quotient and a true modulus in some cases). If I disassemble (No, number 5, you're safe) the compiled muldiv, perhaps I'll be able to glean some useful ideas. Thanks again. — mdavidjohnson, May 07 '16 at 16:31
@mdavidjohnson: Don't waste instructions pushing / popping `eax`, `ecx`, or `edx`. Function calls are allowed to clobber those registers in all ABIs. Check the calling convention/ABI for the system you're targeting: 64bit calling conventions have more scratch registers. Also, if your function is small, then once you have it working it might be a good idea to get the constraints right and put your asm into GNU C inline asm so it can inline instead of having the overhead of a function call. (See the [x86 tag wiki](http://stackoverflow.com/tags/x86/info) for many useful links). — Peter Cordes, May 07 '16 at 16:32
Thank you @PeterCordes for your insights into pushing / popping and register clobbering. The system I'm developing includes numerous (500+) functions, many of which will be calling others among those functions. So register preservation; even of eax, ecx, and edx; is proving to be both necessary and wise. The system is somewhat akin to a subset of a 1983 Standard Forth, but with subroutine threading instead of indirect threading, and callable from c++ rather than being stand-alone. Of course, this all rather obviates inline code except for little "Please help me with this" examples. — mdavidjohnson, May 07 '16 at 17:38
Continuing @PeterCordes: Also, I want to avoid 128b / 64b => 64b solutions because I want the system to also run on purely 32-bit platforms. I will look more thoroughly into the full 64b / 64b mechanism however. BTW, thanks for the link to the x86 tag wiki - I've bookmarked it for future reference. — mdavidjohnson, May 07 '16 at 17:39
On a side note, regarding x87 rounding, I wrote an answer to someone else using GCC inline assembly to set the rounding bits (applies to other mode bits as well). http://stackoverflow.com/a/35518449/3857942 — Michael Petch, May 07 '16 at 17:40
@RossRidge: I'm committed to g++ for this project since I want to port it to Linux (on both x86 and on ARMv8) as well. Re: gcc not supporting SEH, please see my next comment (to Michael Petch) below. — mdavidjohnson, May 07 '16 at 17:49
@mdavidjohnson: x86 doesn't have a 64b / 64b => 64b div. The [`div`](http://www.felixcloutier.com/x86/DIV.html) insn always has a dividend twice the width of the divisor / result. (It's common to zero or sign-extend `eax` into `edx` when all you actually want is dividend same width as divisor, but that's not the case for you.) You should definitely consider making your code portable to 64bit, so you only need to do things the hard way in 32bit builds. I understand that you want to be portable to 32bit systems, but that shouldn't mean gimping your code when it is running on 64bit. — Peter Cordes, May 07 '16 at 17:56
Also, are you sure you need inline asm? It might turn out that the compiler does a good enough job on its own, esp for 64bit platforms. You may find that it's only worth hand-writing asm versions of things for 32bit x86 and maybe 32bit ARM. — Peter Cordes, May 07 '16 at 17:59
@MichaelPetch: The SEH link you provided may indeed be a viable approach for me. I'm not particularly concerned about overwriting other gcc-produced stuff on the stack because, in the event of idiv overflow, all I intend to do is report the nature and location of the error and then abort the process. Re: x87 rounding, one thing I'd like to be able to do is to get the idiv overflow detection working in the x86 and then later accomplish the same thing in the x87 and in SSE/SSE2 so that I can then compare (and later report on) those methods' relative speeds. — mdavidjohnson, May 07 '16 at 18:05
Under the hood SEH handling under Windows is completely different whether you're using 32-bit or 64-bit code. (And different again if you're targeting ARM or some other non-x86 CPU.) If you can't use `__try` and `__except`, which hide these implementation details, you need to be clear whether you're creating a 32-bit x86 or a 64-bit x64 Windows executable, or both. — Ross Ridge, May 07 '16 at 18:12
@PeterCordes: Yes, I agree. Separate 32-bit and 64-bit kernels may indeed be the way to go. I want to get the 32-bit version running first because at least it'll run on both. Then I can start comparing timings. For example I just put together a Sieve of Eratosthenes program in both c++ and in assembly. (Finding all primes up to 16384). On my 64-bit AMD Phenom II X4 2.90 GHz, the assembly version takes about 56% as long to run as does the c++ version. After I'm far enough along in my system, I'll want to build the Sieve in that system for comparison as well. — mdavidjohnson, May 07 '16 at 18:17
@PeterCordes: Yes, indeed. My system concept is an assembly language "kernel" that is callable from, and runs underneath c++. Stuff that's inherently bottlenecked by things other than the processor (e.g. keyboard input, disk access, etc.) will be coded in c++. Much of the stuff I'm writing in assembly is stuff that I know (or at least strongly suspect) will be significantly faster in practical use. — mdavidjohnson, May 07 '16 at 18:24
@RossRidge: Since all I plan to do on idiv overflow is to report and abort (vs. the naked abort I'm currently getting), 32-bit code will probably be sufficient for the SEH - I wouldn't plan to do anything else via SEH. — mdavidjohnson, May 07 '16 at 18:32
I'm not sure what you're trying to say. The 32-bit SEH macros and classes Michael Petch linked won't work in a 64-bit executable whether you resume or not, so as I said you need to make it clear what you're actually building. If you will only ever abort after an exception you can use `AddVectoredExceptionHandler` to install a Unix-signal-like handler. However if you're creating anything other than a 32-bit x86 executable then you may still need to create SEH unwind info for your assembly functions (as required by Microsoft's calling conventions) since "vectored" exceptions are based on SEH. — Ross Ridge, May 07 '16 at 19:02
@DavidWohlferd: I looked into muldiv a little more closely. Unfortunately, in c++, using muldiv just compiles to a bare call in assembly, with no way to trace the code or glean any way to get around the rounding (groan). — mdavidjohnson, May 09 '16 at 04:24
MulDiv is exported from Kernel32.dll. If you are doing native debugging, you should be able to step in. — David Wohlferd, May 09 '16 at 05:22
@zx485: I moved everything into the one question per your request, but that also seems to have deleted several comments as well. What do I need to do to complete this fix? — mdavidjohnson, May 11 '16 at 22:12
@mdavidjohnson: you could undelete your long-division answer. It is a valid partial-answer to the problem of figuring out when we can safely use 64b/32b => 32b division, esp. in the unsigned case. The main question is now completely bloated and unreadable. Try to boil it down to just the parts that are still relevant, and word it in a way that explains the parts that you've now solved and can actually explain as background for the part that's still a question. — Peter Cordes, May 11 '16 at 23:04
re: your `_DivideTester` function: What's that doing in the question? It doesn't seem to add anything. It was basically a separate asm-debugging question that you incorrectly posted as an answer, where it got solved (so it can just stay deleted. There are already a ton of mismatched push/pop questions on SO, so you're not depriving future readers of anything.) Anyway, it's *horribly* written. You can tell you're wasting instructions since you pop the same register multiple times in the epilogue. Even if you insist on doing the debug-print calls in asm (instead of a debugger), it's nasty. — Peter Cordes, May 11 '16 at 23:10
@PeterCordes: I really hate it when people excoriate my code, and then have the unmitigated gall to also be right :-) I guess I'm really going to have to learn how to use gdb - a task which I have loathed and avoided for over (well, nobody needs to know how long). I'll see what I can clean up here over the next couple of days. BTW, I think I'm actually spiraling in towards a workable idiv solution. It involves left-shifting the test dividend one bit. It looks promising. I will continue testing; looking for transition points where it might break down. — mdavidjohnson, May 13 '16 at 02:21
@PeterCordes continued: As a (perhaps inadequate) defense, I WILL say that much of that bloated code will be removed after the solution is discovered. I'm intentionally verbose when adding debugging code - to make sure what we wind up looking at is clearly understood. — mdavidjohnson, May 13 '16 at 02:27
@PeterCordes continued: For example, my most recent test run produced this: Divisor = 13485087 Mult1 = -234567890 Mult2 = 123456789 ResultH = -6742543 ResultL = -1119998778 DividendH = -6742543 DividendL = -1119998778 tDividendH = -1 tDividendL = -6742543 shTestDivH = -1 shTestDivL = -13485086 tQuotient = 0 tRemainder = -13485086 Quotient = -2147483253 Remainder = -4157199 */MOD Pointer: 0x40a73c Stack: 0 0 0 0 0 0 0 0 13485087 -2147483254 9327888 AuxPtr: 0x40a8e4 AuxStk: 0 0 0 0 0 0 0 0 0 0 0 — mdavidjohnson, May 13 '16 at 02:29
Your inline asm is unsafe: you clobber `eax`, `ecx`, and `edx` without telling the compiler about it. Take the constants out of the asm, and just use input/output constraints to ask for the inputs in the right registers, and declare a clobber on `%edx`. (See the x86 tag wiki's [GNU C inline asm link](http://stackoverflow.com/questions/34520013/using-base-pointer-register-in-c-inline-asm/34522750#34522750), specifically this example of [wrapping a single `idiv`](http://stackoverflow.com/questions/3323445/what-is-the-difference-between-asm-and-asm/35959859#35959859)) — Peter Cordes, May 17 '16 at 01:44
At this point, the code is not inline; it is in a separate .s file. It was never my intention to use inline code. I just inlined the first question to simplify its presentation here. Although, for clarity, I haven't shown it here, the separate .s file protects the registers it uses. — mdavidjohnson, May 17 '16 at 16:10
For simplicity, I also have not shown the DivideByZero or DivideOverflow error handlers; nor the temporary storage variables. — mdavidjohnson, May 17 '16 at 16:22

mdavidjohnson · Accepted Answer · 2016-05-16T21:29:45.790

It suddenly occurred to me that I'm completely on the wrong track (and as a model railroader, that's a truly heinous crime). Pun intended :-)

But, really, I've been going about this the hard way.

Instead, I should take the easy way: I should go back to my 1950's grammar school and my first adventures with long division.

Instead of puzzling over EDX:EAX being divided by ECX, let's think of a two digit (unsigned) number being divided by a one digit (unsigned) number.

Now, the two-digit number is the dividend, and it has a ones digit and a tens digit. So it can vary between 0 and 99.

And, the one-digit number is the divisor, and it has only a ones digit. Thus, it can vary between 1 and 9 (because division by zero is not allowed).

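Consider, for example, 77 divided by 2: 2 goes into 7 three times with 1 left over; bringing down the next 7 gives 17, and 2 goes into 17 eight times with 1 left over.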

So, the result is: the quotient is 38 and the remainder is 1.

But, here, like with the dividend, we're allowing the quotient to also have two digits: a tens digit and a ones digit. What would happen if we instead limit the quotient to having only the ones digit?

Then we could call any division, which results in the quotient having any numeral other than zero in the tens digit field, AN OVERFLOW !!!

But, then, what is the condition required to produce such an overflow: ANY DIVISOR WHICH IS SMALLER THAN OR EQUAL TO THE NUMERAL IN THE TENS DIGIT OF THE DIVIDEND !!!
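
The decimal rule is small enough to check exhaustively. The following is a minimal C++ sketch, not part of the original answer; the function names `overflows_by_rule`, `overflows_by_division`, and `count_mismatches` are hypothetical and introduced only for illustration. It compares the tens-digit rule against an actual division for every two-digit dividend and one-digit divisor.

```cpp
#include <cassert>

// Rule from the text: the one-digit quotient overflows when the
// divisor is smaller than or equal to the tens digit of the dividend.
bool overflows_by_rule(int dividend, int divisor) {
    int tens = dividend / 10;
    return divisor <= tens;
}

// Ground truth: the quotient needs a tens digit, i.e. it is 10 or more.
bool overflows_by_division(int dividend, int divisor) {
    assert(divisor != 0);
    return dividend / divisor >= 10;
}

// Counts the (dividend, divisor) pairs on which the rule and the
// actual division disagree, over dividends 0..99 and divisors 1..9.
int count_mismatches() {
    int mismatches = 0;
    for (int dividend = 0; dividend <= 99; ++dividend) {
        for (int divisor = 1; divisor <= 9; ++divisor) {
            if (overflows_by_rule(dividend, divisor) !=
                overflows_by_division(dividend, divisor)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

The two agree because `dividend / divisor >= 10` is the same condition as `dividend >= 10 * divisor`, which depends only on the tens digit of the dividend.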

Analogously, in the division of EDX:EAX by ECX, an overflow will occur if ECX <= EDX !!!

And that is thus our simple test for overflow:

                        ECX <= EDX

That works for unsigned divides.
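
As an illustration of the unsigned test, here is a minimal C++ sketch. It assumes `hi` holds the value that would be in EDX, `lo` the value in EAX, and `divisor` the value in ECX; the function names are hypothetical. The second function is only a reference that performs the division in 64 bits, so the rule can be compared against it.

```cpp
#include <cassert>
#include <cstdint>

// Pre-check for the unsigned 64b / 32b => 32b divide (div).
// Returns true when the divide would fault. A zero divisor is
// reported too, because 0 <= hi holds for every unsigned hi.
bool udiv_would_fault(std::uint32_t hi, std::uint32_t divisor) {
    return divisor <= hi;
}

// Reference: divide in 64 bits and see whether the quotient
// still fits in 32 bits. The divisor must be nonzero here.
bool udiv_quotient_too_big(std::uint32_t hi, std::uint32_t lo,
                           std::uint32_t divisor) {
    assert(divisor != 0);
    std::uint64_t dividend = (static_cast<std::uint64_t>(hi) << 32) | lo;
    return dividend / divisor > UINT32_MAX;
}
```

Note that the low half of the dividend never enters the pre-check: the quotient exceeds 32 bits exactly when the dividend is at least `divisor * 2^32`, and that depends only on the high half.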

Pre-checking for signed divide overflow is significantly more complicated. I think this will work, but I'm still testing.

Begin with the 64-bit signed dividend in EDX:EAX and with the 32-bit signed divisor in ECX. Then:

  # Temporarily save the dividend
  movl  %edx, _dividendHigh                     # Most-significant 32b
  movl  %eax, _dividendLow                      # Least-significant 32b

  # Check the divisor for zero
  testl %ecx, %ecx                              # Is divisor = 0 ?
  jz    _DivideByZero                           # Go if Yes

  # Check the divisor for +/- 1
  cmpl  $1, %ecx
  je    _dChkA                                  # Go if divisor =  1
  cmpl  $-1,    %ecx
  je    _dChkA                                  # Go if divisor = -1
  jmp   _dChkC                                  # Else continue

_dChkA:
  # If dividendHigh < -1 or > 0 and divisor = +/- 1
  #   then overflow will occur.
  cmpl  $-1,        %edx
  jl    _DivideOverflow                         # Go if divHigh < -1
  cmpl  $0,     %edx
  jg    _DivideOverflow                         # Go if divHigh >    0

  # If dividendHigh = -1 and bit 31 of dividendLow = 0
  #   and divisor = +/- 1 then overflow will occur.
  cmpl  $-1,    %edx
  jne   _dChkB                                  # Go if divHigh <>  -1
  bt    $31,    %eax
  jnc   _DivideOverflow                         # Go if divLow b31 = 0

_dChkB:
  # If dividendHigh = 0 and bit 31 of dividendLow = 1
  #   and divisor = +/- 1 then overflow will occur.
  cmpl  $0, %edx
  jne   _dChkC                                  # Go if divHigh <>   0
  bt    $31,    %eax
  jc    _DivideOverflow                         # Go if divLow b31 = 1

  # Check for non-unary overflow
  #   Overflow will occur if the 
  #   most-significant 33b can be
  #   divided by the divisor. NOTE:
  #   that's 33 bits, not 32, 
  #   because all numbers are signed.

  # Do dividend shift and manual sign extension
  # Check bit 63 to determine if dividend is positive or negative
_dChkC: 
  bt    $31,    %edx
  jc    _dChkD                                  # Go if negative

  # Do the 64-bit shift                         # Positive
  # First, shift the Least-significant
  #   32b left 1 bit (bit 31 --> CF).
  shll  $1, %eax

  # Then, rotate the Most-significant
  #   32b left, through the carry, 1 bit
  #   (CF --> bit 0 then bit 31 --> CF).
  rcll  $1, %edx

  # Move it to %eax and manually positive-sign extend it
  movl  %edx,   %eax
  movl  $0,     %edx
  jmp       _dChkE

_dChkD:                                             # Negative  
  # Do the 64-bit shift                                     
  # First, shift the Least-significant
  #   32b left 1 bit (bit 31 --> CF).
  shll  $1, %eax

  # Then, rotate the Most-significant
  #   32b left, through the carry, 1 bit
  #   (CF --> bit 0 then bit 31 --> CF).
  rcll  $1, %edx

  # Move it to %eax and manually negative-sign extend it
  movl  %edx,   %eax
  movl  $-1,    %edx

  # Do the Test Divide of the 
  #   Most-Significant 33b
_dChkE:
  idivl %ecx                                    # EDX:EAX / ECX
                                                #   EAX = test quotient
                                                #   EDX = test remainder
  testl %eax,   %eax
  jnz       _DivideOverflow                     # Go if Quotient <> 0

  # Get the full dividend
  movl  _dividendHigh,  %edx                    # Most-significant 32b
  movl  _dividendLow,   %eax                    # Least-significant 32b

  # Perform the 64b by 32b division
  idivl %ecx                                    #   EDX:EAX / ECX
                                                #     EAX = quotient
                                                #     EDX = remainder
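
Since the signed pre-check is still being tested, a portable reference model may be useful for cross-checking it. The following C++ sketch is an assumption-laden illustration, not the method used above: it does the whole computation in 64-bit arithmetic and then checks whether the quotient fits in 32 bits. The names `DivStatus`, `DivResult`, and `checked_muldiv` are hypothetical. On a 32-bit target the compiler will typically implement the 64-bit divide with a slower helper routine, so this is a reference for testing rather than a fast path.

```cpp
#include <cassert>
#include <cstdint>

enum class DivStatus { Ok, DivideByZero, Overflow };

struct DivResult {
    DivStatus status;
    std::int32_t quotient;   // truncated toward zero, as idiv does
    std::int32_t remainder;  // takes the sign of the dividend, as idiv does
};

// Reference model of (n1 * n2) / n3 with a 32-bit quotient.
DivResult checked_muldiv(std::int32_t n1, std::int32_t n2, std::int32_t n3) {
    if (n3 == 0) {
        return {DivStatus::DivideByZero, 0, 0};
    }
    // The product of two 32-bit values has magnitude at most 2^62,
    // so it fits in 64 bits and the 64-bit divide cannot overflow.
    std::int64_t product = static_cast<std::int64_t>(n1) * n2;
    std::int64_t q = product / n3;
    std::int64_t r = product % n3;
    if (q < INT32_MIN || q > INT32_MAX) {
        return {DivStatus::Overflow, 0, 0};
    }
    return {DivStatus::Ok, static_cast<std::int32_t>(q),
            static_cast<std::int32_t>(r)};
}
```

With the values from the question, `checked_muldiv(34, 48, 13)` reports a quotient of 125 and a remainder of 7, while `checked_muldiv(234567890, 123456789, 3)` reports `Overflow` instead of faulting.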

Coming at it from the other direction, the largest possible quotient is `INT_MAX` or `UINT_MAX` (0xFFFFFFFF). The largest legal dividend is thus `ecx * INT_MAX` or `ecx * UINT_MAX`, plus the maximum remainder. So the smallest dividend that overflows produces a quotient 1 higher than that, with no remainder. So yes, `2^32 * ecx <= edx:eax` causes overflow. That does simplify to `ecx <= edx`. Nice job with the analogy to long division, that's what led me to think of it this way. — Peter Cordes, May 09 '16 at 06:09
I think in the signed case, you can still apply exactly the same rule (but using a signed comparison). The simplification to `ecx <= edx` (ignoring `eax`) doesn't hold for negative numbers, though. `INT_MIN` / `-1` overflows, because `abs(INT_MIN)` isn't representable as a signed integer. Other than that, you might be ok ignoring `eax`, but you should write down the expression `(a_high < b_high) || (a_high == b_high && a_low <= b_low)` and check that that simplifies for the signed case given that `a_low = 0`. An all-zero `eax` gives a higher-magnitude negative number than all-1 (2's comp.) — Peter Cordes, May 09 '16 at 06:23
You might want to look at the implementation of Windows' MulDiv given here, which shows how to check for signed overflow efficiently: https://blogs.msdn.microsoft.com/oldnewthing/20120514-00/?p=7633 Note that since your function would only detect overflow at most once, the most efficient implementation would be to detect the overflow in an exception or signal handler. — Ross Ridge, May 09 '16 at 06:47
@PeterCordes: (a_high < b_high) || (a_high == b_high && a_low <= b_low) would certainly seem viable for a full 64-bit by 64-bit divide, but both idiv and div are 64-bit by 32-bit. In that case, there is no "a_low" and (a_high < b_high) || (a_high == b_high && a_low <= b_low) degenerates to (a < b_high) || (a == b_high) which, of course, is identical to (a <= b_high). Thus, ECX <= EDX. The simplicity of the long division model would also suggest that the overflow check for the signed divide would consist of the pseudocode: abs(ECX) <= abs(EDX). — mdavidjohnson, May 09 '16 at 16:24
@PeterCordes continued: Even though EDX is the high part of EDX:EAX; considered separately (as it certainly is in the long division model) it is just itself another 32-bit number. As the high part of EDX:EAX = -1, EDX is itself all 1's in binary and abs(EDX) = abs(-1) = 1. In that case, only abs(ECX) = 1 will cause an overflow. — mdavidjohnson, May 09 '16 at 16:30
Thank you @RossRidge for pointing me to that muldiv discussion. It was both interesting and informative. Aside from the muldiv bug, for my purposes the line "if (result < 0) goto overflow;" checks for overflow after the division - at that point idiv or div will have already dumped me out of the program with the "The program has stopped working" message. Conversely, the unsigned ECX <= EDX or the signed abs(ECX) <= abs(EDX) checks would be completed before attempting the division. — mdavidjohnson, May 09 '16 at 17:04
Obviously that's not the case or it wouldn't work. Remember it's a C version of a function that's actually written in assembly, so the `UInt32Div16To16` function is actually the DIV instruction. The `if (result < 0) goto overflow;` statement checks whether the unsigned divide "overflowed" into the sign bit. In that case a signed division would have overflowed. Your abs(ECX) <= abs(EDX) check would detect -1 / 1 as an overflow since EDX:EAX would be 0xFFFFFFFF:0xFFFFFFFF making EDX = -1 and abs(EDX) = 1 = abs(ECX). — Ross Ridge, May 09 '16 at 17:43
@mdavidjohnson: You're comparing divisor to dividend, but I'm comparing divisor * smallest_overflowing_quotient to the dividend. That's why `ecx = a_high`, and `a_low = 0` in my expression. My point was that coming from this direction seems to require less of a leap in proving that the simplification to just comparing `ecx <= edx` works. Of course it's just another 32bit number. I'm less confident it simplifies as easily for signed division, because the whole `edx:eax` is a 2's compliment number. So the high bit of `eax` has a place-value of `2^32`, rather than being a sign bit. — Peter Cordes, May 09 '16 at 18:00
Sorry, I didn't get the previous comment right. I think I'm beginning to see your point. Please allow me to investigate and test a little more. — mdavidjohnson, May 09 '16 at 18:13
Oops, the high bit of `eax` has a place value of `2^31`, of course. `eax` holds bits [31:0] of the signed 64bit dividend. Also "complement", not "compliment", xD. — Peter Cordes, May 09 '16 at 22:31
@PeterCordes: My testing of abs(ECX) <= abs(EDX) showed that it fails as a valid check. But then my testing of (a_high < b_high) || (a_high == b_high && a_low <= b_low) showed it also fails. This just didn't seem right, so I tested some more and finally just tried a very bare test of idiv which pointed out that I must be doing something terribly wrong, which reference to http://x86.renejeschke.de/html/file_module_x86_id_137.html and other references indicate simply should not be happening. I must be looking right at something and not seeing it. — mdavidjohnson, May 11 '16 at 17:40
@PeterCordes continued: Could you please take a look at my most recent "solution" (which is just an outline of the bare test - used so I could block the code) and see if you can spot my error(s)? — mdavidjohnson, May 11 '16 at 17:42

Peter Cordes · Answer 2 · 2016-05-11T23:28:32.483

Your DivideTester is ridiculous. You only need to preserve the caller's %ebx, %esi, %edi, %ebp, and %esp. You seem to be saving/restoring tons of registers within the same function, and restore the same register multiple times at the end.

Try this:

.globl _DivideTester
_DivideTester:
# extern "C" void DivideTester(void);
# clobbers eax and edx.  The C compiler will assume this, because the standard calling convention says functions are allowed to clobber eax, ecx, and edx

    # mov    $0,       %edx
    # mov    $6742542, %eax
    # Instead, set the globals from C, and print before calling

    mov    _dividendHigh, %edx        # load from globals
    mov    _dividendLow,  %eax
    # movl    _divisor, %ecx

    idivl   _divisor                  # EDX:EAX / divisor
    mov    %eax, _quotient            #       EAX = Quotient
    mov    %edx, _remainder           #       EDX = Remainder

    # print the results from C, or use a debugger to inspect them
    ret

Or if you insist on hard-coding the constants into the asm, you can still do that. You can still print them from C.

Notice how much easier this function is to read? There's basically nothing to go wrong other than the idiv. Getting the function calls correct from asm is a lot more work, so don't waste your time on it. Let the compiler do what it's good at. You can still see exactly what the compiler did by disassembling / single-stepping its code, so it's not like you lose out on debug-ability from leaving that part to C. It's more like you avoid whole classes of bugs (like the one you had at first).

You only need an operand-size suffix for something like mov $1234, _memory, where there's no register to imply the operand-size. I prefer to omit it. If it's ambiguous, the assembler (as) will give you an error message instead of picking a default, so it's always safe.
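
If such a wrapper is later moved into GNU C inline asm, the operand constraints have to tell the compiler which registers are read and written. The following C++ sketch shows one possible shape, assuming GCC or Clang on x86; the names `QuotRem` and `idiv64by32` are hypothetical. On other targets it falls back to plain 64-bit arithmetic. It performs no overflow pre-check: the caller must guarantee that the quotient fits in 32 bits, otherwise `idiv` still raises #DE.

```cpp
#include <cassert>
#include <cstdint>

struct QuotRem {
    std::int32_t quotient;
    std::int32_t remainder;
};

// Divides the signed 64-bit value hi:lo by divisor.
// Precondition: divisor != 0 and the quotient fits in 32 bits.
QuotRem idiv64by32(std::int32_t hi, std::uint32_t lo, std::int32_t divisor) {
    assert(divisor != 0);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // "+a" and "+d": EAX and EDX are both read and written by idiv.
    // "rm": the divisor may live in any other register or in memory.
    asm("idivl %[d]"
        : "+a"(lo), "+d"(hi)
        : [d] "rm"(divisor)
        : "cc");
    return {static_cast<std::int32_t>(lo), hi};
#else
    // Portable fallback for non-x86 targets: same result via 64-bit math.
    std::uint64_t bits =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(hi)) << 32) | lo;
    std::int64_t dividend = static_cast<std::int64_t>(bits);
    return {static_cast<std::int32_t>(dividend / divisor),
            static_cast<std::int32_t>(dividend % divisor)};
#endif
}
```

Because the inputs arrive through constraints rather than hard-coded `mov` instructions, the compiler is free to inline the function and choose where the divisor lives.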

Unfortunately, this doesn't address the overflow problem. For example, (678152731 * -19) / 7 will bomb as soon as it hits the idivl. — mdavidjohnson, May 17 '16 at 16:30
That's correct; it was posted as an answer when your question included a huge version of this function. It should maybe be a comment, but I really wanted to post code to show you exactly what I was talking about. OTOH, it sort of answered part of what the question was at that point. It's not intended to be anything more than a wrapper around the `idiv` instruction to let you use it from C. — Peter Cordes, May 17 '16 at 16:36
(update: until very recent `as`, only `mov` would error on ambiguity! Other instructions like `add $1, (%rdi)` would pick dword operand-size for some apparent compatibility reason with some ancient Unix thing. Now you at least get a warning.) — Peter Cordes, Apr 11 '21 at 03:38
