
From a separate C program, I pass four parameters to an x86-64 assembly routine:

  1. dividend
  2. divisor
  3. quotient pointer
  4. remainder pointer

dividend = 0xA
divisor = 0x3

That is 10/3, so the quotient should be 3 and the remainder should be 1.

However, my quotient comes back as c2 and my remainder as 7ffff396f687, both of which are far from what I should be getting. I've tried debugging my ASM code, but I can't figure out what the problem is.

This is what I have so far. I'm a beginner at this.

global divide64u
divide64u:
push rbp
mov rbp, rsp
mov rdx, rdi
mov rax, rsi
xor rdx, rdx 
div r10
divide64uDone:
pop rbp
ret
Boar
  • Why are you dividing by `r10`? I am not aware of any calling convention where the third argument is in `r10`. – fuz May 01 '22 at 21:06
  • Are you trying to do 128-bit / 64-bit division (because you know that the quotient will fit in a uint64_t but the compiler doesn't)? Is that why you're using asm in the first place, instead of just looking at compiler output for `uint64_t` division? If you're not using a 128-bit dividend, you should be zeroing RDX, not copying an arg into it. – Peter Cordes May 02 '22 at 04:40
  • I've just figured it out!!! Thanks to all the kind people willing to help me out! – Boar May 02 '22 at 17:55
  • OS tag please. It matters. – Joshua May 08 '22 at 20:43

1 Answer

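On x86-64, the arguments are all passed in registers, per the ABI. Under the System V calling convention (Linux, macOS, BSD), the four arguments here arrive in RDI (dividend), RSI (divisor), RDX (quotient pointer), and RCX (remainder pointer); Windows uses different registers.

So there is no need to set up a stack frame with push rbp / mov rbp, rsp / pop rbp. The posted code has other problems, though: it loads the divisor into RAX instead of the dividend, it zeroes RDX while RDX still holds the quotient pointer, it divides by r10 (which holds no argument), and it never stores the results through the two pointers.

You can also write this in C and let the optimizer generate the code. In the listing below, the plain C versions compile to the same div sequence as the inline asm versions: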

typedef unsigned long long u64;

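// Inline asm with register constraints: the compiler places every operand.
// div divides RDX:RAX by its operand, so RDX must be 0 for a 64-bit dividend.
void
davdiv(u64 div,u64 dvr,u64 *quot,u64 *rem)
{

    __asm__ (
        "\tdiv %[dvr]\n"
    :   [quot] "=a" (*quot),    // quotient comes back in RAX
        [rem] "=d" (*rem)       // remainder comes back in RDX
    :   [div] "a" (div),        // low half of the dividend in RAX
        [dvr] "r" (dvr),        // divisor in any general-purpose register
        "d" (0));               // high half of the dividend (RDX) is zero
}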

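// Inline asm that does its own register moves and declares RAX/RDX as clobbered.
void
mydiv(u64 div,u64 dvr,u64 *quot,u64 *rem)
{

    __asm__ __volatile__(
        "\txor  %%edx,%%edx\n"          // RDX = 0 (a 32-bit write zero-extends)
        "\tmov  %[div],%%rax\n"         // RAX = dividend
        "\tdiv  %[dvr]\n"               // RAX = quotient, RDX = remainder
        "\tmov  %%rax,%[quot]\n"        // store the quotient through the pointer
        "\tmov  %%rdx,%[rem]\n"         // store the remainder through the pointer
    :   [quot] "=m" (*quot),
        [rem] "=m" (*rem)
    :   [div] "r" (div),
        [dvr] "r" (dvr)
    :   "rax", "rdx");
}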

// Plain C: the compiler merges / and % into a single div instruction.
void
cpldiv(u64 div,u64 dvr,u64 *quot,u64 *rem)
{

    *quot = div / dvr;
    *rem = div % dvr;
}

// Same as cpldiv, but the quotient is the return value (returned in RAX).
u64
cplretA(u64 div,u64 dvr,u64 *rem)
{
    u64 quot;

    quot = div / dvr;
    *rem = div % dvr;

    return quot;
}

// Same as cpldiv, but the remainder is the return value (one extra mov from RDX to RAX).
u64
cplretB(u64 div,u64 dvr,u64 *quot)
{
    u64 rem;

    *quot = div / dvr;
    rem = div % dvr;

    return rem;
}

Here is the disassembly of the above compiled with -O2:


div2.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <davdiv>:
   0:   49 89 d0                mov    %rdx,%r8
   3:   48 89 f8                mov    %rdi,%rax
   6:   31 d2                   xor    %edx,%edx
   8:   48 f7 f6                div    %rsi
   b:   49 89 00                mov    %rax,(%r8)
   e:   48 89 11                mov    %rdx,(%rcx)
  11:   c3                      retq
  12:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  19:   00 00 00 00
  1d:   0f 1f 00                nopl   (%rax)

0000000000000020 <mydiv>:
  20:   49 89 d0                mov    %rdx,%r8
  23:   31 d2                   xor    %edx,%edx
  25:   48 89 f8                mov    %rdi,%rax
  28:   48 f7 f6                div    %rsi
  2b:   49 89 00                mov    %rax,(%r8)
  2e:   48 89 11                mov    %rdx,(%rcx)
  31:   c3                      retq
  32:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  39:   00 00 00 00
  3d:   0f 1f 00                nopl   (%rax)

0000000000000040 <cpldiv>:
  40:   48 89 f8                mov    %rdi,%rax
  43:   48 89 d7                mov    %rdx,%rdi
  46:   31 d2                   xor    %edx,%edx
  48:   48 f7 f6                div    %rsi
  4b:   48 89 07                mov    %rax,(%rdi)
  4e:   48 89 11                mov    %rdx,(%rcx)
  51:   c3                      retq
  52:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  59:   00 00 00 00
  5d:   0f 1f 00                nopl   (%rax)

0000000000000060 <cplretA>:
  60:   48 89 d1                mov    %rdx,%rcx
  63:   48 89 f8                mov    %rdi,%rax
  66:   31 d2                   xor    %edx,%edx
  68:   48 f7 f6                div    %rsi
  6b:   48 89 11                mov    %rdx,(%rcx)
  6e:   c3                      retq
  6f:   90                      nop

0000000000000070 <cplretB>:
  70:   48 89 d1                mov    %rdx,%rcx
  73:   48 89 f8                mov    %rdi,%rax
  76:   31 d2                   xor    %edx,%edx
  78:   48 f7 f6                div    %rsi
  7b:   48 89 01                mov    %rax,(%rcx)
  7e:   48 89 d0                mov    %rdx,%rax
  81:   c3                      retq
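
To check a hand-written routine against the values the question expects, a small harness along these lines may help. This is only a sketch: `divfn`, `ref_div`, and `check_div` are hypothetical names that do not appear in the original post, and the test inputs are arbitrary apart from the 0xA / 0x3 case from the question.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Signature shared by the routine under test and the reference. */
typedef void (*divfn)(uint64_t dividend, uint64_t divisor,
                      uint64_t *quot, uint64_t *rem);

/* Reference result in plain C (hypothetical helper, not from the post). */
void ref_div(uint64_t dividend, uint64_t divisor,
             uint64_t *quot, uint64_t *rem)
{
    *quot = dividend / divisor;
    *rem = dividend % divisor;
}

/* Runs fn on a few inputs and returns the number of mismatches. */
int check_div(divfn fn)
{
    static const uint64_t cases[][2] = {
        {0xA, 0x3}, {0, 1}, {7, 8}, {UINT64_MAX, 1}, {UINT64_MAX, UINT64_MAX},
    };
    int failures = 0;

    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        /* Poison the outputs so that a routine that never stores is caught. */
        uint64_t q = 0xDEADBEEFDEADBEEFULL, r = 0xDEADBEEFDEADBEEFULL;
        uint64_t want_q, want_r;

        ref_div(cases[i][0], cases[i][1], &want_q, &want_r);
        fn(cases[i][0], cases[i][1], &q, &r);
        if (q != want_q || r != want_r) {
            printf("%" PRIx64 " / %" PRIx64 ": got q=%" PRIx64 " r=%" PRIx64 "\n",
                   cases[i][0], cases[i][1], q, r);
            failures++;
        }
    }
    return failures;
}
```

Assuming the assembly routine is declared on the C side with the same four-parameter prototype, calling `check_div(divide64u)` should return 0 once the routine loads the right registers and stores both results.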
Craig Estey
  • How about: `__asm__ ("\tdiv %[dvr]\n" : [quot] "=a" (*quot), [rem] "=d" (*rem) : [div] "a" (div), [dvr] "r" (dvr), "d" (0));`? Let the compiler handle moving everything around for you. Also, isn't there some issue with overflow? – David Wohlferd May 02 '22 at 01:00
  • @DavidWohlferd Nice. I've added that to the example code. But, I'm not sure about overflow. The asm code is the same even for `cpldiv` which is 100% C. The fact that compiler can combine `/` and `%` into a single operation is a common optimization. I've never seen code that checks any flags here (e.g. OF, CF, etc.) – Craig Estey May 02 '22 at 19:55
  • Consider what happens if you take a 128 bit number and divide it by 2. The result is a 127 bit number and it just doesn't fit in a 64 bit register. I believe the processor faults (rather like using a bad pointer). – David Wohlferd May 02 '22 at 20:30
  • @DavidWohlferd No, the processor doesn't fault--it just truncates the result. But we can _never_ have a 128 bit number. Only 64 (because of `u64 div`). That's why we have `xor %edx,%edx`. We're starting with 64 bit numbers. That is, `div` (dividend) is _zero_ extended to 128 bits before the `div` inst. – Craig Estey May 02 '22 at 20:38
  • Hmm. The [docs](https://www.felixcloutier.com/x86/div) are telling me: *Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.* – David Wohlferd May 02 '22 at 22:10
  • @DavidWohlferd is correct; x86 does indeed raise `#DE` if the quotient doesn't fit in the operand-size (AL/AX/EAX/RAX). This is impossible for `div r/m64` *if* you use it with RDX=0, except for the special case of division by zero. But no, `div` itself doesn't zero-extend, you need to manually zero the high half of the dividend if you don't want to take advantage of the full power of 128-bit / 64-bit => 64-bit div, e.g. for N-chunk / 1-chunk extended precision, as explained in [Why should EDX be 0 before using the DIV instruction?](https://stackoverflow.com/q/38416593) – Peter Cordes May 03 '22 at 03:33
  • Duplicates of that are frequent, with people naively using `div` without zeroing [er]DX or AH first. Or `idiv` without `cdq` or `cqo` to sign-extend RAX into RDX:RAX. `INT_MIN / -1` can actually overflow `idiv`'s quotient, though, even with the dividend being only single-width sign-extended: that kind of thing is why it's UB in C. See [Why does integer division by -1 (negative one) result in FPE?](https://stackoverflow.com/q/46378104) for details of the puzzle pieces involved on x86 vs. not faulting on ARM. – Peter Cordes May 03 '22 at 03:39
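
The overflow rule from the comments above can be sketched in C. For a 128-bit dividend hi:lo, the quotient fits in 64 bits only when hi is smaller than the divisor, so a guard can refuse the cases where `div` would raise #DE. The name `div128by64` is hypothetical, and the code assumes GCC or Clang on a 64-bit target (GNU inline asm on x86-64, the `unsigned __int128` extension elsewhere).

```c
#include <stdbool.h>
#include <stdint.h>

/* Divides the 128-bit value hi:lo by dvr (hypothetical helper).
   Returns false, without dividing, when div would raise #DE:
   either dvr is 0 or the quotient would not fit in 64 bits. */
bool div128by64(uint64_t hi, uint64_t lo, uint64_t dvr,
                uint64_t *quot, uint64_t *rem)
{
    if (dvr == 0 || hi >= dvr)
        return false;

#if defined(__x86_64__) && defined(__GNUC__)
    /* RDX:RAX / dvr, quotient in RAX, remainder in RDX. */
    __asm__ ("div %[dvr]"
             : "=a" (*quot), "=d" (*rem)
             : "a" (lo), "d" (hi), [dvr] "r" (dvr)
             : "cc");
#else
    /* Fallback for other 64-bit targets: same arithmetic via __int128. */
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;
    *quot = (uint64_t)(n / dvr);
    *rem = (uint64_t)(n % dvr);
#endif
    return true;
}
```

With hi fixed at 0, as in all the functions in the answer, the `hi >= dvr` test reduces to the division-by-zero check, which matches the point that RDX=0 can only fault for a zero divisor.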