How to do 64 bit by 64 bit division in assembly?

Question

How can I implement 64 bit by 64 bit division in x86 assembly?
I have already enabled extended registers with the .386 directive.

Will your program run in 32-bit or 64-bit mode? I.e., are x86-64 instructions allowed? — Michael, Jun 20 '13 at 17:14
dosbox is a 16-bit x86 environment. Later architectures are extended to 32 and 64-bit — phuclv, Jan 16 '14 at 01:20
[Why should EDX be 0 before using the DIV instruction?](https://stackoverflow.com/q/38416593) shows how to do 64-bit / 32-bit division producing a 64-bit quotient, using a chain of two `div` instructions. (The second using the remainder of the 1st as the high half of the dividend, still in EDX.) But this does not trivially generalize to 64-bit / 64-bit; I think it's common to do that with 1-bit-at-a-time shifts and subtracts, instead of long division using multiple `div` instructions. — Peter Cordes, May 18 '23 at 09:04

Sep Roland · Answer 1 · 2023-06-11T18:46:18.163

Unsigned division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX

On a 32-bit x86 CPU, the div instruction can divide the 32-bit value in EDX:EAX by any 32-bit value. We may even divide (successfully) a bigger than 32-bit value in EDX:EAX if we keep the value that is in the dividend's extension (EDX) smaller than the 32-bit divisor.
So, if we want to be able to divide any 64-bit value by any other 64-bit value, we would need to write a code for it. This answer will use a technique named Division by Partial Quotients aka Chunking.

The proposed algorithm will not beat the internal `div` instruction

Even if you are using a 64-bit data type, practice shows me that the majority of divisions in a (general purpose) program could still do with just using the built-in div instruction. And that is why I prefixed my code with a detection mechanism that checks whether the divisor in ECX:EBX is less than 4GB (so fitting EBX) and that the dividend's extension in EDX is less than the divisor in EBX. If these conditions are met, the normal div instruction does the job, and does it faster too. If for some reason (eg. school) using div is not allowed, then simply remove the prefixed code to be in the clear.

The algorithm described

The code tries to get rid of the special cases as soon as possible:

If the divisor is zero, we exit with the carry flag set (i).
If the dividend is zero, we exit with both quotient and remainder set to zero (ii).
If the dividend is smaller than the divisor, we exit with a zero quotient and a remainder equal to the dividend (iii).

The .BSR (BitScanReverse) subroutine uses the bsr instruction to look for the highest bit that is ON in the dividend and in the divisor:

If the zero flag is returned SET, which can only happen for the dividend since we eliminated a zero divisor earlier (i), we conclude that the dividend is zero and we exit (ii).
If the number we get for the dividend is smaller than the one for the divisor, we conclude that dividing will be pointless, and we exit (iii).
If the number we get for the dividend is equal to the one for the divisor, we know that the divisor is aligned with the dividend. No shifting is required.
If the number we get for the dividend is bigger than the one for the divisor, we start shifting the divisor to the left to get it aligned with the dividend.

The divisor at this point has become the very first value to use in a series of trial subtractions:

For every subtraction that succeeds we set a matching bit in the quotient that we are trying to construct. In other words: we add a partial quotient. If the current dividend at this point became zero, we are finished else we continue the loop.
If on the other hand a subtraction fails, we first restore the current dividend before continuing the loop.
The loop continues with halving the current divisor. Once the current divisor is back at its original value (and was used in a trial subtraction), we shouldn't halve it anymore and consider what is left in the current dividend as the final remainder.

; ------------------------------
; Restoring
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
uDiv:   test    ecx, ecx
        jnz     .uDiv
        cmp     edx, ebx
        jnb     .uDiv
        div     ebx             ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
        mov     ebx, edx        ; Remainder to ECX:EBX
        xor     edx, edx        ; Quotient to EDX:EAX
        ret                     ; CF=0
; - - - - - - - - - - - - - - -
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
.uDiv:  push    esi edi
        mov     esi, ebx
        or      esi, ecx
        stc
        jz      .NOK            ; (i) Division by 0

        xor     edi, edi
        push    edi edi         ; (1) 64-bit Quotient under construction
        call    .BSR            ; -> ESI ZF
        jz      .NIL            ; (ii) Dividend is 0
        mov     edi, esi        ; Save position highest set bit in Dividend
        xchg    ebx, eax        ; -> ECX:EBX is Dividend, EDX:EAX is Divisor
        xchg    ecx, edx
        call    .BSR            ; -> ESI ZF=0 (because we know Divisor <> 0)
        sub     edi, esi        ; Delta between positions of highest set bits
        jb      .OK             ; (iii) Small Dividend / Big Divisor
        jz      .d              ; No shifting required
        cmp     edi, 32         ; Shift up Divisor so it aligns with Dividend
        jb      .a
        xchg    edx, eax        ; Before EDX was 0
.a:     xchg    ecx, edi
        shld    edx, eax, cl    ; CPU uses CL Mod 32
        shl     eax, cl
        xchg    ecx, edi

        jmp     .d
.b:     add     ebx, eax        ; Restore Dividend after failed subtraction
        adc     ecx, edx
.c:     dec     edi
        js      .OK             ; We're (back) at the original Divisor
        shr     edx, 1          ; Next try a smaller Divisor (/2)
        rcr     eax, 1
.d:     sub     ebx, eax        ; Try subtracting Divisor from Dividend
        sbb     ecx, edx
        jb      .b              ; Cannot
        bts     [esp], edi      ; -(1)- Add partial quotient
        mov     esi, ebx        ; Is Dividend reduced to zero ?
        or      esi, ecx
        jnz     .c              ; Not yet

.NIL:   xor     ebx, ebx        ; Zero Remainder
        xor     ecx, ecx
.OK:    pop     eax edx         ; (1) EDX:EAX is Quotient, ECX:EBX is Remainder
        clc
.NOK:   pop     edi esi
        ret
; - - - - - - - - - - - - - - -
; IN (edx:eax) OUT (esi,ZF)
.BSR:   xor     esi, esi
        test    edx, edx
        jnz     @f
        bsr     esi, eax        ; -> ESI=[0,31] ZF
        ret
@@:     bsr     esi, edx        ; -> ESI=[0,31] ZF=0
        add     esi, 32         ; -> ESI=[32,63] ZF=0
        ret
; ------------------------------

As an example: 34 / 5

The dividend 34 is 00100010b and the divisor 5 is 00000101b.
It will produce the quotient 00000110b (6) and the remainder 00000100b (4).
The binary representation of course has 56 more zero bits prepended!

    Aligned
    v
  00100010 (34)
- 00101000 (40) Divisor << 3
  --------
  11111010 (-6) NOK ---------------> Quotient = 00000000
+ 00101000 (40) Restore
  --------
  00100010 (34)
- 00010100 (20) Divisor << 2
  --------                  \----------->-----------\
  00001110 (14) OK ----------------> Quotient = 00000100
- 00001010 (10) Divisor << 1
  --------                  \----------->------------\
  00000100 ( 4) OK ----------------> Quotient = 00000110
- 00000101 ( 5) Divisor << 0
  --------
  11111111 (-1) NOK ---------------> Quotient = 00000110 (6)
+ 00000101 ( 5) Restore
  --------
  00000100 ( 4) Remainder

An optimization

A nice improvement is, to not have to restore the dividend in case of a failed subtraction.
Same example as before: 34 / 5
We now combine the restoring addition plus 40 with the further subtraction minus 20 into the single addition plus 20.
Of course in the end, when no more subtractions follow, we still need that final restore to get the remainder.

    Aligned
    v
  00100010 (34)
- 00101000 (40) Divisor << 3
  --------
  11111010 (-6) NOK ---------------> Quotient = 00000000
+ 00010100 (20) Divisor << 2
  --------                  \----------->-----------\
  00001110 (14) OK ----------------> Quotient = 00000100
- 00001010 (10) Divisor << 1
  --------                  \----------->------------\
  00000100 ( 4) OK ----------------> Quotient = 00000110
- 00000101 ( 5) Divisor << 0
  --------
  11111111 (-1) NOK ---------------> Quotient = 00000110 (6)
+ 00000101 ( 5) Restore
  --------
  00000100 ( 4) Remainder

In code:

; ------------------------------
; Non-restoring
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
uDiv:   test    ecx, ecx
        jnz     .uDiv
        cmp     edx, ebx
        jnb     .uDiv
        div     ebx             ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
        mov     ebx, edx        ; Remainder to ECX:EBX
        xor     edx, edx        ; Quotient to EDX:EAX
        ret                     ; CF=0
; - - - - - - - - - - - - - - -
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
.uDiv:  push    esi edi
        mov     esi, ebx
        or      esi, ecx
        stc
        jz      .NOK            ; (i) Division by 0

        xor     edi, edi
        push    edi edi         ; (1) 64-bit Quotient under construction
        call    .BSR            ; -> ESI ZF
        jz      .NIL            ; (ii) Dividend is 0
        mov     edi, esi        ; Save position highest set bit in Dividend
        xchg    ebx, eax        ; -> ECX:EBX is Dividend, EDX:EAX is Divisor
        xchg    ecx, edx
        call    .BSR            ; -> ESI ZF=0 (because we know Divisor <> 0)
        sub     edi, esi        ; Delta between positions of highest set bits
        jb      .OK             ; (iii) Small Dividend / Big Divisor
        jz      .d              ; No shifting required
        cmp     edi, 32         ; Shift up Divisor so it aligns with Dividend
        jb      .a
        xchg    edx, eax        ; Before EDX was 0
.a:     xchg    ecx, edi
        shld    edx, eax, cl    ; CPU uses CL Mod 32
        shl     eax, cl
        xchg    ecx, edi

        jmp     .d
.OK_:   add     ebx, eax        ; Final restore to obtain Remainder
        adc     ecx, edx
        jmp     .OK

        ALIGN   16
.b:     dec     edi
        js      .OK_            ; We're (back) at the original Divisor
        shr     edx, 1          ; Next try a smaller Divisor (/2)
        rcr     eax, 1
        add     ebx, eax        ; This ADD/ADC replaces the restoring
        adc     ecx, edx        ;  ADD/ADC followed by a further SUB/SBB
        js      .b
        jmp     .e

        ALIGN   16
.c:     dec     edi
        js      .OK             ; We're (back) at the original Divisor
        shr     edx, 1          ; Next try a smaller Divisor (/2)
        rcr     eax, 1
.d:     sub     ebx, eax        ; Try subtracting Divisor from Dividend
        sbb     ecx, edx
        jb      .b              ; Cannot
.e:     bts     [esp], edi      ; -(1)- Add partial quotient
        mov     esi, ebx        ; Is Dividend reduced to zero ?
        or      esi, ecx
        jnz     .c              ; Not yet

.NIL:   xor     ebx, ebx        ; Zero Remainder
        xor     ecx, ecx
.OK:    pop     eax edx         ; (1) EDX:EAX is Quotient, ECX:EBX is Remainder
        clc
.NOK:   pop     edi esi
        ret
; - - - - - - - - - - - - - - -
; IN (edx:eax) OUT (esi,ZF)
.BSR:   xor     esi, esi
        test    edx, edx
        jnz     @f
        bsr     esi, eax        ; -> ESI=[0,31] ZF
        ret
@@:     bsr     esi, edx        ; -> ESI=[0,31] ZF=0
        add     esi, 32         ; -> ESI=[32,63] ZF=0
        ret
; ------------------------------

Signed division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX

The easiest way would be to write a wrapper around the unsigned division that we discussed above:

If an input is negative we make it positive and we remember doing so.
Dividing the negative value 8000'0000'0000'0000h by -1 must not be allowed to produce the positive value 8000'0000'0000'0000h which is greater than the max signed integer 7FFF'FFFF'FFFF'FFFFh (.c:).
The sign for the quotient is the XOR of signs of the dividend and divisor, so two counts of minus (-) make for plus (+) (.d:).
The sign for the remainder is the same as the sign we had on the dividend (.e:).

; ------------------------------
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
iDiv:   push    esi
        xor     esi, esi
        test    edx, edx        ; Sgn(Num1)
        jns     .a
        neg     edx             ; Abs(Num1)
        neg     eax
        sbb     edx, esi        ; ESI=0
        xor     esi, 3          ; Bit 1 is for Remainder, bit 0 is for Quotient
.a:     test    ecx, ecx        ; Sgn(Num2)
        jns     .b
        neg     ecx             ; Abs(Num2)
        neg     ebx
        sbb     ecx, 0
        xor     esi, 1          ; Bit 0 is for Quotient
.b:     call    uDiv            ; -> EDX:EAX ECX:EBX CF
        jc      .NOK            ; 'Division by zero'
.c:     test    edx, edx
        jns     .d              ; It's not 8000'0000'0000'0000h
        cmp     esi, 11b
        jb      .NOK            ; Signed overflow
.d:     shr     esi, 1          ; Sign for Quotient
        jnc     .e
        neg     edx             ; Neg(Quotient)
        neg     eax
        sbb     edx, 0
.e:     shr     esi, 1          ; Sign for Remainder
        jnc     .OK
        neg     ecx             ; Neg(Remainder)
        neg     ebx
        sbb     ecx, esi        ; ESI=0

.OK:    clc
.NOK:   pop     esi
        ret
; ------------------------------

Check out this conversion algorithm in case you need to display the 64-bit number from EDX:EAX.

BSR has an output dependency: it can't start executing until the left operand is ready, because it leaves the destination unmodified in case the input is 0. (AMD documents this, Intel does it but documents the result as undefined.) BSR is also slow on AMD. I'd suggest `test edx,edx` / `jnz @f` and put the BSR of the high half into the branch, so you only do one BSR into ESI, of either the high or low half. (Core 2 and later macro-fuse test/jnz so it doesn't cost any extra micro-ops, just code-size.) And/or perhaps `xor esi,esi` to break the false dependency before the first BSR. — Peter Cordes, Jun 11 '23 at 17:48

l4m2 · Answer 2 · 2023-05-18T08:46:36.757

1

Abuse x87

FILD QWORD [SI]
FIDIV QWORD [BX]
FISTP QWORD [DI]

As Peter Cordes mentioned, in default FP rounding mode, it rounds to nearest, tie to even. FISTTP introduced in SSE3 allows to truncate so it behaves like IDIV instruction.

It has two rounding steps, but since x87 preserves 64 bits and divisor is at most 2⁶³-1, increasing or decreasing dividend gives quotient a change of larger than eps. Thus it behaves like the first step is accurate.

edited May 18 '23 at 08:46

answered May 18 '23 at 06:22

l4m2

1,157
5
17

2

Interesting. If the FP rounding mode is still the default, that will round to nearest, unlike `idiv` which truncates towards zero. That might be ok for most use cases. It also has two rounding steps, first to 80-bit (64-bit mantissa, unless precision-control is lowered), then to integer. If you were using SSE3 `fisttp` to convert to integer with truncation after rounding to an x87 float with the default rounding mode, that could perhaps lead to rounding up a result to the next integer. If you change the FP rounding mode to truncation before the fidiv (not just the fistp), it's safe. – Peter Cordes May 18 '23 at 06:48
1

The included code snippet at best is pseudo-code. The FPU can't use `fidiv` with a 64-bit operand. Only `fidiv m16int` and `fidiv m32int` exist. – Sep Roland Jun 02 '23 at 17:08

score -1 · Answer 3 · answered May 21 '19 at 12:56

x86 knows a 64 by 32 bit division with 32 bit results for remainder and quotient like so

POP     EBX      ;DIVISOR
POP     EDX      ;MSW OF DIVIDEND
POP     EAX      ;LSW OF DIVIDEND
DIV     EBX      ;64/32 BIT DIVIDE
PUSH    EDX      ; remainder
PUSH    EAX      ; quotient

You don't want to divide by a 64 bit number that doesn't fit in 32 bits, because you will loose 10 digits of precision, minimum.

If you persevere though, you can compose a 64/64 divison from this, and then afterwards analyse if you really ment it.

If you know how to divide a 2-digit decimal number by a 2-digit decimal number (long division) you can use the above to accomplish dividing a 2-digit 2^32 base number by a 2-digit 2^32 base number. This is left as an exercise for the reader.

Important to mention that 64/32 => 32-bit division raises a `#DE` exception if the quotient doesn't fit in EAX. So it's only safe if EDX < EBX (unsigned). — Peter Cordes, May 21 '19 at 15:44
@peter, the comment is irrelevant because if you do long division, the next digit for the quotient is always (you guessed it) a one digit number. — Albert van der Horst, Jul 27 '19 at 10:54

How to do 64 bit by 64 bit division in assembly?

3 Answers3

Unsigned division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX

The proposed algorithm will not beat the internal `div` instruction

The algorithm described

As an example: 34 / 5

An optimization

Signed division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX

Linked

How to do 64 bit by 64 bit division in assembly?

3 Answers3

Unsigned division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX

The proposed algorithm will not beat the internal div instruction

The algorithm described

As an example: 34 / 5

An optimization

Signed division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX

Linked

The proposed algorithm will not beat the internal `div` instruction