How can I implement 64 bit by 64 bit division in x86 assembly?
I have already enabled extended registers with the .386
directive.

- 33,889
- 7
- 43
- 76

- 33
- 1
- 3
-
Will your program run in 32-bit or 64-bit mode? I.e., are x86-64 instructions allowed? – Michael Jun 20 '13 at 17:14
-
I use DosBox....probably I don't understand your question. – Mary140691 Jun 20 '13 at 18:31
-
Can you explain me, please? – Mary140691 Jun 20 '13 at 18:31
-
dosbox is a 16-bit x86 environment. Later architectures are extended to 32 and 64-bit – phuclv Jan 16 '14 at 01:20
-
[Why should EDX be 0 before using the DIV instruction?](https://stackoverflow.com/q/38416593) shows how to do 64-bit / 32-bit division producing a 64-bit quotient, using a chain of two `div` instructions. (The second using the remainder of the 1st as the high half of the dividend, still in EDX.) But this does not trivially generalize to 64-bit / 64-bit; I think it's common to do that with 1-bit-at-a-time shifts and subtracts, instead of long division using multiple `div` instructions. – Peter Cordes May 18 '23 at 09:04
3 Answers
Unsigned division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX
On a 32-bit x86 CPU, the div
instruction can divide the 32-bit value in EDX:EAX by any 32-bit value. We may even divide (successfully) a bigger than 32-bit value in EDX:EAX if we keep the value that is in the dividend's extension (EDX) smaller than the 32-bit divisor.
So, if we want to be able to divide any 64-bit value by any other 64-bit value, we would need to write a code for it. This answer will use a technique named Division by Partial Quotients aka Chunking.
The proposed algorithm will not beat the internal div
instruction
Even if you are using a 64-bit data type, practice shows me that the majority of divisions in a (general purpose) program could still do with just using the built-in div
instruction. And that is why I prefixed my code with a detection mechanism that checks whether the divisor in ECX:EBX is less than 4GB (so fitting EBX) and that the dividend's extension in EDX is less than the divisor in EBX. If these conditions are met, the normal div
instruction does the job, and does it faster too. If for some reason (eg. school) using div
is not allowed, then simply remove the prefixed code to be in the clear.
The algorithm described
The code tries to get rid of the special cases as soon as possible:
- If the divisor is zero, we exit with the carry flag set (i).
- If the dividend is zero, we exit with both quotient and remainder set to zero (ii).
- If the dividend is smaller than the divisor, we exit with a zero quotient and a remainder equal to the dividend (iii).
The .BSR (BitScanReverse) subroutine uses the bsr
instruction to look for the highest bit that is ON in the dividend and in the divisor:
- If the zero flag is returned SET, which can only happen for the dividend since we eliminated a zero divisor earlier (i), we conclude that the dividend is zero and we exit (ii).
- If the number we get for the dividend is smaller than the one for the divisor, we conclude that dividing will be pointless, and we exit (iii).
- If the number we get for the dividend is equal to the one for the divisor, we know that the divisor is aligned with the dividend. No shifting is required.
- If the number we get for the dividend is bigger than the one for the divisor, we start shifting the divisor to the left to get it aligned with the dividend.
The divisor at this point has become the very first value to use in a series of trial subtractions:
- For every subtraction that succeeds we set a matching bit in the quotient that we are trying to construct. In other words: we add a partial quotient. If the current dividend at this point became zero, we are finished else we continue the loop.
- If on the other hand a subtraction fails, we first restore the current dividend before continuing the loop.
- The loop continues with halving the current divisor. Once the current divisor is back at its original value (and was used in a trial subtraction), we shouldn't halve it anymore and consider what is left in the current dividend as the final remainder.
; ------------------------------
; Restoring
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
uDiv: test ecx, ecx
jnz .uDiv
cmp edx, ebx
jnb .uDiv
div ebx ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
mov ebx, edx ; Remainder to ECX:EBX
xor edx, edx ; Quotient to EDX:EAX
ret ; CF=0
; - - - - - - - - - - - - - - -
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
.uDiv: push esi edi
mov esi, ebx
or esi, ecx
stc
jz .NOK ; (i) Division by 0
xor edi, edi
push edi edi ; (1) 64-bit Quotient under construction
call .BSR ; -> ESI ZF
jz .NIL ; (ii) Dividend is 0
mov edi, esi ; Save position highest set bit in Dividend
xchg ebx, eax ; -> ECX:EBX is Dividend, EDX:EAX is Divisor
xchg ecx, edx
call .BSR ; -> ESI ZF=0 (because we know Divisor <> 0)
sub edi, esi ; Delta between positions of highest set bits
jb .OK ; (iii) Small Dividend / Big Divisor
jz .d ; No shifting required
cmp edi, 32 ; Shift up Divisor so it aligns with Dividend
jb .a
xchg edx, eax ; Before EDX was 0
.a: xchg ecx, edi
shld edx, eax, cl ; CPU uses CL Mod 32
shl eax, cl
xchg ecx, edi
jmp .d
.b: add ebx, eax ; Restore Dividend after failed subtraction
adc ecx, edx
.c: dec edi
js .OK ; We're (back) at the original Divisor
shr edx, 1 ; Next try a smaller Divisor (/2)
rcr eax, 1
.d: sub ebx, eax ; Try subtracting Divisor from Dividend
sbb ecx, edx
jb .b ; Cannot
bts [esp], edi ; -(1)- Add partial quotient
mov esi, ebx ; Is Dividend reduced to zero ?
or esi, ecx
jnz .c ; Not yet
.NIL: xor ebx, ebx ; Zero Remainder
xor ecx, ecx
.OK: pop eax edx ; (1) EDX:EAX is Quotient, ECX:EBX is Remainder
clc
.NOK: pop edi esi
ret
; - - - - - - - - - - - - - - -
; IN (edx:eax) OUT (esi,ZF)
.BSR: xor esi, esi
test edx, edx
jnz @f
bsr esi, eax ; -> ESI=[0,31] ZF
ret
@@: bsr esi, edx ; -> ESI=[0,31] ZF=0
add esi, 32 ; -> ESI=[32,63] ZF=0
ret
; ------------------------------
As an example: 34 / 5
The dividend 34 is 00100010b and the divisor 5 is 00000101b.
It will produce the quotient 00000110b (6) and the remainder 00000100b (4).
The binary representation of course has 56 more zero bits prepended!
Aligned
v
00100010 (34)
- 00101000 (40) Divisor << 3
--------
11111010 (-6) NOK ---------------> Quotient = 00000000
+ 00101000 (40) Restore
--------
00100010 (34)
- 00010100 (20) Divisor << 2
-------- \----------->-----------\
00001110 (14) OK ----------------> Quotient = 00000100
- 00001010 (10) Divisor << 1
-------- \----------->------------\
00000100 ( 4) OK ----------------> Quotient = 00000110
- 00000101 ( 5) Divisor << 0
--------
11111111 (-1) NOK ---------------> Quotient = 00000110 (6)
+ 00000101 ( 5) Restore
--------
00000100 ( 4) Remainder
An optimization
A nice improvement is, to not have to restore the dividend in case of a failed subtraction.
Same example as before: 34 / 5
We now combine the restoring addition plus 40 with the further subtraction minus 20 into the single addition plus 20.
Of course in the end, when no more subtractions follow, we still need that final restore to get the remainder.
Aligned
v
00100010 (34)
- 00101000 (40) Divisor << 3
--------
11111010 (-6) NOK ---------------> Quotient = 00000000
+ 00010100 (20) Divisor << 2
-------- \----------->-----------\
00001110 (14) OK ----------------> Quotient = 00000100
- 00001010 (10) Divisor << 1
-------- \----------->------------\
00000100 ( 4) OK ----------------> Quotient = 00000110
- 00000101 ( 5) Divisor << 0
--------
11111111 (-1) NOK ---------------> Quotient = 00000110 (6)
+ 00000101 ( 5) Restore
--------
00000100 ( 4) Remainder
In code:
; ------------------------------
; Non-restoring
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
uDiv: test ecx, ecx
jnz .uDiv
cmp edx, ebx
jnb .uDiv
div ebx ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
mov ebx, edx ; Remainder to ECX:EBX
xor edx, edx ; Quotient to EDX:EAX
ret ; CF=0
; - - - - - - - - - - - - - - -
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
.uDiv: push esi edi
mov esi, ebx
or esi, ecx
stc
jz .NOK ; (i) Division by 0
xor edi, edi
push edi edi ; (1) 64-bit Quotient under construction
call .BSR ; -> ESI ZF
jz .NIL ; (ii) Dividend is 0
mov edi, esi ; Save position highest set bit in Dividend
xchg ebx, eax ; -> ECX:EBX is Dividend, EDX:EAX is Divisor
xchg ecx, edx
call .BSR ; -> ESI ZF=0 (because we know Divisor <> 0)
sub edi, esi ; Delta between positions of highest set bits
jb .OK ; (iii) Small Dividend / Big Divisor
jz .d ; No shifting required
cmp edi, 32 ; Shift up Divisor so it aligns with Dividend
jb .a
xchg edx, eax ; Before EDX was 0
.a: xchg ecx, edi
shld edx, eax, cl ; CPU uses CL Mod 32
shl eax, cl
xchg ecx, edi
jmp .d
.OK_: add ebx, eax ; Final restore to obtain Remainder
adc ecx, edx
jmp .OK
ALIGN 16
.b: dec edi
js .OK_ ; We're (back) at the original Divisor
shr edx, 1 ; Next try a smaller Divisor (/2)
rcr eax, 1
add ebx, eax ; This ADD/ADC replaces the restoring
adc ecx, edx ; ADD/ADC followed by a further SUB/SBB
js .b
jmp .e
ALIGN 16
.c: dec edi
js .OK ; We're (back) at the original Divisor
shr edx, 1 ; Next try a smaller Divisor (/2)
rcr eax, 1
.d: sub ebx, eax ; Try subtracting Divisor from Dividend
sbb ecx, edx
jb .b ; Cannot
.e: bts [esp], edi ; -(1)- Add partial quotient
mov esi, ebx ; Is Dividend reduced to zero ?
or esi, ecx
jnz .c ; Not yet
.NIL: xor ebx, ebx ; Zero Remainder
xor ecx, ecx
.OK: pop eax edx ; (1) EDX:EAX is Quotient, ECX:EBX is Remainder
clc
.NOK: pop edi esi
ret
; - - - - - - - - - - - - - - -
; IN (edx:eax) OUT (esi,ZF)
.BSR: xor esi, esi
test edx, edx
jnz @f
bsr esi, eax ; -> ESI=[0,31] ZF
ret
@@: bsr esi, edx ; -> ESI=[0,31] ZF=0
add esi, 32 ; -> ESI=[32,63] ZF=0
ret
; ------------------------------
Signed division of the 64-bit value in EDX:EAX by the 64-bit value in ECX:EBX
The easiest way would be to write a wrapper around the unsigned division that we discussed above:
- If an input is negative we make it positive and we remember doing so.
- Dividing the negative value 8000'0000'0000'0000h by -1 must not be allowed to produce the positive value 8000'0000'0000'0000h which is greater than the max signed integer 7FFF'FFFF'FFFF'FFFFh (
.c:
). - The sign for the quotient is the XOR of signs of the dividend and divisor, so two counts of minus (-) make for plus (+) (
.d:
). - The sign for the remainder is the same as the sign we had on the dividend (
.e:
).
; ------------------------------
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
iDiv: push esi
xor esi, esi
test edx, edx ; Sgn(Num1)
jns .a
neg edx ; Abs(Num1)
neg eax
sbb edx, esi ; ESI=0
xor esi, 3 ; Bit 1 is for Remainder, bit 0 is for Quotient
.a: test ecx, ecx ; Sgn(Num2)
jns .b
neg ecx ; Abs(Num2)
neg ebx
sbb ecx, 0
xor esi, 1 ; Bit 0 is for Quotient
.b: call uDiv ; -> EDX:EAX ECX:EBX CF
jc .NOK ; 'Division by zero'
.c: test edx, edx
jns .d ; It's not 8000'0000'0000'0000h
cmp esi, 11b
jb .NOK ; Signed overflow
.d: shr esi, 1 ; Sign for Quotient
jnc .e
neg edx ; Neg(Quotient)
neg eax
sbb edx, 0
.e: shr esi, 1 ; Sign for Remainder
jnc .OK
neg ecx ; Neg(Remainder)
neg ebx
sbb ecx, esi ; ESI=0
.OK: clc
.NOK: pop esi
ret
; ------------------------------
Check out this conversion algorithm in case you need to display the 64-bit number from EDX:EAX.

- 33,889
- 7
- 43
- 76
-
1BSR has an output dependency: it can't start executing until the left operand is ready, because it leaves the destination unmodified in case the input is 0. (AMD documents this, Intel does it but documents the result as undefined.) BSR is also slow on AMD. I'd suggest `test edx,edx` / `jnz @f` and put the BSR of the high half into the branch, so you only do one BSR into ESI, of either the high or low half. (Core 2 and later macro-fuse test/jnz so it doesn't cost any extra micro-ops, just code-size.) And/or perhaps `xor esi,esi` to break the false dependency before the first BSR. – Peter Cordes Jun 11 '23 at 17:48
Abuse x87
FILD QWORD [SI]
FIDIV QWORD [BX]
FISTP QWORD [DI]
As Peter Cordes mentioned, in default FP rounding mode, it rounds to nearest, tie to even. FISTTP
introduced in SSE3 allows to truncate so it behaves like IDIV
instruction.
It has two rounding steps, but since x87 preserves 64 bits and divisor is at most 263-1, increasing or decreasing dividend gives quotient a change of larger than eps. Thus it behaves like the first step is accurate.

- 1,157
- 5
- 17
-
2Interesting. If the FP rounding mode is still the default, that will round to nearest, unlike `idiv` which truncates towards zero. That might be ok for most use cases. It also has two rounding steps, first to 80-bit (64-bit mantissa, unless precision-control is lowered), then to integer. If you were using SSE3 `fisttp` to convert to integer with truncation after rounding to an x87 float with the default rounding mode, that could perhaps lead to rounding up a result to the next integer. If you change the FP rounding mode to truncation before the fidiv (not just the fistp), it's safe. – Peter Cordes May 18 '23 at 06:48
-
1The included code snippet at best is pseudo-code. The FPU can't use `fidiv` with a 64-bit operand. Only `fidiv m16int` and `fidiv m32int` exist. – Sep Roland Jun 02 '23 at 17:08
x86 knows a 64 by 32 bit division with 32 bit results for remainder and quotient like so
POP EBX ;DIVISOR
POP EDX ;MSW OF DIVIDEND
POP EAX ;LSW OF DIVIDEND
DIV EBX ;64/32 BIT DIVIDE
PUSH EDX ; remainder
PUSH EAX ; quotient
You don't want to divide by a 64 bit number that doesn't fit in 32 bits, because you will loose 10 digits of precision, minimum.
If you persevere though, you can compose a 64/64 divison from this, and then afterwards analyse if you really ment it.
If you know how to divide a 2-digit decimal number by a 2-digit decimal number (long division) you can use the above to accomplish dividing a 2-digit 2^32 base number by a 2-digit 2^32 base number. This is left as an exercise for the reader.

- 854
- 6
- 19
-
Important to mention that 64/32 => 32-bit division raises a `#DE` exception if the quotient doesn't fit in EAX. So it's only safe if EDX < EBX (unsigned). – Peter Cordes May 21 '19 at 15:44
-
@peter, the comment is irrelevant because if you do long division, the next digit for the quotient is always (you guessed it) a one digit number. – Albert van der Horst Jul 27 '19 at 10:54