Zero remainder
fdivr st2 ; st0 = st2 / st0
fld ; Duplicate because `fistp` does pop
fistp qword [edi] ; Quotient
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
This code calculates the remainder from:
Remainder = Dividend - (Quotient * Divisor)
Because the Rounding Mode got set to 'Truncate Towards Zero', the fistp qword [edi]
instruction will convert the (copy of the) quotient held in ST0 to an integer right before storing to memory. However the value for the quotient that remains on the (fpu) stack is still a real number with a fraction. Once multiplied with the divisor it will produce the dividend again, resulting in a zero remainder.
What is missing is rounding the quotient to an integer and doing it on the (fpu) stack already:
fdivr st2 ; st0 = st2 / st0
frndint
fld ; Duplicate because `fistp` does pop
fistp qword [edi] ; Quotient
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
But a faster way is to simply reload the integer quotient from memory:
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
Unsigned division
Internally, the FPU dedicates 64 bits to the significand of a number plus a separate bit for the sign of the number. The FPU can represent every whole number in the range from -18'446744073'709551616 to 18'446744073'709551616. The 64-bit significand allows us to work with the unsigned 64-bit integers that range from 0 to 18'446744073'709551615. The only trouble is how to load and store these values that fild
and fistp
can't deal with (because they are restricted to operate in the range from -9'223372036'854775808 to 9'223372036'854775807).
It would be possible to convert back and forth between unsigned quadword and the extended real format so we could use fld
and fstp
instead. Another approach would load/store the unsigned quadwords from/to their upper and lower halves. But conversions take time and so I found that by eliminating enough special cases, the only troublesome operation that remains was the loading of the dividend. Everything else can just use fild
and fistp
as usual.
The special cases include:
- Division by zero: CF=1
- Division by one: Quotient=Dividend and Remainder=0
- Divisions where the Dividend is equal to the Divisor: Quotient=1 and Remainder=0
- Divisions where the Dividend is smaller than the Divisor: Quotient=0 and Remainder=Dividend
Where an actual fdiv
is needed, the code first loads half of the dividend, doubles it back on the (fpu) stack, and conditionally adds 1 if the true dividend was odd.
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; Divisor could be 0 or 1
.a: cmp edx, ecx
jb .LT ; Dividend < Divisor
ja .b ; Dividend > Divisor
cmp eax, ebx
jb .LT ; Dividend < Divisor
je .GE ; Dividend = Divisor
.b: test ecx, ecx
js .GE ; Dividend > Divisor > 7FFFFFFFFFFFFFFFh
shr edx, 1 ; Halving the unsigned 64-bit Dividend
rcr eax, 1 ; (CF) to get in range for `fild`
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(Dividend / 2)
fadd st0 ; st0 = {Dividend - 1, Dividend}
jnc .c ; (CF)
fld1
faddp ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0
.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret
.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0
.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------
Faster results
Quoting from an answer that uses a technique named Chunking to divide a couple of 64-bit integers:
Even if you are using a 64-bit data type, practice shows me that the majority of divisions in a (general purpose) program could still do with just using the built-in div
instruction. And that is why I prefixed my code with a detection mechanism that checks whether the divisor in ECX:EBX is less than 4GB (so fitting EBX) and that the dividend's extension in EDX is less than the divisor in EBX. If these conditions are met, the normal div
instruction does the job, and does it faster too. If for some reason (eg. school) using div
is not allowed, then simply remove the prefixed code to be in the clear.
Today's code can benefit from the same prefix, but as it turns out, it is more beneficial to keep detecting the special cases first:
; IN (edx:eax,ecx:ebx) OUT (edx:eax,ecx:ebx,CF)
FuDiv: cmp ebx, 1
jbe .TST ; Divisor could be 0 or 1
.a: cmp edx, ecx
jb .LT ; Dividend < Divisor
ja .b ; Dividend > Divisor
cmp eax, ebx
jb .LT ; Dividend < Divisor
je .GE ; Dividend = Divisor
.b: test ecx, ecx
js .GE ; Dividend > Divisor > 7FFFFFFFFFFFFFFFh
; - - - - - - - - - - - - - - -
jnz .fdiv
cmp edx, ebx
jnb .fdiv
.div: div ebx ; EDX:EAX / EBX --> EAX Quotient, EDX Remainder
mov ebx, edx ; Remainder to ECX:EBX
xor edx, edx ; Quotient to EDX:EAX
ret ; CF=0
; - - - - - - - - - - - - - - -
.fdiv: shr edx, 1 ; Halving the unsigned 64-bit Dividend
rcr eax, 1 ; (CF) to get in range for `fild`
push edi ecx ebx edx eax
mov edi, esp
fninit
fild qword [edi] ; st0 = int(Dividend / 2)
fadd st0 ; st0 = {Dividend - 1, Dividend}
jnc .c ; (CF)
fld1
faddp ; st0 = Dividend [0, FFFFFFFFFFFFFFFFh]
.c: fild qword [edi+8] ; Divisor is [2, 7FFFFFFFFFFFFFFFh]
fld
fnstcw [edi]
or word [edi], 0C00h ; Truncate Towards Zero
fldcw [edi]
fdivr st2 ; st0 = st2 / st0
fistp qword [edi] ; Quotient
fild qword [edi]
fmulp ; st1 *= st0, pop st0
fsubp ; st1 -= st0, pop st0
fistp qword [edi+8] ; Remainder
pop eax edx ebx ecx edi
ret ; CF=0
.TST: test ecx, ecx
jnz .a
cmp ebx, 1 ; Divisor is 0 or 1
jb .NOK
.ONE: dec ebx ; Remainder ECX:EBX is 0
.NOK: ret
.GE: sub eax, ebx ; Remainder ECX:EBX is Dividend - Divisor
sbb edx, ecx
mov ebx, eax
mov ecx, edx
mov eax, 1 ; Quotient EDX:EAX is 1
cdq
ret ; CF=0
.LT: mov ebx, eax ; Remainder ECX:EBX is Dividend
mov ecx, edx
xor eax, eax ; Quotient EDX:EAX is 0
cdq
ret ; CF=0
; ------------------------------