TL:DR: Use fprem
, not fprem1
, to get the behaviour you're expecting. Or better, do it with SSE2 instead of messing around with obsolete x87.
fprem
implements the fmod()
IEEE / ISO C standard function, while fprem1
implements the remainder()
standard function.
fprem1
is doing exactly what the instruction reference manual entry for it says it should. (See also the x86 tag wiki for links to Intel's official PDF). Condensed quote:
The remainder represents the following value:
Remainder ← ST(0) − (Q ∗ ST(1))
Here, Q is an integer value that is obtained by rounding the floating-point number quotient of [ST(0) / ST(1)] toward the nearest integer value. The magnitude of the remainder is less than or equal to half the magnitude of the modulus (i.e. ST(1))
The table of results also confirms that of two positive inputs (+F) can give a positive or negative result, or positive zero. (+/-F or +0).
inputs: st0=3 st1=2
3/2 = 1.5
Round to nearest(1.5): Q = 2.0
Remainder = 3 - 2 * 2 = -1
You're expecting it to work like the integer modulus operator, where the division result is truncated towards zero, rather than rounded to nearest. That's what fprem
does, not fprem1
.
fprem (not fprem1):
... The sign of the remainder is the same as the sign of the dividend.
Also note that x87 is obsolete, and in new code it's usually best to use SSE2. e.g.
mov eax, 2
cvtsi2sd xmm2, eax
mov eax, 3
cvtsi2sd xmm3, eax
; or just accept them as function args in registers
; x=2 in xmm2. y=3 in xmm3
movaps xmm0, xmm3 ; save a copy of y
divsd xmm3, xmm2 ; y/x = 3/2 = 1.5
roundsd xmm1, xmm3, 0 ; SSE4.1 round to nearest integer.
mulsd xmm1, xmm2 ; Q * divisor
subsd xmm0, xmm1 ; dividend - (Q * divisor)
; xmm0 = y mod x (fprem style, not fprem1)
; xmm3 = y/x
I forget what gcc does for nearbyint(x)
when SSE4.1 roundsd
isn't available, but check on that (with -ffast-math
) for an SSE2 fallback. Something like converting to/from integer could work if you know the range is limited.