1

SO I was given to understand that fprem1 do ST0 / ST1 and put remainder on ST0 (so replacing it )

but when I did something like this

mov dword ptr [402000],2
mov dword ptr [402004],3
fild dword ptr [402000]
fild dword ptr [402004]
freem1 

I get -1 remainder when I'm expecting it should be 1.

Kanna Kim
  • 383
  • 1
  • 3
  • 15

1 Answers1

3

TL:DR: Use fprem, not fprem1, to get the behaviour you're expecting. Or better, do it with SSE2 instead of messing around with obsolete x87.

fprem implements the fmod() IEEE / ISO C standard function, while fprem1 implements the remainder() standard function.


fprem1 is doing exactly what the instruction reference manual entry for it says it should. (See also the tag wiki for links to Intel's official PDF). Condensed quote:

The remainder represents the following value:

Remainder ← ST(0) − (Q ∗ ST(1))

Here, Q is an integer value that is obtained by rounding the floating-point number quotient of [ST(0) / ST(1)] toward the nearest integer value. The magnitude of the remainder is less than or equal to half the magnitude of the modulus (i.e. ST(1))

The table of results also confirms that of two positive inputs (+F) can give a positive or negative result, or positive zero. (+/-F or +0).

inputs: st0=3  st1=2
3/2 = 1.5  
Round to nearest(1.5): Q = 2.0  
Remainder = 3 - 2 * 2 = -1

You're expecting it to work like the integer modulus operator, where the division result is truncated towards zero, rather than rounded to nearest. That's what fprem does, not fprem1.

fprem (not fprem1):
... The sign of the remainder is the same as the sign of the dividend.


Also note that x87 is obsolete, and in new code it's usually best to use SSE2. e.g.

mov       eax, 2
cvtsi2sd  xmm2, eax
mov       eax, 3
cvtsi2sd  xmm3, eax
 ; or just accept them as function args in registers

; x=2 in xmm2.   y=3 in xmm3
movaps    xmm0, xmm3    ; save a copy of y
divsd     xmm3, xmm2    ; y/x = 3/2 = 1.5
roundsd   xmm1, xmm3, 0 ; SSE4.1  round to nearest integer.
mulsd     xmm1, xmm2    ; Q * divisor
subsd     xmm0, xmm1    ; dividend - (Q * divisor)

; xmm0 = y mod x   (fprem style, not fprem1)
; xmm3 = y/x

I forget what gcc does for nearbyint(x) when SSE4.1 roundsd isn't available, but check on that (with -ffast-math) for an SSE2 fallback. Something like converting to/from integer could work if you know the range is limited.

Community
  • 1
  • 1
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • hmm now I get it! thanks! but that brings me, even more, question. why those two , fprem and fprem1, have different equation? what is that for? – Kanna Kim Jul 31 '16 at 05:42
  • 1
    Emulating the same code with SSE isn't precise either. If you do it correctly you have to keep subtracting the denominator's mantissa from the remainder's mantissa until the exponent of the remainder is lower or the exponent is the same and the mantissa of the remainder is smaller. This is a lot of code I've shown here: https://stackoverflow.com/questions/69222373/fastest-way-for-wrapping-a-value-in-an-interval/73920254#73920254 The result doesn't have any precision-loss beceause the remainder's mantissa keeps holding 53 bit for the whole iteration. –  Oct 03 '22 at 16:20