Round floating-point numbers in NASM

Question

I am trying to build a NASM library for a C program. I would like to round a floating-point number given as parameter.

The C function prototype would look like this:

double nearbyint(double x);

I tried using the frndint instruction but I need to push my parameter to the stack first.

Here is what I came up with (does not compile):

bits 64

section .text

global nearbyint

nearbyint:
    push    rbp
    mov     rbp, rsp

    fld     xmm0
    frndint
    fstp    xmm0

    leave
    ret

You can't transfer from xmm to fpu directly. You need to push it to the cpu stack first then load into the fpu stack. Note you don't need fpu you can use `ROUNDSD` if you have SSE4.1. — Jester, Mar 09 '19 at 11:29
`round` is the wrong name for this function. ISO C `round()` uses a special rounding mode that isn't the same as IEEE754 default nearest (with even as a tiebreak). **The function you're trying to implement is called `nearbyint` or `rint` in ISO C / C++**. [round() for float in C++](//stackoverflow.com/a/47347224) — Peter Cordes, Mar 09 '19 at 11:38
To work around the lack of `roundsd` without SSE4.1, you can add then subtract a large number. (glibc's non-SSE4.1 `rint` does that after checking for + or -. Then conditionally doing a fixup on the float bits: https://code.woboq.org/userspace/glibc/sysdeps/ieee754/dbl-64/s_rint.c.html.) `nearbyint` does the same thing, except has to save/restore the FP exception state to avoid raising the Inexact exception. — Peter Cordes, Mar 09 '19 at 11:49
@PeterCordes I need to do my own implementations of mathematical functions without using the system ones. I could still rename as you suggested though. — MrYannKee, Mar 09 '19 at 11:54
Yes, that's what I meant. If you're (re)implementing standard library functions, name them consistently with the standard definitions. Or to put it another way, only use the names of standard library functions if your implementation matches the standard behaviour. — Peter Cordes, Mar 09 '19 at 11:57
@PeterCordes Okay yeah sure. I updated my post. I can't find documentation on how to use `roundsd`. I also don't understand how adding then substracting big numbers would give the closest int to a floating point value. — MrYannKee, Mar 09 '19 at 12:16
There are HTML extracts of Intel's vol.2 ISA ref manual, e.g. https://www.felixcloutier.com/x86/ROUNDSD.html. Without roundsd, adding 2^53 or so to a positive number makes the number so large that the nearest representable `double`s are whole integers. So normal FP rounding rounds to integer. Subtracting again gives you a rounded version of your original number. — Peter Cordes, Mar 09 '19 at 12:33

Peter Cordes · Accepted Answer · 2022-07-30T19:52:54.627

The only way to get data between x87 and XMM is by bouncing it through memory. e.g. movsd [rsp-8] / fld qword [rsp-8] to use the red-zone.

But you don't need to use x87 at all, and shouldn't if you want it to be efficient.

If you have SSE4.1, Use roundsd to round to integer.

rint: roundsd xmm0,xmm0, 0b0100 - current rounding mode (bit 2 = 1), inexact exception flag is set if the result != input (bit 3 = 0).
nearbyint: roundsd xmm0,xmm0, 0b1100 current rounding mode (bit 2 = 1), inexact exception suppressed (bit 3 = 1).
roundsd xmm0,xmm0, 0b1000 : rounding mode override (bit 2 = 0) to _MM_FROUND_TO_NEAREST_INT (bits [1:0] = 00). See roundpd docs for a table. Inexact exception suppressed (bit 3 = 1).

Without SSE4.1 for roundsd, have a look at what glibc's rint does: it adds 2^52 (bit pattern 0x43300000, 0x00000000), making a number so large that the nearest representable doubles are whole integers. So normal FP rounding to the nearest representable value rounds to integer. IEEE binary64 double has 52 explicit mantissa (aka significand) bits, so the size of this number is not a coincidence.

(For negative inputs, it uses -2^52)

Subtracting that again gives you a rounded version of your original number.

The glibc implementation checks for some special cases (like Inf and NaN), and for exponents less than 0 (i.e. inputs with magnitude smaller than 1.0) it copies in the sign bit of the input. So -0.499 through -0.0 will round to -0.0 instead of 0. Otherwise all small inputs would round to +0, since IEEE sub between equal values is required to produce +0.0. Or -0.0 if the current rounding mode is towards -Inf; a sign fixup makes +0.499 round to +0.0 even if the current rounding mode is TowardNegative.

A simple way to implement that with SSE2 would be to isolate the sign bit of the input with pand xmm0, [signbit_mask], then OR in the FP bit pattern of 0x43300000 ..., giving you +- 2^52.

default rel

;; UNTESTED.  IDK if the SIGNBIT_FIXUP does anything other than +-0.0
rint_sse2:
    ;movsd  xmm1, [signbit_mask]  ; can be cheaply constructed on the fly, unlike 2^52
    ;pand   xmm1, xmm0

    pcmpeqd  xmm1, xmm1
    psrlq    xmm1, 1             ; 0x7FFF...
%ifdef SIGNBIT_FIXUP
    movaps   xmm2, xmm1          ; copy the mask
%endif

    andnps   xmm1, xmm0          ; isolate sign bit

%ifdef SIGNBIT_FIXUP
    movaps   xmm3, xmm1          ; save another copy of the sign bit
%endif

    orps     xmm1, [big_double]  ; +-2^52
    addsd    xmm0, xmm1
    subsd    xmm0, xmm1

%ifdef SIGNBIT_FIXUP
    andps    xmm0, xmm2          ; clear the sign bit
    orps     xmm0, xmm3          ; apply the original sign
%endif
    ret

section .rodata
align 16
   big_double: dq 0x4330000000000000   ; 2^52
   ; a 16-byte load will grab whatever happens to be next
   ; if you want a packed rint(__m128d), use   times 2 dq ....

Especially if you omit the SIGNBIT_FIXUP stuff, this is pretty cheap, not much more expensive than roundsd's 2 uops in terms of FP latency. (roundsd has the same latency as addsd + subsd on most CPUs. This is almost certainly not a coincidence, but it does avoid any separate uops for sorting out the sign).

It might be worth noting that the *magic number* (~4.5e15) is exactly 2**52, which derives from the fact that a `double` implemented as a IEEE-754's `binary64` format has 52 fractional significand bits. — njuffa, Mar 09 '19 at 19:15
@njuffa: thanks, I assumed it would be 2^52 or 53, but didn't take the time to find out which when glibc had the constant in decimal. — Peter Cordes, Mar 09 '19 at 20:40
@PeterCordes It is a best practice to express such constants as hexadecimal floating-point numbers (e.g. `0x1.0p+52`), but the code in question may be legacy code. Also, as far as I am aware, ISO-C++ has *still* not added this useful ISO-C99 feature, forcing the continued use of decimal floating-point literal constants even where hexadecimal ones would be clearer. — njuffa, Mar 09 '19 at 21:26

Round floating-point numbers in NASM

1 Answers1