In the following, I am assuming double
computation, with double
mapped to the IEEE-754 binary64
format.
Any general division produces two pieces of information: the quotient and the remainder. fmod (a, b)
delivers the remainder of a / b
when the implicit quotient is converted to integer using rounding towards zero (truncation). The original dividend can be computed from the divisor b
, the truncated quotient, and the remainder returned by fmod()
, provided the quotient is exactly representable as a double
. Without the quotient, there is not enough information to re-compute the dividend.
Since a double
operand provides 52 stored significant (mantissa) bits, the quotient is in general only exactly representable if the difference of the binary exponents of the dividend and the divisor does not exceed 52. Obviously we need to avoid dividends and divisors that are infinity or NaN, and the divisor cannot be zero. Recomputing the dividend as (quotient * divisor + remainder) must be done with a single rounding, so a fused multiply-add (FMA) must be utilized.
The following program demonstrates the successful recomputation of the dividend under the restrictions stated. Note that achieving the intended functionality of fdiv_rz()
requires configuring the compiler for the strictest IEEE-754 adherence. Otherwise, the compiler may consider the division operation as independent of the fesetround()
calls, causing the order of operations to be changed such that the division is no longer sandwiched between the fesetround()
calls as needed. From what I can see, this affects gcc in particular, whereas clang seems to be better behaved by default. I used the Intel compiler to build this code, which allows easy control of floating-point behavior. Specifically, I compiled with icl /Qstd=c++11 /Ox /QxHOST /W4 /fp:strict fdiv_fmod.cpp
.
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#include <fenv.h>
#define N (2000000000) // number of random test cases
/* divide a by b, rounding result towards zero */
double fdiv_rz (double a, double b)
{
double quot;
int curr_mode = fegetround ();
fesetround (FE_TOWARDZERO);
quot = a / b;
fesetround (curr_mode);
return quot;
}
/* reinterpret 64-bit unsigned integer as an IEEE-754 binary64 */
double uint64_as_double (uint64_t a)
{
double r;
memcpy (&r, &a, sizeof r);
return r;
}
/*
https://groups.google.com/forum/#!original/comp.lang.c/qFv18ql_WlU/IK8KGZZFJx4J
From: geo <gmars...@gmail.com>
Newsgroups: sci.math,comp.lang.c,comp.lang.fortran
Subject: 64-bit KISS RNGs
Date: Sat, 28 Feb 2009 04:30:48 -0800 (PST)
This 64-bit KISS RNG has three components, each nearly
good enough to serve alone. The components are:
Multiply-With-Carry (MWC), period (2^121+2^63-1)
Xorshift (XSH), period 2^64-1
Congruential (CNG), period 2^64
*/
static uint64_t kiss64_x = 1234567890987654321ULL;
static uint64_t kiss64_c = 123456123456123456ULL;
static uint64_t kiss64_y = 362436362436362436ULL;
static uint64_t kiss64_z = 1066149217761810ULL;
static uint64_t kiss64_t;
#define MWC64 (kiss64_t = (kiss64_x << 58) + kiss64_c, \
kiss64_c = (kiss64_x >> 6), kiss64_x += kiss64_t, \
kiss64_c += (kiss64_x < kiss64_t), kiss64_x)
#define XSH64 (kiss64_y ^= (kiss64_y << 13), kiss64_y ^= (kiss64_y >> 17), \
kiss64_y ^= (kiss64_y << 43))
#define CNG64 (kiss64_z = 6906969069ULL * kiss64_z + 1234567ULL)
#define KISS64 (MWC64 + XSH64 + CNG64)
int main (void)
{
double a, b, t, quot, rem;
int expa, expb;
for (int i = 0; i < N; i++) {
do {
a = uint64_as_double (KISS64);
} while (isinf (a) || isnan (a));
(void)frexp (a, &expa);
do {
b = uint64_as_double (KISS64);
(void)frexp (b, &expb);
} while ((b == 0) || isinf (b) || isnan (b) || (abs (expa - expb) > 52));
quot = trunc (fdiv_rz (a, b));
rem = fmod (a, b);
t = (fabs (quot) < 1.0) ? rem : fma (quot, b, rem);
if (t != a) {
printf ("a=% 23.13a b=% 23.13a quot=% 23.13a rem=%23.13a t=% 23.13a\n",
a, b, quot, rem, t);
return EXIT_FAILURE;
}
}
printf ("test passed\n");
return EXIT_SUCCESS;
}