I need to scale big integers (several hundred bits) by a double. In particular, I need to compute
(M * factor) mod M
where M is a big integer and factor is a double. I won't be using any libraries unless you want to call a dozen lines of code in a header file a 'library'; hence big float math is not an option here.
Knuth and the GMP/MPIR source code had no answers, and here I found only Multiplication between big integers and doubles which is not really applicable, since second answer is too exotic and the first loses too much precision.
Working from first principles and simulating big integers with a uint64_t I came up with this (runnable with 64-bit VC++ or gcc/MinGW64):
#include <cassert>
#include <cfloat>
#include <climits>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <intrin.h> // VC++, MinGW
#define PX(format,expression) std::printf("\n%35s == " format, #expression, expression);
typedef std::uint64_t limb_t;
// precision will be the lower of LIMB_BITS and DBL_MANT_DIG
enum { LIMB_BITS = sizeof(limb_t) * CHAR_BIT };
// simulate (M * factor) mod M with a 'big integer' M consisting of a single limb
void test_mod_mul (limb_t modulus, double factor)
{
assert( factor >= 0 );
// extract the fractional part of the factor and discard the integer portion
double ignored_integer_part;
double fraction = std::modf(factor, &ignored_integer_part);
// extract the significand (aligned at the upper end of the limb) and the exponent
int exponent;
limb_t significand = limb_t(std::ldexp(std::frexp(fraction, &exponent), LIMB_BITS));
// multiply modulus and single-limb significand; the product will have (n + 1) limbs
limb_t hi;
/* limb_t lo = */_umul128(modulus, significand, &hi);
// The result comprises at most n upper limbs of the product; the lowest limb will be
// discarded in any case, and potentially more. Factors >= 1 could be handled as well,
// by dropping the modf() and handling exponents > 0 via left shift.
limb_t result = hi;
if (exponent)
{
assert( exponent < 0 );
result >>= -exponent;
}
PX("%014llX", result);
PX("%014llX", limb_t(double(modulus) * fraction));
}
int main ()
{
limb_t const M = 0x123456789ABCDEull; // <= 53 bits (for checking with doubles)
test_mod_mul(M, 1 - DBL_EPSILON);
test_mod_mul(M, 0.005615234375);
test_mod_mul(M, 9.005615234375);
test_mod_mul(M, std::ldexp(1, -16));
test_mod_mul(M, std::ldexp(1, -32));
test_mod_mul(M, std::ldexp(1, -52));
}
The multiplication and shift will be done with big integer math in my app, but the principle should be the same.
Is the basic approach correct or is the test working only because I'm testing with toy integers here? I don't know the first thing about floating point math, and I picked the functions from a C++ reference.
Clarification: everything from the multiplication onward will be done with (partial) big integer math; here I'm only using limb_t for this purpose in order to get a tiny toy program which can be posted and that actually runs. The final app will be using the moral equivalent of GMP's mpn_mul_1() and mpn_rshift().