Scaling a big integer by a double

Question

I need to scale big integers (several hundred bits) by a double. In particular, I need to compute

(M * factor) mod M

where M is a big integer and factor is a double. I won't be using any libraries unless you want to call a dozen lines of code in a header file a 'library'; hence big float math is not an option here.

Knuth and the GMP/MPIR source code had no answers, and here I found only Multiplication between big integers and doubles which is not really applicable, since second answer is too exotic and the first loses too much precision.

Working from first principles and simulating big integers with a uint64_t I came up with this (runnable with 64-bit VC++ or gcc/MinGW64):

#include <cassert>
#include <cfloat>
#include <climits>
#include <cmath>
#include <cstdint>
#include <cstdio>

#include <intrin.h>   // VC++, MinGW

#define PX(format,expression)  std::printf("\n%35s  ==  " format, #expression, expression);

typedef std::uint64_t limb_t;

// precision will be the lower of LIMB_BITS and DBL_MANT_DIG
enum {  LIMB_BITS = sizeof(limb_t) * CHAR_BIT  };

// simulate (M * factor) mod M with a 'big integer' M consisting of a single limb
void test_mod_mul (limb_t modulus, double factor)
{
   assert( factor >= 0 );

   // extract the fractional part of the factor and discard the integer portion

   double ignored_integer_part;
   double fraction = std::modf(factor, &ignored_integer_part);

   // extract the significand (aligned at the upper end of the limb) and the exponent

   int exponent;
   limb_t significand = limb_t(std::ldexp(std::frexp(fraction, &exponent), LIMB_BITS));

   // multiply modulus and single-limb significand; the product will have (n + 1) limbs

   limb_t hi;
/* limb_t lo = */_umul128(modulus, significand, &hi);

   // The result comprises at most n upper limbs of the product; the lowest limb will be 
   // discarded in any case, and potentially more. Factors >= 1 could be handled as well,
   // by dropping the modf() and handling exponents > 0 via left shift.

   limb_t result = hi;

   if (exponent)
   {
      assert( exponent < 0 );

      result >>= -exponent;
   }

   PX("%014llX", result);
   PX("%014llX", limb_t(double(modulus) * fraction));
}

int main ()
{
   limb_t const M = 0x123456789ABCDEull;  // <= 53 bits (for checking with doubles)

   test_mod_mul(M, 1 - DBL_EPSILON);
   test_mod_mul(M, 0.005615234375);
   test_mod_mul(M, 9.005615234375);
   test_mod_mul(M, std::ldexp(1, -16));
   test_mod_mul(M, std::ldexp(1, -32));
   test_mod_mul(M, std::ldexp(1, -52));
}

The multiplication and shift will be done with big integer math in my app, but the principle should be the same.

Is the basic approach correct or is the test working only because I'm testing with toy integers here? I don't know the first thing about floating point math, and I picked the functions from a C++ reference.

Clarification: everything from the multiplication onward will be done with (partial) big integer math; here I'm only using limb_t for this purpose in order to get a tiny toy program which can be posted and that actually runs. The final app will be using the moral equivalent of GMP's mpn_mul_1() and mpn_rshift().

I'm not entirely sure if the question makes sense. Suppose `M` is something like 17, and `factor` is 10^500. Then the precision of a `double` isn't high enough have significant units, so that there's no hope to compute the correct answer. — Kerrek SB, Oct 18 '14 at 14:15
For the problem under consideration - (M * f) mod M - results must lie between 0 and (M - 1). If someone passes 10^500 then they will get the precision they deserve. That said, with the changes indicated - i.e. losing the modf() call and handling the case of positive exponents being returned from frexp() - the big integer should be scaled up correctly. (Note: the part from "limb_t hi" on will be big integer math). — DarthGizka, Oct 18 '14 at 14:56

score 4 · Accepted Answer · edited May 23 '17 at 10:33

4

A floating point number is nothing but a product of three terms. The three terms a sign, a significand (sometimes called mantissa), and an exponent. The value of these three terms is computed as

(-1)^sign * significand * base^exponent

The base is normally 2 although the C++ standard doesn't guarantee that. Correspondingly, your computation becomes

(M * factor) mod M

== (M * (-1)^sign * significand * base^exponent) mod M

== ((-1)^{sign(M) + sign} * abs(M) * significand * base^exponent) mod M

Computing the sign of the result should be rather trivial. Computing X * base^exponent is rather straight forward to: it is either a suitable shift of bits if the base is 2 or a multiplication with/division by the power of base (left shift or multiplication for positive exponent, right shift or division for negative exponent). Assuming your big integer representation supports the modulus operation already, the only interesting term is the multiplication of abs(M) * significand but this is just a normal integer multiplication, although for your big integer representation. I haven't checked too closely but I think this is what the first answer you linked to does (the one you described as "too exotic").

The remaining bit is to extract sign, significand, and exponent from a double. The sign can be determine easily by a comparison with 0.0 and the significand and the exponent can be obtained using frexp() (see this answer for example). The significand is returned as a double, i.e., you probably want to multiply it by 2^{std::numeric_limits<double>::digits} and adjust the exponent appropriately (I haven't done this in a while, i.e., I'm not entirely sure about the extact contract of frexp()).

To answer your question: I'm not familiar with the GMP operations you are using but I think the operations you do indeed execute the computation described above.

edited May 23 '17 at 10:33

Community

1
1

answered Oct 18 '14 at 14:56

Dietmar Kühl

150,225
13
225
380

frexp() returns the normalised significand and the appropriate exponent for base 2; i.e. 0.5 will be returned unchanged and exponent will be 0. Here it is used to extract all significant bits from the factor. I added the modf() call with the intent of avoiding big integer modulus operations; the rest of my app only uses small (limb-sized) moduli with big integers which is trivial to implement, compared to full big integer division. – DarthGizka Oct 18 '14 at 15:08
I think that it should be noted that `double` doesn't have a fixed and well defined size according to the standard. it's usually 64 bits, but there are other cases where a `double` can be larger, I have not encountered cases where a `double` is less then `64` bits, but the latter is still a possibility according to the standard although there are basically no architectures implementing a `double` with less than 64 bits to my knowledge . You have to check the size of your `double`s and `float`s because that size will impact the precision of your floating point representation . – user2485710 Oct 18 '14 at 15:14
@DarthGizka: yes, the significand is returned as a fraction - this is, however, what you don't want! It is needed as an integer, i.e., you'd scale it have no non-zero fractional bits and convert to a suitable integer. When multiplying the significand the exponent needs to be adjusted correspondingly. I noticed I didn't quite answer your question (added): I think the computations you execute do exactly the computation described above. – Dietmar Kühl Oct 18 '14 at 15:15
@ user2485710: // precision will be the lower of LIMB_BITS and DBL_MANT_DIG – DarthGizka Oct 18 '14 at 15:16
@DarthGizka you probably want to switch to the `` header file and use std::numeric_limits http://en.cppreference.com/w/cpp/types/numeric_limits , take a look at the member constants and member functions, it will be more idiomatic for a C++ user . – user2485710 Oct 18 '14 at 15:24
@ Dietmar Kühl: yes, frexp() returns the bits as a normalised fraction in the range [0.5, 1). But the fraction is passed to ldexp() with LIMB_BITS as exponent in order to push the desired bits into the integer portion, which is then extracted by casting the result to limb_t. By the way, thanks for clarifying the math. Basically, my fingers sort of know what they're doing but my brain fails to come up with a proper explanation of why the stuff works, which is why I don't know how far the stuff my fingers typed can be trusted. – DarthGizka Oct 18 '14 at 15:24
@ Dietmar Kühl: I think the process can be stated like this, using your math as foundation. frexp() gives us the base-2 exponent and a fraction where the most significant bit is nonzero (unless the factor is zero). ldexp() pushes a desired number SIG_BITS of bits into the integer part of the double, ready to be extracted into an integer SIG which should be large enough to hold all significant bits. After multiplying the big integer with the small SIG it must be left shifted by (exponent - SIG_BITS), with negative meaning right shift. By dropping the low limb I implicitly shifted right by 64. – DarthGizka Oct 18 '14 at 15:56

Scaling a big integer by a double

1 Answers1