2

I want to multiply two integers modulo another (prime) integer (approximately 50 bit in length) in C.

uint64_t x = ...;
uint64_t y = ...;
uint64_t q = ...;
uint64_t z = (x*y) % q;

This would produce a 64 bit integer smaller than q but unfortunately x%q * y%q is way too large for a 64 bit type. So it overflows before I can calculate the remainder.

I thought about some workarounds:

  • using gcc's __uint128_t would violate the C standard and wouldn't work with other compilers
  • cast variables to double or long double but the result wouldn't be correct due to floating point inaccuracy
  • using a multiple precision library like GMP would work but that introduces a new project dependency and maybe comes with some overhead
  • writing an own multiplication function for that purpose, but as far as I know I would need assembly, so portability is gone

Are there other options or what would you recommend?

EDIT: I forgot to mention, that a self written solution is only worth doing if it is efficient. For example doing dozens of inefficient % operations would be bad.

firefexx
  • 666
  • 6
  • 16
  • 2
    Try [this](http://stackoverflow.com/questions/2566010/fastest-way-to-calculate-a-128-bit-integer-modulo-a-64-bit-integer?rq=1). You can do 128 bit multiplication easily then use the above 128-by-64 bit modulus – phuclv Apr 25 '14 at 14:55

3 Answers3

2

Of course you don't need assembly to implement arbitrary-precision multiplication yourself.

There are plenty of well-known algorithms for this, and (by definition) they will be possible to implement in C.

See Wikipedia for a lot more information. In my experience, getting code like that right can be a bit tricky, so perhaps your time is better spent by adding the dependency.

unwind
  • 391,730
  • 64
  • 469
  • 606
0

In your particular case it would be best to divide the numbers to four 13bit parts and multiply them using long hand method, calculating modulo on the intermediate results.

The code would look like this:

uint64_t x = ...;
uint64_t y = ...;
uint64_t q = ...; // at most 50 bits
uint64_t z = 0;
x %= q;
y %= q;
while (1)
{
  z += x*(y & 0b1111111111111);
  z %= q;
  y >>= 13;
  if (y==0) break;
  x <<= 13;
  x %= q;
}

This would require 4 multiplications and 9 modulos only.

mik
  • 3,575
  • 3
  • 20
  • 29
0

Maybe this feature can be useful. I haven't tested if it's efficient but it should work and it only uses a limited number of modulo operations.

uint64_t XY_modulo_D_L_N(uint64_t X, uint64_t Y, uint64_t D)
{
    // return (X * Y) modulo D
    //tested for D < 2^62
    uint64_t D_max= (uint64_t)1 << 31;
    uint64_t mod_D;
    X %= D;
    Y %= D;
    if (X <= D_max && Y <= D_max)
    {
        mod_D = (X * Y) % D;
    }
    else if (X <= D_max || Y <= D_max )
    {
        uint64_t N_t;
        uint64_t RD = D % D_max;
        uint64_t KD = D / D_max;
        if (Y <= D_max)
        {
            uint64_t X1 = Y;
            Y = X;
            X = X1;
        }
        mod_D = (X * (Y % D_max)) % D_max;
        N_t = X * (Y / D_max) + (X * (Y % D_max)) / D_max;
        mod_D = (mod_D + (D_max *(N_t % KD)) % D) % D;
        mod_D = (mod_D + D -((N_t / KD) * RD) % D) % D;
    }
    else
    {
        uint64_t RD = D % D_max;
        uint64_t KD = D / D_max;
        uint64_t Rx = X % D_max;
        uint64_t Ry = Y % D_max;
        uint64_t Kx = X / D_max;
        uint64_t Ky = Y / D_max;
        uint64_t mod_D_t,N_t,R_t;

        mod_D =(Rx * Ry) % D;
        mod_D_t = (D_max * D_max) % D;
        N_t = mod_D_t;
        R_t = Kx;
        if (N_t >= (D_max / R_t))
        {
            mod_D_t = (R_t *(N_t % D_max)) % D_max;
            N_t = R_t * (N_t / D_max)+(R_t * (N_t % D_max)) / D_max;
            mod_D_t = (mod_D_t + (D_max * (N_t % KD)) % D) % D;
            N_t = (mod_D_t + D - ((N_t / KD) * RD) % D) % D;
        }
        else
            N_t *= R_t;
        R_t = Ky;
        if (N_t >= (D_max / R_t))
        {
            mod_D_t = (R_t * (N_t % D_max)) % D_max;
            N_t = R_t * (N_t / D_max) + (R_t * (N_t % D_max)) / D_max;
            mod_D_t = (mod_D_t + (D_max * (N_t % KD)) % D) % D;
            mod_D = (mod_D + (mod_D_t + D - ((N_t / KD) * RD) % D) % D) % D;
        }
        else
            mod_D =(mod_D + N_t * R_t) % D;
        N_t = (Rx * D_max) % D;
        R_t = Ky;
        if (N_t >= (D_max / R_t))
        {
            mod_D_t = (R_t * (N_t % D_max)) % D_max;
            N_t = R_t * (N_t / D_max) + (R_t * (N_t % D_max)) / D_max;
            mod_D_t = (mod_D_t + (D_max * (N_t % KD)) % D) % D;
            mod_D = (mod_D + (mod_D_t + D - ((N_t / KD) * RD) % D) % D) % D;
        }
        else
            mod_D = (mod_D + N_t * R_t) % D;
        N_t = (Ry * D_max) % D;
        R_t = Kx;
        if (N_t >= (D_max / R_t))
        {
            mod_D_t = (R_t * (N_t % D_max)) % D_max;
            N_t=R_t * (N_t / D_max) + (R_t * (N_t % D_max)) / D_max;
            mod_D_t = (mod_D_t + (D_max * (N_t % KD)) % D) % D;
            mod_D = (mod_D + (mod_D_t + D - ((N_t / KD) * RD) % D) % D) % D;
        }
        else
            mod_D = (mod_D + N_t * R_t) % D;
    }
    return mod_D;
}
user140242
  • 159
  • 6