3

I am implementing a multi-precision module, and at this moment I am stuck in the multiplication.

To do my algorithm I need to multiply two unsigned operands of 64 bits, using a Haswell microarchitecture, and store the result in a memory block. I'm doing an implementation using 'g++' and another more efficient using 'icpc'.

int main(){

    //Operands
    size_t a = 10000000000000000000 //Fit in 8 bytes
           b = 7;

    //To store the result;
    size_t dst[2];

    //Multiplication here... (Note that the multiplication result don't fit in 64bits. So, I need to save the result in two memory positions)

    dst[0] = //Store the less significative half..
    dst[1] = //Store the more significative half..


    //My function
    print_To_Screen(dst);
}

I don't know how to access to each half of the result to store them in the memory blocks that I want. Am I obligde to use assembly instructions to the multiplication and them to store the result, or exists an easy way?

  • Most compilers support 128 bit integers but you forgot to mention which one you use. – Jester Feb 04 '16 at 13:17
  • I'm doing an implementation using 'g++' and another more efficient using 'icpc'. – Hélder Gonçalves Feb 04 '16 at 13:21
  • 3
    [`g++` supports `__int128`](https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html) – Jester Feb 04 '16 at 13:31
  • That works. Thank you. – Hélder Gonçalves Feb 04 '16 at 14:18
  • 1
    gcc makes [somewhat clunky code on x86 when it has to do a "full multiply"](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51837) (e.g. 64*64->128b). You often end up with extra `mov` instructions, or saving/restoring a register it didn't even use. It bloats the code, but there should barely be any slowdown other than increased I-cache / uop-cache pressure. Haswell has zero-latency reg-reg moves (handled in register renaming) so the extra `mov`s don't make dependency chains worse. clang makes nicer code. They both do a good job of not using any extra `mul` instructions, though. – Peter Cordes Feb 05 '16 at 02:53

2 Answers2

4

Just use __int128 as suggested, most compilers support it :

__uint128_t mul64x64( uint64_t a, uint64_t b ) {
    return ((__uint128_t)a) * ((__uint128_t)a);
}

This will translate to a single instruction multiplication on x64 architectures.

ElderBug
  • 5,926
  • 16
  • 25
0

It is the high qword that you will have trouble computing:

(a*2^32 + b) * (c*2^32 + d)

= a*2^32(c*2^32 + d) + b(c*2^32 + d)

= a*c*2^64 + (ad + bc)*2^32 + bd

The term in bold gives you the part of the product that you will be unable to represent in a 64-bit value and will be lost.

  • Of course I cannot save the 128 bit result in 64bits. That's why I searching a solution to this problem. But do you know a solution to this problem? – Hélder Gonçalves Feb 04 '16 at 13:59
  • 1
    Not only the `ac` term: the 64bit high part is `high := ac + int((ad + bc) / 2^32)`. For completness the lower part is: `low := ((ad + bc) mod 2^32) * 2^32 + bd`, then the resulting 128bit value is: `high * 2^64 + low`. – BeyelerStudios Feb 04 '16 at 14:09
  • You are right, but at first look, it is not easy understand what you tried to say. – Hélder Gonçalves Feb 04 '16 at 14:43