Multiplication with very large operands

Question

I am implementing a multi-precision module, and at this moment I am stuck in the multiplication.

To do my algorithm I need to multiply two unsigned operands of 64 bits, using a Haswell microarchitecture, and store the result in a memory block. I'm doing an implementation using 'g++' and another more efficient using 'icpc'.

int main(){

    //Operands
    size_t a = 10000000000000000000 //Fit in 8 bytes
           b = 7;

    //To store the result;
    size_t dst[2];

    //Multiplication here... (Note that the multiplication result don't fit in 64bits. So, I need to save the result in two memory positions)

    dst[0] = //Store the less significative half..
    dst[1] = //Store the more significative half..


    //My function
    print_To_Screen(dst);
}

I don't know how to access to each half of the result to store them in the memory blocks that I want. Am I obligde to use assembly instructions to the multiplication and them to store the result, or exists an easy way?

Most compilers support 128 bit integers but you forgot to mention which one you use. — Jester, Feb 04 '16 at 13:17
I'm doing an implementation using 'g++' and another more efficient using 'icpc'. — Hélder Gonçalves, Feb 04 '16 at 13:21
[`g++` supports `__int128`](https://gcc.gnu.org/onlinedocs/gcc/_005f_005fint128.html) — Jester, Feb 04 '16 at 13:31
gcc makes [somewhat clunky code on x86 when it has to do a "full multiply"](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51837) (e.g. 64*64->128b). You often end up with extra `mov` instructions, or saving/restoring a register it didn't even use. It bloats the code, but there should barely be any slowdown other than increased I-cache / uop-cache pressure. Haswell has zero-latency reg-reg moves (handled in register renaming) so the extra `mov`s don't make dependency chains worse. clang makes nicer code. They both do a good job of not using any extra `mul` instructions, though. — Peter Cordes, Feb 05 '16 at 02:53

score 4 · Accepted Answer · answered Feb 04 '16 at 13:55

4

Just use __int128 as suggested, most compilers support it :

__uint128_t mul64x64( uint64_t a, uint64_t b ) {
    return ((__uint128_t)a) * ((__uint128_t)a);
}

This will translate to a single instruction multiplication on x64 architectures.

answered Feb 04 '16 at 13:55

ElderBug

5,926
16
25

Then I use: `dst[0] = (uint64_t)k; dst[1] = (uint64_t)(k>>64);` Thank you. – Hélder Gonçalves Feb 04 '16 at 14:17

score 0 · Answer 2 · answered Feb 04 '16 at 13:41

0

It is the high qword that you will have trouble computing:

(a*2^32 + b) * (c*2^32 + d)

= a*2^32(c*2^32 + d) + b(c*2^32 + d)

= a*c*2^64 + (ad + bc)*2^32 + bd

The term in bold gives you the part of the product that you will be unable to represent in a 64-bit value and will be lost.

answered Feb 04 '16 at 13:41

Of course I cannot save the 128 bit result in 64bits. That's why I searching a solution to this problem. But do you know a solution to this problem? – Hélder Gonçalves Feb 04 '16 at 13:59
1

Not only the `ac` term: the 64bit high part is `high := ac + int((ad + bc) / 2^32)`. For completness the lower part is: `low := ((ad + bc) mod 2^32) * 2^32 + bd`, then the resulting 128bit value is: `high * 2^64 + low`. – BeyelerStudios Feb 04 '16 at 14:09
You are right, but at first look, it is not easy understand what you tried to say. – Hélder Gonçalves Feb 04 '16 at 14:43

Multiplication with very large operands

2 Answers2