I am trying to calculate the product of two 100-bit numbers. It is supposed to mimic the behavior of multiplication of unsigned integers native to 100-bit CPU architecture. That is, the program must calculate the actual product, modulo 2^100.
To do this QUICKLY, I have opted to implement 100bit numbers as uint64_t[2], a two element array of 64bit numbers. More precisely, x = 2^64 * a + b. I need to quickly perform arithmetic and logical operations (products, bit shifts, bit rotate, xor etc). I have chosen this representation because it allows me to use the fast, native operations on the 64bit constituents. For example, rotating a 128bit 'number' is only twice as slow as rotating a 64bit int. Boost::128bit is MUCH slower and bitset and valarray don't have arithmetic. I COULD use the arrays for all operations except multiplication, and then convert the arrays to say boost:128bit and then just multiply, but that is a last resort and probably slow as hell.
I have tried to following. Let us have two such pairs of 64bit numbers, say 2^64 a + b and 2^64 x + y. Then the product can be expressed as
2^128 ax + 2^64 (ay + bx) + by
We may ignore the first term, for it is too large. It would be almost sufficient to take the pair
ay + bx, by
to be our answer, but the more significant half is 'missing' the overflow from the b*y operation. I don't know how to calculate this without breaking the numbers b,y into four different 32bits, and using a divide and conquer approach that will ensure the expanded terms of the product each don't overflow.
This is for a 'chess engine' with magic multiplication hashing on a 10x10 board