3

I am trying to calculate the product of two 100-bit numbers. It is supposed to mimic the behavior of multiplication of unsigned integers native to 100-bit CPU architecture. That is, the program must calculate the actual product, modulo 2^100.

To do this QUICKLY, I have opted to implement 100bit numbers as uint64_t[2], a two element array of 64bit numbers. More precisely, x = 2^64 * a + b. I need to quickly perform arithmetic and logical operations (products, bit shifts, bit rotate, xor etc). I have chosen this representation because it allows me to use the fast, native operations on the 64bit constituents. For example, rotating a 128bit 'number' is only twice as slow as rotating a 64bit int. Boost::128bit is MUCH slower and bitset and valarray don't have arithmetic. I COULD use the arrays for all operations except multiplication, and then convert the arrays to say boost:128bit and then just multiply, but that is a last resort and probably slow as hell.

I have tried to following. Let us have two such pairs of 64bit numbers, say 2^64 a + b and 2^64 x + y. Then the product can be expressed as

2^128 ax + 2^64 (ay + bx) + by

We may ignore the first term, for it is too large. It would be almost sufficient to take the pair

ay + bx, by

to be our answer, but the more significant half is 'missing' the overflow from the b*y operation. I don't know how to calculate this without breaking the numbers b,y into four different 32bits, and using a divide and conquer approach that will ensure the expanded terms of the product each don't overflow.

This is for a 'chess engine' with magic multiplication hashing on a 10x10 board

basket
  • 181
  • 6

2 Answers2

1

You only care about the most significant 32 bits of each number in b * y for the overflow it might produce:

struct Num {
   uint64_t low;
   uint64_t high;

   Num &operator*=(const Num &o) {
      high = low          *  o.high + 
             high         *  o.low  + 
             (low >> 32u) * (o.low >> 32u); // <- handles overflow
      low  *= o.low;
      high &= 0xFFFFFFFFF; // keeping number 100 bits
      return *this;
   }
};

See if your cpu supports any native 128 bit ints, because that would be optimal (though not portable).

Good luck with your chess engine!

Kostas
  • 4,061
  • 1
  • 14
  • 32
  • Thank you! Before I could try your code I looked into __uint128_t which either my IDE or compiler supports. It seems to be just a hair faster than my own implementation (4.8 s vs 5.0 seconds to perform 10**9 bit rotations). I think I will just use this instead, since my code will only run on modern PC's anyway – basket Jul 21 '19 at 02:42
  • @basket Glad to help. I worked in a small chess engine before, so I’d be happy to work with you if your looking for a co – Kostas Jul 21 '19 at 02:44
  • 1
    I'm actually writing something similar to alpha zero but for Game of the Amazons. I proved the concept with checkers but the engine was written in python so data(game) generation bottle necked too hard. So now I'm writing an engine in C++ (I also must find new magics for move tables). – basket Jul 21 '19 at 02:52
  • 1
    I think the engine writing should go fairly smoothly, for there is so much material online. The next hurdle seems to be incorporating the network. I don't have 4 Next Gen TPU's so the inference service can't keep up with the tree search, only about 1000 every second or so, it manages. I think I will use a fast/conventional eval and augment it using policies and more accurate evaluations from the network. I even wrote 'novel' architecture for amazons specifically – basket Jul 21 '19 at 02:55
  • @basket Sounds great. Best of luck! – Kostas Jul 21 '19 at 15:43
  • This looks promising for a fast hash with something closer to `by` than the lower part thereof. **It is *not* the exact result** - consider 4 digit decimals: 0019×0028. The product of the lower halves is 532, the product of 1×2 is just 2. See [Getting the high part of 64 bit integer multiplication](https://stackoverflow.com/a/28904636). – greybeard Jul 25 '19 at 11:18
0

Come to think of it and borrowing basket's notation:
hell bent on 100 bits, the error would be smaller using 64 bits of high and only 36 of low:
you can compute the most significant 64 bits of "low×low" using (low >> 4u) * (o.low >> 4u), using the upper 36 bits of this as an overflow to high. With no effort to coin names for magic literals:

Bits100 &operator*=(const Bits100 &o) {
    high = low        *  o.high +            // ignore high * o.high
           high       *  o.low  + 
          (low >> 4u) * (o.low >> 4u) >> 28; // handles overflow in most cases
    low  = low * o.low & 0xFFFFFFFFF;        // keep low to 100-64 bits
    return *this;
}
greybeard
  • 2,249
  • 8
  • 30
  • 66