1

I need to get the amount of overflows done while multiplying uint64_t on uint64_t. This can be calculated by the formula below, but I don't have access to 128 bit integers.

The example pseudocode with the usage of uint128_t:

uint64_t x = ...
uint64_t y = ...
uint128_t result = (uint128_t)x * (uint128_t)y / ((uint128_t)1 << 64);

Also it can be implemented by splitting uint128_t to [uint64_t, uint64_t] and performing the calculations from example. I consider not using it as I afraid it is pretty slow.

anatolyg
  • 26,506
  • 9
  • 60
  • 134
Alex S.
  • 315
  • 3
  • 12
  • Do you want to calculate `result` or "the amount of overflows done"? – anatolyg Jul 07 '20 at 15:37
  • what is "amount of overflows" ? the diff between the real result (without overflow / on at least 128bits) and the result with an overflow ? but which overflow ? the effect of an overflow is not specified by C – bruno Jul 07 '20 at 15:43
  • "amount of overflows" is the amount of times when uint64 exceeded over 2**64 and was reset to 0. – Alex S. Jul 07 '20 at 15:45
  • So the "amount of overflows" is either 0 or 1, right? – anatolyg Jul 07 '20 at 15:46
  • 1
    BTW `(UINT64_MAX + 1)` can be 0 because of the overflow on `uint64_t`, very probably `UINT64_MAX` is defined as a `uint64_t`, not good to divide by 0 ... – bruno Jul 07 '20 at 15:46
  • @anatolyg, "amount of overflows" can be up to (2^64 * 2^64)/2^64 = 2^64. – Alex S. Jul 07 '20 at 15:49
  • `uint64_t * uint64_t` produces a `uint64_t` not a `uint128_t` and as I said `(UINT64_MAX + 1)` is very probably 0 (even not sure), what result do you expect ? – bruno Jul 07 '20 at 15:51
  • by pity can you use variables somewhere in `uint128_t(uint64_t) * uint128_t(uint64_t)` to make the code valid in C ^^ – bruno Jul 07 '20 at 15:55
  • It is a pseudocode, I used these statements to make the example look clear and easy to understand. – Alex S. Jul 07 '20 at 15:56
  • @AlexanderSadovskyi `uint128_t(uint64_t) * uint128_t(uint64_t)` is not clear at all ... to add a line declaring the two variables just need 1 line allowing to use them in your expression – bruno Jul 07 '20 at 15:57
  • Oh, I think I get it. You just want to calculate the mathematical formula, without regard to what its significance is. It just happens to be the number of overflows which occur when applying some hypothetical 64x64 multiplication algorithm. Is this correct? – anatolyg Jul 07 '20 at 16:02
  • "Is this correct?" Exactly. – Alex S. Jul 07 '20 at 16:03
  • 1
    I updated the code to be more C-like. I have never used `uint128_t` but I think this code is now mostly runnable by a compiler which has it. – anatolyg Jul 07 '20 at 16:06
  • Can you elaborate on what machine or compiler features you do or don't have, seeing as you've already rejected a few? If you really have nothing beyond ISO C then I don't think there's anything you can do other than doing the multiple-precision multiplication in 64-bit chunks; is that what you refer to with the last paragraph? By my count it needs 4 multiplies and a few cheap shifts and adds; doesn't sound that horrible to me. – Nate Eldredge Jul 07 '20 at 16:19
  • Don't familiar with multi-precision multiplication. "4 multiplies and a few cheap shifts and adds" - If it is, then how can I implement it? If it wouldn't be hard for you, could you please write code and post it as an answer? – Alex S. Jul 07 '20 at 16:30
  • 1
    I see anatolyg beat me to it. – Nate Eldredge Jul 07 '20 at 16:41

3 Answers3

2

Intel 64-bit architecture has an instruction which multiplies two 64-bit numbers, with 128-bit result. To access it, you have the _mulx_u64 intrinsic.

After the multiplication, you just get the high 64-bit part as your result, and throw away the low part.

anatolyg
  • 26,506
  • 9
  • 60
  • 134
2

To make a "clean" (portable) solution, do multiplication of 32-bit halves.

Using mathematical notation:

x = x1 * 2^32 + x0
y = y1 * 2^32 + y0

then

x * y = x1 * y1 * 2^64 + (x1 * y0 + x0 * y1) * 2^32 + x0 * y0

Here, all xn * yn fit into 64 bits. The only tricky part is doing the additions without losing the overflow (carry) bits.

For that, you can use the following code, which checks whether an overflow occurs when adding two 64-bit numbers.

bool overflow(uint64_t x, uint64_t y)
{
    return x + y < x;
}

Here is code which calculates the high 64-bit part of 128-bit multiplication:

uint64_t doit(uint64_t x, uint64_t y)
{
    // Calculate 64-bit parts of the 128-bit result
    uint64_t x1 = x >> 32;
    uint64_t x0 = x << 32 >> 32;
    uint64_t y1 = y >> 32;
    uint64_t y0 = y << 32 >> 32;
    uint64_t part0 = x0 * y0;
    uint64_t part1 = x1 * y0;
    uint64_t part2 = x0 * y1;
    uint64_t part3 = x1 * y1;
    /// Use part3
    uint64_t result = part3;
    // Use the 32-bit high halves of part1 and part2
    result += part1 >> 32;
    result += part2 >> 32;
    // Throw away their high half; multiply by 2^32
    part1 <<= 32;
    part2 <<= 32;
    // Calculate the 65-bit sum of parts 1 and 2
    bool carry = overflow(part1, part2)
    result += carry;
    uint64_t temp = part1 + part2;
    // Use part0
    carry = overflow(temp, part0)
    result += carry;
    return result;
}
anatolyg
  • 26,506
  • 9
  • 60
  • 134
  • I was writing my own function similar to this, so here is a godbolt link with both, plus the `__int128` for verification: https://godbolt.org/z/j5H-qm Unfortunately I think yours has a bug somewhere as it gives different answers. – Nate Eldredge Jul 07 '20 at 16:47
  • Fixed; the code works now. Interesting perspective on that godbolt link! I wonder if it's possible to convince the compiler to produce code with a single multiplication instruction. – anatolyg Jul 07 '20 at 16:59
  • Mine had some bugs too ;-) Here is a new link with everything fixed, I hope. https://godbolt.org/z/SxvoaW – Nate Eldredge Jul 07 '20 at 17:16
  • Look OK, yet the steps to find `overflow()` could be done with 32-bit args – chux - Reinstate Monica Jul 07 '20 at 17:46
  • Perhaps after `result += part2 >> 32;`, simplify to `return result + ((part2 & UINT32_MAX) + (part1 & UINT32_MAX) + (part0 >> 32)) >> 32;`? – chux - Reinstate Monica Jul 07 '20 at 18:01
  • It's a matter of personal taste. If you think the code is obvious, you can do it one-liner-style. Considering the context of this code, I cannot assume the code is trivial. – anatolyg Jul 07 '20 at 18:08
0

Let me start by saying there is no portable way of doing this, because there is no int128_t in the C++20 standard or below.

With that in mind, in Windows you can achieve this by using the MultiplyExtract128 function. Use 64 as the shift parameter to divide by 2^64.

Blindy
  • 65,249
  • 10
  • 91
  • 131
  • 2
    Sure it can be done portably - you emulate the 128-bit multiply with 64-bit operations, e.g. via grade-school long multiplication. – Nate Eldredge Jul 07 '20 at 16:13
  • Thanks your answer, but the question was about C, not C++. – Alex S. Jul 07 '20 at 16:14
  • Noted, not interested in a software solution, but if that's how you write your programs then more power to you. – Blindy Jul 07 '20 at 16:14
  • @AlexanderSadovskyi that function is a C function, what do you mean? – Blindy Jul 07 '20 at 16:14
  • If OP has no access to anything machine-specific intrinsics or anything else beyond portable C, which seems to be the case, then I don't see what else can be done. – Nate Eldredge Jul 07 '20 at 16:16
  • Sorry, It is actually a C function. Unfortunately I don't target Windows or any specific hardware/OS. – Alex S. Jul 07 '20 at 16:16