
Assuming that uint is the largest integral type on my fixed-point platform, I have:

uint func(uint a, uint b, uint c);

It needs to return a good approximation of a * b / c.

The value of c is greater than both the value of a and the value of b.

So we know for sure that the value of a * b / c would fit in a uint.

However, the value of a * b itself may not fit in a uint.

So one way to compute the value of a * b / c would be:

return a / c * b;

Or even:

if (a > b)
    return a / c * b;
return b / c * a;

However, the value of c is greater than both the value of a and the value of b.

So the suggestion above would simply return zero.
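To make the truncation concrete, here is a minimal sketch, assuming a 32-bit `uint32_t` stands in for the platform's `uint`. The name `naive_muldiv` is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Divide-first ordering from the snippet above (illustrative name).
 * Assumption: uint32_t plays the role of the platform's largest type. */
uint32_t naive_muldiv(uint32_t a, uint32_t b, uint32_t c) {
    assert(c != 0);
    /* When a < c and b < c, both a / c and b / c truncate to 0. */
    return (a > b) ? a / c * b : b / c * a;
}
```

For example, with a = 100000, b = 90000 and c = 200000 the exact quotient is 45000, but `naive_muldiv` returns 0.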

I need to reduce a * b and c proportionally, but again - the problem is that a * b overflows.

Ideally, I would be able to:

  • Replace a * b with uint(-1)
  • Replace c with uint(-1) / a / b * c.

But no matter how I order the expression uint(-1) / a / b * c, I encounter a problem:

  • uint(-1) / a / b * c is truncated to zero because of uint(-1) / a / b
  • uint(-1) / a * c / b overflows because of uint(-1) / a * c
  • uint(-1) * c / a / b overflows because of uint(-1) * c
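The three orderings can be checked with a small sketch, again assuming `uint32_t` stands in for `uint`. The helper names `mul_overflows` and `scaled_c_order1` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when x * y would not fit in 32 bits (checked without a wider type). */
bool mul_overflows(uint32_t x, uint32_t y) {
    return y != 0 && x > UINT32_MAX / y;
}

/* Ordering 1: uint(-1) / a / b * c.
 * UINT32_MAX / a / b is 0 whenever a * b exceeds UINT32_MAX,
 * so the whole expression collapses to 0 in exactly the case of interest. */
uint32_t scaled_c_order1(uint32_t a, uint32_t b, uint32_t c) {
    assert(a != 0 && b != 0);
    return UINT32_MAX / a / b * c;
}
```

With a = 100000, b = 90000 and c = 200000, `scaled_c_order1` yields 0, while `mul_overflows(UINT32_MAX / a, c)` and `mul_overflows(UINT32_MAX, c)` both report an overflow, which matches the three bullet points.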

How can I tackle this scenario in order to find a good approximation of a * b / c?


Edit 1

I do not have things such as _umul128 on my platform, when the largest integral type is uint64. My largest type is uint, and I have no support for anything larger than that (neither on the HW level, nor in some pre-existing standard library).

Edit 2

In response to numerous duplicate suggestions and comments:

I do not have some "larger type" at hand, which I can use for solving this problem. That is why the opening statement of the question is:

Assuming that uint is the largest integral type on my fixed-point platform

I am assuming that no other type exists, neither on the SW layer (via some built-in standard library) nor on the HW layer.

halfer
goodvibration
  • Well, either `a` or `b` must be larger than "half the size of `uint`", so maybe I should replace the larger one of them, for example, `a` with `uint(-1) / a`. Then, I can fix `c` proportionally... just a thought... – goodvibration Oct 28 '20 at 08:00
  • @WeatherVane: It's a fixed-point infrastructure. – goodvibration Oct 28 '20 at 08:00
  • Please put that in the question! – Weather Vane Oct 28 '20 at 08:01
  • @GSerg: No, it doesn't, thanks. – goodvibration Oct 28 '20 at 08:02
  • Performance requirements versus precision requirements? – Support Ukraine Oct 28 '20 at 08:17
  • @4386427: Precision favored, thanks. – goodvibration Oct 28 '20 at 08:23
  • [How can I descale x by n/d, when x*n overflows?](https://stackoverflow.com/q/63091924/995714), [How can I multiply and divide 64-bit ints accurately?](https://stackoverflow.com/q/18022544/995714), [How to multiply a 64 bit integer by a fraction in C++ while minimizing error?](https://stackoverflow.com/q/25182577/995714), [(a * b) / c MulDiv and dealing with overflow from intermediate multiplication](https://stackoverflow.com/q/54232987/995714) – phuclv Oct 28 '20 at 09:51
  • Question would benefit with an update with a more explicit restriction barring FP if it is so. Even a "my fixed-point platform" could implement `double`. – chux - Reinstate Monica Oct 28 '20 at 10:58
  • What is "fixed-point infrastructure" ? Are you trying to say that there's no floating point? – M.M Oct 28 '20 at 11:00
  • @M.M: Yes, exactly. – goodvibration Oct 28 '20 at 11:15
  • does it have to be in C? or C++ is allowed ? – vmp Oct 28 '20 at 11:45
  • @vmp: How would C++ make any difference here??? It's a purely-arithmetic problem. In any case, my issue is not even in C, it's in Solidity. The only reason I posted it in C is because there's a larger audience for it than there is for Solidity, while both languages share the same common nature of supporting integer-division natively (i.e., by definition). – goodvibration Oct 28 '20 at 12:24
  • Does this answer your question? [Fast method to multiply integer by proper fraction without floats or overflow](https://stackoverflow.com/questions/57300788/fast-method-to-multiply-integer-by-proper-fraction-without-floats-or-overflow) – phuclv Oct 28 '20 at 13:16
  • Well, in this case wouldn't the size of the integer and the value of `c` be compile time constants? Why don't you tell us what they are? Knowing them ahead of time simplifies the problem a lot. – KevinZ Nov 01 '20 at 20:08
  • @phuclv: No, it doesn't. I've posted an answer to my own question. Thanks. – goodvibration Nov 01 '20 at 20:14
  • @KevinZ: None of the input (`a`, `b` and `c`) is of constant values. – goodvibration Nov 01 '20 at 20:15
  • @goodvibration I asked because most of the time when people ask for fixed point arithmetic, they are trying to implement scaled decimals. In that situation, even if `c` isn't a single compile time constant, it must be within a very finite set of compile time constants, all of which can be hard-coded for. Division against known divisor is also a lot faster than division against unknown divisor. That said, there is perhaps an alternative way of simplifying this if `c` cannot be constant: if you can promise that `c < sqrt(UINT_MAX)`, then there is another shortcut that is possible. – KevinZ Nov 01 '20 at 20:37
  • @KevinZ: It (`c`) is not constant. Moreover, I've posted a restricted situation which I was trying to handle to begin with (`a < c && b < c`), while in fact any scenario is possible on my system. I've already developed a constant-time solution (meaning no loops), which you can see below. Thanks. – goodvibration Nov 01 '20 at 20:48

4 Answers

2

needs to return a good approximation of a * b / c
My largest type is uint
both a and b are smaller than c

Variation on this 32-bit problem:

Algorithm: Scale a, b to not overflow

SQRT_MAX_P1 as a compile time constant of sqrt(uint_MAX + 1)
sh = 0;
if (c >= SQRT_MAX_P1) {
  while (|a| >= SQRT_MAX_P1) a/=2, sh++
  while (|b| >= SQRT_MAX_P1) b/=2, sh++
  while (|c| >= SQRT_MAX_P1) c/=2, sh--
}
result = a*b/c

shift result by sh.

With an n-bit uint, I expect the result to be correct to at least about n/2 significant bits.

Could improve things by taking advantage of the smaller of a,b being less than SQRT_MAX_P1. More on that later if interested.


Example

#include <inttypes.h>

#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
// https://stackoverflow.com/a/4589384/2410359

#define UINTMAX_WIDTH (IMAX_BITS(UINTMAX_MAX))
#define SQRT_UINTMAX_P1 (((uintmax_t)1ull) << (UINTMAX_WIDTH/2))

uintmax_t muldiv_about(uintmax_t a, uintmax_t b, uintmax_t c) {
  int shift = 0;
  if (c > SQRT_UINTMAX_P1) {
    while (a >= SQRT_UINTMAX_P1) {
      a /= 2; shift++;
    }
    while (b >= SQRT_UINTMAX_P1) {
      b /= 2; shift++;
    }
    while (c >= SQRT_UINTMAX_P1) {
      c /= 2; shift--;
    }
  }
  uintmax_t r = a * b / c;
  if (shift > 0) r <<= shift;
  if (shift < 0) r >>= -shift;  // shift count must be non-negative
  return r;
}



#include <stdio.h>

int main() {
  uintmax_t a = 12345678;
  uintmax_t b = 4235266395;
  uintmax_t c = 4235266396;
  uintmax_t r = muldiv_about(a,b,c);
  printf("%ju\n", r);
}

Output with 32-bit math (Precise answer is 12345677)

12345600  

Output with 64-bit math

12345677  
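To reproduce the "32-bit math" figure on any machine, the same scaling idea can be pinned to `uint32_t`. This is only a sketch under that assumption; the names `SQRT_U32_P1` and `muldiv_about32` are hypothetical, and it relies on the question's precondition that `a` and `b` are both smaller than `c`.

```c
#include <assert.h>
#include <stdint.h>

#define SQRT_U32_P1 ((uint32_t)1 << 16)  /* sqrt(UINT32_MAX + 1) */

/* Approximates a * b / c; assumes a < c and b < c as in the question. */
uint32_t muldiv_about32(uint32_t a, uint32_t b, uint32_t c) {
    assert(c != 0);
    int shift = 0;
    if (c > SQRT_U32_P1) {
        /* Halve each operand until it fits in 16 bits, tracking the scale. */
        while (a >= SQRT_U32_P1) { a /= 2; shift++; }
        while (b >= SQRT_U32_P1) { b /= 2; shift++; }
        while (c >= SQRT_U32_P1) { c /= 2; shift--; }
    }
    uint32_t r = a * b / c;           /* product now fits in 32 bits */
    if (shift > 0) r <<= shift;       /* undo the net scaling */
    if (shift < 0) r >>= -shift;
    return r;
}
```

Calling `muldiv_about32(12345678, 4235266395, 4235266396)` gives 12345600, the 32-bit output quoted above, against the precise answer 12345677.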
chux - Reinstate Monica
1

Here is another approach that uses recursion and minimal approximation to achieve high precision.

First the code, then an explanation.

Code:

uint32_t bp(uint32_t a) {
  uint32_t b = 0;
  while (a!=0)
  {
    ++b;
    a >>= 1;
  };
  return b;
}

int mul_no_ovf(uint32_t a, uint32_t b)
{
  return ((bp(a) + bp(b)) <= 32);
}

uint32_t f(uint32_t a, uint32_t b, uint32_t c)
{
  if (mul_no_ovf(a, b))
  {
    return (a*b) / c;
  }

  uint32_t m = c / b;
  ++m;
  uint32_t x = m*b - c;
  // So m * b == c + x where x <= b and m >= 2

  uint32_t n = a/m;
  uint32_t r = a % m;
  // So a*b == n * (c + x) + r*b == n*c + n*x + r*b where r*b <= c

  // Approximation: get rid of the r*b part
  uint32_t res = n;
  if (r*b > c/2) ++res;

  return res + f(n, x, c);
}

Explanation:

The multiplication a * b can be written as a sum of a copies of b

a * b = b + b + .... + b

Since b < c we can take a number m of these b so that (m-1)*b <= c < m*b, like

(b + b + ... + b) + (b + b + ... + b) + .... + b + b + b
\---------------/   \---------------/ +        \-------/
       m*b        +        m*b        + .... +     r*b
     \-------------------------------------/
            n times m*b

so we have

a*b = n*m*b + r*b

where r*b <= c and m*b > c. Consequently, m*b is equal to c + x, so we have

a*b = n*(c + x) + r*b = n*c + n*x + r*b

Divide by c :

a*b/c = (n*c + n*x + r*b)/c = n + n*x/c + r*b/c

The values m, n, x, r can all be calculated from a, b and c without any loss of 
precision using integer division (/) and remainder (%).

The approximation is to look at r*b (which is at most c) and "add zero" when r*b<=c/2
and "add one" when r*b>c/2.

So now there are two possibilities:

1) a*b/c ≈ n + n*x/c

2) a*b/c ≈ (n + 1) + n*x/c

So the problem (i.e. calculating a*b/c) has been changed to the form

MULDIV(a1,b1,c) = NUMBER + MULDIV(a2,b2,c)

where a2 is less than a1 and b2 is no larger than b1. Consequently, recursion can be used until 
a2*b2 no longer overflows (and the calculation can be done directly).
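Because each step only adds a number and then continues with smaller operands, the same decomposition can also be written as a loop. The following is a sketch, not the code above: the name `muldiv_iter` is hypothetical, it swaps the bit-count test for an exact overflow check (`a > UINT32_MAX / b`), and it computes `x` as `b - c % b`, which equals `m*b - c` without forming `m*b`. It assumes `a < c` and `b < c`.

```c
#include <assert.h>
#include <stdint.h>

/* Iterative form of a*b/c = n + n*x/c + r*b/c, assuming a < c and b < c. */
uint32_t muldiv_iter(uint32_t a, uint32_t b, uint32_t c) {
    assert(a < c && b < c);
    uint32_t acc = 0;
    /* Loop while a * b would overflow 32 bits (this implies b >= 2). */
    while (b != 0 && a > UINT32_MAX / b) {
        uint32_t m = c / b + 1;   /* smallest m with m*b > c */
        uint32_t x = b - c % b;   /* equals m*b - c, so 1 <= x <= b */
        uint32_t n = a / m;
        uint32_t r = a % m;       /* r*b <= c, so r*b cannot overflow */
        acc += n;
        if (r * b > c / 2) ++acc; /* round the dropped r*b/c term */
        a = n;                    /* continue with n*x/c */
        b = x;
    }
    return acc + a * b / c;
}
```

Since `a` is at least halved on every pass, the loop runs at most about 32 times for 32-bit inputs, and each pass contributes a rounding error of at most about one half.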
Support Ukraine
  • Thank you. I've established a solution which works in `O(1)` complexity (no loops). Please see my answer below (or above). – goodvibration Oct 29 '20 at 05:38
0

I've established a solution which works in O(1) complexity (no loops):

typedef unsigned long long uint;

typedef struct
{
    uint n;
    uint d;
}
fraction;

uint func(uint a, uint b, uint c);
fraction reducedRatio(uint n, uint d, uint max);
fraction normalizedRatio(uint a, uint b, uint scale);
fraction accurateRatio(uint a, uint b, uint scale);
fraction toFraction(uint n, uint d);
uint roundDiv(uint n, uint d);

uint func(uint a, uint b, uint c)
{
    uint hi = a > b ? a : b;
    uint lo = a < b ? a : b;
    fraction f = reducedRatio(hi, c, (uint)(-1) / lo);
    return f.n * lo / f.d;
}

fraction reducedRatio(uint n, uint d, uint max)
{
    fraction f = toFraction(n, d);
    if (n > max || d > max)
        f = normalizedRatio(n, d, max);
    if (f.n != f.d)
        return f;
    return toFraction(1, 1);
}

fraction normalizedRatio(uint a, uint b, uint scale)
{
    if (a <= b)
        return accurateRatio(a, b, scale);
    fraction f = accurateRatio(b, a, scale);
    return toFraction(f.d, f.n);
}

fraction accurateRatio(uint a, uint b, uint scale)
{
    uint maxVal = (uint)(-1) / scale;
    if (a > maxVal)
    {
        uint c = a / (maxVal + 1) + 1;
        a /= c; // we can now safely compute `a * scale`
        b /= c;
    }
    if (a != b)
    {
        uint n = a * scale;
        uint d = a + b; // can overflow
        if (d >= a) // no overflow in `a + b`
        {
            uint x = roundDiv(n, d); // we can now safely compute `scale - x`
            uint y = scale - x;
            return toFraction(x, y);
        }
        if (n < b - (b - a) / 2)
        {
            return toFraction(0, scale); // `a * scale < (a + b) / 2 < MAXUINT256 < a + b`
        }
        return toFraction(1, scale - 1); // `(a + b) / 2 < a * scale < MAXUINT256 < a + b`
    }
    return toFraction(scale / 2, scale / 2); // allow reduction to `(1, 1)` in the calling function
}

fraction toFraction(uint n, uint d)
{
    fraction f = {n, d};
    return f;
}

uint roundDiv(uint n, uint d)
{
    return n / d + n % d / (d - d / 2);
}
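The `roundDiv` helper rounds to the nearest integer without ever forming `n + d / 2`, which could itself overflow. A standalone sketch of the same expression, using `uint64_t` and the hypothetical name `round_div`, shows the behaviour:

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest division (halves round up) with no intermediate overflow. */
uint64_t round_div(uint64_t n, uint64_t d) {
    assert(d != 0);
    uint64_t half_up = d - d / 2;      /* ceil(d / 2) */
    /* (n % d) / half_up is 1 exactly when the remainder is at least half of d. */
    return n / d + (n % d) / half_up;
}
```

For instance, `round_div(7, 2)` is 4, `round_div(7, 3)` is 2 and `round_div(8, 3)` is 3. With `n = UINT64_MAX` and `d = 2` it returns 2^63, whereas `(n + d / 2) / d` would wrap around to 0.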

Here is my test:

#include <stdio.h>

int main()
{
    uint a = (uint)(-1) / 3;            // 0x5555555555555555
    uint b = (uint)(-1) / 2;            // 0x7fffffffffffffff
    uint c = (uint)(-1) / 1;            // 0xffffffffffffffff
    printf("0x%llx", func(a, b, c));    // 0x2aaaaaaaaaaaaaaa
    return 0;
}
goodvibration
  • I kind of expected this... Do you recall I asked you about "precision versus performance" and your answer was "Precision favored, thanks". And now you post an O(1) solution with "poor" precision. So it seems you actually wanted performance over precision ;-) – Support Ukraine Oct 29 '20 at 08:43
  • @4386427: Yes, but this also achieves precision, so... – goodvibration Oct 29 '20 at 08:59
  • Precision... well, "sufficient precision" depends on your application (which is why I asked) and this may be the precision you need. Then it's fine :-) Here is just one "random" selected example based on 32-bit unsigned: `a: 12345678 b: 4235266395 c: 4292973296. Correct answer: 12179725 Recursive answer: 12179725 O(1) answer: 12134037` So the O(1) approach is off by ~45000 or ~0.4% Which may be sufficiently good for your app – Support Ukraine Oct 29 '20 at 09:08
  • @4386427: Yeah, you're right. When you initially asked the question about 'performance vs precision', I assumed that any worst-case performance solution would still be `O(1)` (i.e., several operations for sure, but not depending on the length of the input). So I immediately responded with 'precision preferred over performance'. But then I received multiple "while" answers, which I am unfortunately unable to really allow in my system. Thank you! – goodvibration Oct 29 '20 at 10:25
  • Quote: " I assumed that any worst-case performance solution would still be O(1)" Well, if we want to be strict about it, we can't really talk about big-O complexity here as there isn't anything here that can grow infinite. The upper-limit is the number of bits in your unsigned type so there is an upper limit for execution time which makes the algo O(1) despite the use of loops/recursions. As an example: In my recursive algo the value `a` is at least divide by 2 between recursive calls. Consequently, you can't get more recursive calls than the number of bits. So in big-O that would be O(1). – Support Ukraine Oct 29 '20 at 14:13
  • @4386427: Yes, but recursion has a lot of other (non-algorithmic) implications, and even though not dictated by the C language standard, I would want to avoid using recursion as part of my solution. It is typically the case in low-resource platforms, such as RT/embedded, as well as in platforms where compiler optimization (specifically loop unrolling) can be applied. My platform is not even C, it's Solidity, which in a sense is similar to what I've just described. – goodvibration Oct 29 '20 at 14:40
  • And just in case you're wondering now, the only reason I posted this question in C is because there's a larger audience for it than there is for Solidity, while both languages share the same common nature of supporting integer-division natively (i.e., by definition). – goodvibration Oct 29 '20 at 14:40
-1

You can cancel common factors as follows:

uint gcd(uint a, uint b) 
{
    uint c;
    while (b) 
    {
        a %= b;
        c = a;
        a = b;
        b = c;
    }
    return a;
}


uint func(uint a, uint b, uint c)
{
    uint temp = gcd(a, c);
    a = a/temp;
    c = c/temp;

    temp = gcd(b, c);
    b = b/temp;
    c = c/temp;

    // The result fits in a uint, but note that the reduced a * b can still
    // overflow when a and b share few factors with c.
    return a * b / c;
}
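Cancelling common factors does not guarantee that the remaining product fits, so a caller may want to detect that case. Below is a hedged sketch, assuming `uint32_t` for `uint`; the names `gcd_u32` and `muldiv_gcd_checked` are hypothetical, and the function reports failure instead of returning a wrapped value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t gcd_u32(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Returns true and stores a*b/c in *out if the reduced product fits;
 * returns false if a * b would still overflow after cancellation. */
bool muldiv_gcd_checked(uint32_t a, uint32_t b, uint32_t c, uint32_t *out) {
    assert(c != 0 && out != NULL);
    uint32_t g = gcd_u32(a, c);   /* nonzero because c != 0 */
    a /= g;
    c /= g;
    g = gcd_u32(b, c);
    b /= g;
    c /= g;
    if (b != 0 && a > UINT32_MAX / b)
        return false;             /* product still too large */
    *out = a * b / c;
    return true;
}
```

With a = 100000, b = 90000 and c = 200000 the factors cancel completely and the result is 45000. With a = 12345678, b = 4235266395 and c = 4235266396 only a factor of 2 cancels, so the function returns false.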
vmp