2

I have a C function:

int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d)
{
    /* should return (a * b * c)/d */   
}

It is possible for a to be near INT64_MAX, but for the final result not to overflow, for instance if b = 1, c = d = 40. However, I am having trouble figuring out how to compute this so that I never lose data to rounding (by doing the division first) or have an intermediate result overflow.

If I had access to a large enough datatype to fit the whole product of a, b, and c, I would just do the math in that type and then truncate, but is there some way I can do this without big integers?

4 Answers4

5

Write a = q*d + r with |r| < |d| (I'm assuming d != 0, otherwise the computation is meaningless anyway). Then (a*b*c)/d = q*b*c + (r*b*c)/d. If q*b*c overflows, the entire computation would overflow anyway, so either you don't care, or you have to check for overflow. r*b*c might still overflow, so we again use the same method to avoid overflow,

int64_t q = a/d, r = a%d;
int64_t part1 = q*b*c;
int64_t q1 = (r*b)/d, r1 = (r*b)%d;
return part1 + q1*c + (r1*c)/d;
Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • If you take `a==INT64_MAX-1`, `b==INT64_MAX`, `c==1`, and `d==INT64_MAX` then `a*b*c/d==INT64_MAX-1` while your procedure overflows on `r*b` since `r==INT64_MAX-1`. It is pretty easy to do `r*b` in 128 bits, even manually using two 64 bits variables - just split the numbers in high and low 32 bits parts and then use the `(rH+rL)*(bH+bL)`. However, later you still need to do the division and modulo by `d` of the 128 bits number, so in the end the problem remains... – Adam Badura Feb 09 '22 at 15:36
1

It's easy to see that some inputs would produce outputs that cannot be represented by the return type int64_t. For example, fn(INT64_MAX, 2, 1, 1). However, the following approach will should allow you to return a correct answer for any combination of inputs that does in fact fit within the range of int64_t.

int64_t fn(int64_t a, int32_t b, int32_t c, int32_t d)
{
    /* find the integer and remainder portions of a/d */
    int64_t leftI = a / d;
    int64_t leftR = a % d;

    /* multiply the integer portion of the result by b and c */
    int64_t resultI = leftI * b * c;

    /* multiply the remainder portion by b */
    int64_t resultR = leftR * b;
    resultI = resultI + (resultR / d) * c;

    /* multiply the remainder portion by c */
    resultR = (resultR % d) * c;

    return resultI + (resultR / d);
}
Sam Harwell
  • 97,721
  • 20
  • 209
  • 280
  • This is elegant, but doesn't seem to work out for the case (23, 40, 5, 5). Any ideas why? – user2521200 Aug 02 '13 at 18:12
  • @user2521200 I forgot a `*c` in there. Updated the answer to correct it. – Sam Harwell Aug 02 '13 at 20:36
  • Take a=9999, b= 0.96*MaxInt, c=1, d = MaxInt, and this will overflow at resultR=leftR*b. I was trying to use it to scale a 64-bit random into a smaller range and it won't work. It tries to multiply 9999 by ~MaxInt which obviously fails. – MichaelS Mar 10 '22 at 02:09
1

I would suggest finding the greatest common divisor of d with each of a, b, and c, dividing out the common factors as you go:

common = gcd(a,d) // You can implement GCD using Euclid's algorithm

a=a/common
d=d/common

common = gcd(b,d)
b=b/common
d=d/common

common = gcd(c,d)
c=c/common
d=d/common

Then calculate a*b*c/d with all the common factors removed. Euclid's GCD algorithm runs in logarithmic time, so this should be fairly efficient.

Jim Lewis
  • 43,505
  • 7
  • 82
  • 96
  • This is nice mathematically, but does not help if OP also wants results where the result of the division would be dropping a remainder. In terms of the "integer division operator" /, a*b*c/d has an "exact" answer even if a,b,c,d are all relatively prime, and this answer might fit in the resulting type even when the intermediary operations overflow. – R.. GitHub STOP HELPING ICE Aug 02 '13 at 18:13
0

If you are working on x86_64 then the asm supports 128 bit integers:

int64_t fn(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {

    asm (
        "mulq %1\n"          // a *= b
        "movq %%rbx, %%rdx\n"// rbx = upper 64 bit of the multiplication
        "mulq %2\n"          // multiply the lower 64 bits by c
        "push %%rax\n"       // temporarily save the lowest 64 bits on the stack
        "mov %%rcx, %%rdx\n" // rcx = upper 64 bits of the multiplication
        "movq %%rax, %%rbx\n"// 
        "mulq %2\n"          // multiply the upper 64 bits by c
        "addq %%rax, %%rcx\n"// combine the middle 64 bits
        "addcq %%rdx, $0\n"  // transfer carry tp the higest 64 bits if present
        "divq %3\n"          // divide the upper 128 (of 192) bits by d
        "mov %%rbx, %%rax\n" // rbx = result
        "pop %%rax\n"
        "divq %3\n"          // divide remainder:lower 64 bits by d
        : "+a" (a)           // assigns a to rax register as in/out
        , "+b" (b)           // assigns b to rbx register
        : "g" (c)            // assigns c to random register
        , "g" (d)            // assigns d to random register
        : "edx", "rdx"       // tells the compiler that edx/rdx will be used internally, but does not need any input
    );

    // b now holds the upper 64 bit if (a * b * c / d) > UINT64_MAX
    return a;
}

Please note that all input integers have to be the same length. Working length will be double the input. Works with unsigned only.

The native div and mul instructions on x86 work on double-length exactly to allow for overflows. Sadly I am unaware of a compiler intrinsic to make use of them.

Sergey L.
  • 21,822
  • 5
  • 49
  • 75
  • The first `mulq` will produce a 128-bit output, but the second `mulq` cannot use the upper 64 bits as part of its input, as far as I know; even if it could, you could still overflow since a*b*c could take up to 192 bits. Also, the `divq` instruction will fault if the upper 64 bits are greater than or equal to the divisor. – R.. GitHub STOP HELPING ICE Aug 02 '13 at 18:15
  • @R.. Yeah, I was hoping to avoid those problems, but you are right. I posted a fixed and extended version, but now it's more complicated then I like it myself. – Sergey L. Aug 02 '13 at 18:56
  • It would be simpler if you defined a macro or function to do the 64x64->128 multiplication (via asm) with output in an array, and likewise for the division, then wrote the actual logic in C. – R.. GitHub STOP HELPING ICE Aug 02 '13 at 21:01