25

Suppose we have two constants A and B and a variable i, all 64-bit integers. We want to compute a simple, common arithmetic operation such as:

i * A / B    (1)

To simplify the problem, let's assume that variable i is always in the range [INT64_MIN*B/A, INT64_MAX*B/A], so that the final result of the arithmetic operation (1) does not overflow (i.e., it fits in the range [INT64_MIN, INT64_MAX]).

In addition, i is assumed to be more likely in the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A] (i.e., close to 0), though it may occasionally fall outside this range. In the first case, a trivial integer computation of i * A does not overflow (which is why we call the range friendly); in the latter case, a trivial integer computation of i * A overflows, leading to an erroneous result for (1).
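For concreteness, here is a minimal sketch of the Range1 membership test (the helper name in_range1 is mine, purely for illustration):

#include <cstdint>

// Sketch only: true when i lies in the friendly range Range1, i.e. when
// the trivial computation i * A cannot overflow (assumes A > 0).
inline bool in_range1(int64_t i, int64_t A)
{
    return INT64_MIN / A <= i && i <= INT64_MAX / A;
}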

What would be the "safest" and "most efficient" way to compute operation (1), where "safest" means preserving exactness or at least decent precision, and "most efficient" means lowest average computation time, given that i is more likely to be in the friendly range Range1?

The solution currently implemented in the code is the following:

(int64_t)((double)A / B * i)

This solution is quite safe (no overflow) though inaccurate (precision loss due to the 53-bit limitation of the double significand), and quite fast because the double division (double)A / B is precomputed at compile time, leaving only a double multiplication to be computed at runtime.

shrike
  • you can detect a lot of cases by checking the results with double versions, but I don't think it is going to catch every case because you only have 53 significant bits in a 64-bit double... – Grady Player Apr 24 '16 at 17:44
  • `(int64_t)((double)i * A / B)`. You may lose precision as you're dealing with very large values, since double is very likely 64 bits on your system – sjsam Apr 24 '16 at 17:46
  • 2
    My first thought is to resort to assembly and check all those CPU flags. My second thought is I don't want to do so. –  Apr 24 '16 at 18:01
  • Easy solutions would be to use a 128-bit integer type (if available on your platform) or to first divide by B and then multiply by A (depending on the common values of A and B this might or might not lose precision). Why are these options not feasible? – Ctx Apr 24 '16 at 18:07
  • 1
    _safest_ and _most efficient_ may be mutually exclusive. (similar to compiler optimizations for speed _or_ for size) – ryyker Apr 24 '16 at 18:08
  • In the proposed approach at the bottom, I don't think you can count on `i * A` saturating in the case of overflow (i.e., producing `INT64_MIN` or `INT64_MAX`). You would have to actually test the result to know if it overflowed. – Tom Karzes Apr 24 '16 at 18:13
  • @Ctx: dividing by B first is not acceptable because of precision loss in the most likely situation – shrike Apr 24 '16 at 18:27
  • @Tom: this may not be portable; however, compiling with MSVC for an x64 target, an int64_t multiplication overflow (either positive or negative) yields INT64_MIN – shrike Apr 24 '16 at 18:29
  • 1
    You can at least test for potential overflow first (even if it doesn't solve the actual question) by making a trial division. For example with positive values, `if (INT64_MAX / i >= A)` you can safely multiply `i * A` without an overflow. If not, you'll need a "longhand" alternative using 128 bits if that size is not available. – Weather Vane Apr 24 '16 at 18:29
  • Is this signed or unsigned multiplication? – Stephen Apr 24 '16 at 18:37
  • @Stephen: everything shown is signed. – wallyk Apr 24 '16 at 19:02
  • You could substitute `i = x * B + y` (with `x` and `y` the results of division and modulo) into your operation and compute its result through `x` and `y`. Whether it will be overflow-free basically depends on whether `A * B` fits into a 64-bit integer or not, so it may or may not be suitable for you. Also, no idea how efficient that is. –  Apr 24 '16 at 21:51
  • If A and B are known at compile time then the compiler would first calculate A / B into a new constant value (say C) and at run-time you only multiply i by C – Uri Brecher Apr 25 '16 at 04:32
  • 1
    If `i` is in the range `[INT64_MIN/A, INT64_MAX/A]`, perform `i*A/B`. Else resort to 128-bit integer math. – chux - Reinstate Monica Apr 25 '16 at 06:22

5 Answers

6

If you cannot get better bounds on the ranges involved then you're best off following iammilind's advice to use __int128.

The reason is that otherwise you would have to implement the full logic of word to double-word multiplication and double-word by word division. The Intel and AMD processor manuals contain helpful information and ready-made code, but it gets quite involved, and using C/C++ instead of assembler makes things doubly complicated.

All good compilers expose useful primitives as intrinsics. Microsoft's list doesn't seem to include a muldiv-like primitive, but the _mul128 intrinsic gives you the two halves of the 128-bit product as two 64-bit integers. Based on that you can perform long division of two digits by one digit, where one 'digit' would be a 64-bit integer (usually called a 'limb' because it is bigger than a digit but still only part of the whole). Still quite involved, but a lot better than using pure C/C++. However, portability-wise it is no better than using __int128 directly, and at least that way the compiler implementers have already done all the hard work for you.
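For illustration, a minimal sketch of obtaining the full 128-bit product with MSVC's _mul128 intrinsic (the subsequent 128-by-64-bit division is the involved part and is not shown here):

#include <intrin.h>   // MSVC, x64 targets only
#include <cstdint>

// Sketch only: _mul128 returns the low 64 bits of the signed 128-bit
// product and stores the high 64 bits through the pointer argument.
int64_t full_product(int64_t a, int64_t b, int64_t* hi)
{
    return _mul128(a, b, hi);
}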

If your application domain can give you useful bounds, such as a guarantee that (u % d) * v will not overflow, then you can use the identity

(u * v) / d = (u / d) * v + ((u % d) * v) / d

where / signifies integer division, as long as u is non-negative and d is positive (otherwise you might run afoul of the leeway allowed for the semantics of operator %).
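A minimal sketch of this identity in code, assuming u >= 0, d > 0, and that neither (u / d) * v nor (u % d) * v overflows:

#include <cstdint>

// Exact under the stated assumptions: C guarantees u == (u/d)*d + (u%d),
// so the product splits into a part exactly divisible by d plus a small
// remainder term.
int64_t muldiv_split(int64_t u, int64_t v, int64_t d)
{
    return (u / d) * v + ((u % d) * v) / d;
}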

In any case you may have to separate out the signs of the operands and use unsigned operations in order to find more useful mechanisms that you can exploit - or to circumvent sabotage by the compiler, like the saturating multiplication that you mentioned. Overflow of signed integer operations invokes undefined behaviour, so compilers are free to do whatever they please. By contrast, overflow for unsigned types is well-defined.

Also, with unsigned types you can fall back on rules like the following: with s = a (+) b (where (+) is possibly-overflowing unsigned addition) you will have either s == a + b, or s < a && s < b, which lets you detect overflow after the fact with cheap operations.
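A sketch of that after-the-fact test for unsigned addition:

#include <cstdint>

// Wrap-around is well-defined for unsigned types; the sum wrapped
// if and only if it came out smaller than an operand.
bool add_wrapped(uint64_t a, uint64_t b, uint64_t* s)
{
    *s = a + b;     // may wrap, which is well-defined for unsigned
    return *s < a;  // wrapped iff the result is smaller than a
}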

However, it is unlikely that you will get much farther on this road, because the effort required quickly approaches - or even exceeds - the effort of implementing the double-limb operations I alluded to earlier. Only a thorough analysis of the application domain could give the information required for planning/deploying such shortcuts. In the general case, and with the bounds you have given, you're pretty much out of luck.

DarthGizka
  • 1
    Signed integer overflow is implementation-defined, not undefined behavior [conv.integral]. It cannot go and kill your grandma without telling you so beforehand. – Roland W Apr 26 '16 at 18:06
  • @Roland: Thanks for the clarification. Implementation-defined is a lot better than undefined, but could it be that this is an achievement of more recent iterations of the standards? I do remember vividly the stringent warnings about signed integer overflow when I was a student, and one of my colleagues swears blind that once upon a time a signed integer overflow with a TI (DSP) compiler beat him up and absconded with his single-malt... – DarthGizka Apr 26 '16 at 18:22
  • 1
    @DarthGizka: Thanks for this detailed answer. A lot of the solutions I tested are based on your propositions, especially: `(u * v) / d = (u / d) * v + ((u % d) * v) / d` (which I did not know) giving an exact result when `i` is outside the *safe* range. Thanks a lot. – shrike Apr 27 '16 at 14:29
  • 1
    @Roland: Wikipedia [still lists signed overflow as undefined behaviour](https://en.wikipedia.org/wiki/Undefined_behavior#Benefits_from_undefined_behavior), which supports my guess that the downscaling from undefined to unspecified must be comparatively recent. – DarthGizka Apr 28 '16 at 14:45
  • 2
    Still 'undefined' as of C++11, see [Is signed integer overflow still undefined behavior in C++?](http://stackoverflow.com/questions/16188263/is-signed-integer-overflow-still-undefined-behavior-in-c) – DarthGizka Apr 28 '16 at 14:54
  • 1
    @DarthGizka Okay, so there is a difference between how expression evaluation and conversion are treated. A conversion of a value that overflows (cannot be represented in the target type) is implementation-defined [[conv.integral](http://eel.is/c++draft/conv.integral)] while overflow during expression evaluation is undefined [[expr](http://eel.is/c++draft/expr#4)]. Thanks for pointing that out. – Roland W May 10 '16 at 09:01
3

In order to provide a quantified answer to the question, I benchmarked several of the solutions proposed in this post (thanks to the comments and answers).

The benchmark measures computation time of different implementations, when i is inside the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A], and when i is outside the friendly range (yet in the safe range Range2 = [INT64_MIN*B/A, INT64_MAX*B/A]).

Each implementation performs a "safe" (i.e., overflow-free) computation of the operation i * A / B (except the 1st implementation, given as a reference computation time). However, some implementations may occasionally return an inaccurate result (this behavior is noted where it applies).

Some proposed solutions were not tested or are not listed below: the solution using __int128 (unsupported by the MS VC compiler; Boost's int128_t was used instead), solutions using an extended 80-bit long double (unsupported by the MS VC compiler), and the solution using InfInt (working and tested, though too slow to be a decent competitor).

Time measurements are given in ps/op (picoseconds per operation). The benchmark platform is an Intel Q6600 @ 3 GHz under Windows 7 x64; the executable was compiled with MS VC14, x64/Release target. The variables, constants and function referenced below are defined as:

int64_t       i;
const int64_t A     = 1234567891;
const int64_t B     = 4321987;
inline bool   in_safe_range(int64_t i) { return (INT64_MIN/A <= i) && (i <= INT64_MAX/A); }
  1. (i * A / B) [reference]
    i in Range1: 1469 ps/op, i outside Range1: irrelevant (overflows)
  2. ((int64_t)((double)i * A / B))
    i in Range1: 10613 ps/op, i outside Range1: 10606 ps/op
    Note: infrequent inaccurate result (max error = 1 bit) in the whole range Range2
  3. ((int64_t)((double)A / B * i))
    i in Range1: 1073 ps/op, i outside Range1: 1071 ps/op
    Note: infrequent inaccurate result (max error = 1 bit) in the whole range Range2
    Note: compiler likely precomputes (double)A / B resulting in the observed performance boost vs previous solution.
  4. (!in_safe_range(i) ? (int64_t)((double)A / B * i) : (i * A / B))
    i in Range1: 2009 ps/op, i outside Range1: 1606 ps/op
    Note: rare inaccurate result (max error = 1 bit) outside Range1
  5. ((int64_t)((int128_t)i * A / B)) [boost int128_t]
    i in Range1: 89924 ps/op, i outside Range1: 89289 ps/op
    Note: boost int128_t performs dramatically poorly on the bench platform (I have no idea why)
  6. ((i / B) * A + ((i % B) * A) / B)
    i in Range1: 5876 ps/op, i outside Range1: 5879 ps/op
  7. (!in_safe_range(i) ? ((i / B) * A + ((i % B) * A) / B) : (i * A / B))
    i in Range1: 1999 ps/op, i outside Range1: 6135 ps/op

Conclusion
a) If slight computation errors are acceptable in the whole range Range2, then solution (3) is the fastest one, even faster than the direct integer computation given as reference.
b) If computation errors are unacceptable in the friendly range Range1, yet acceptable outside this range, then solution (4) is the fastest one.
c) If computation errors are unacceptable in the whole range Range2, then solution (7) performs as well as solution (4) in the friendly range Range1, and remains decently fast outside this range.
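
For reference, a compilable sketch of solution (7), reusing A, B and in_safe_range() as defined above (the helper name muldiv_safe is mine):

#include <cstdint>

// Fast path when i * A cannot overflow, exact split computation otherwise.
// (i % B) * A stays well below INT64_MAX here because |i % B| < B.
inline int64_t muldiv_safe(int64_t i)
{
    return in_safe_range(i) ? i * A / B
                            : (i / B) * A + ((i % B) * A) / B;
}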

shrike
1

I think you can detect the overflow before it happens. In your case of i * A / B, you only need to worry about the i * A part, because the division cannot overflow here.

You can detect the overflow by performing the test bool overflow = i > INT64_MAX / A. You will have to modify this depending on the signs of the operands, as sketched below.
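A sketch of such a sign-aware pre-check, assuming A > 0 as in the question:

#include <cstdint>

// True when i * A would overflow int64_t (A > 0 assumed).
bool mul_overflows(int64_t i, int64_t A)
{
    if (i > 0) return i > INT64_MAX / A;
    if (i < 0) return i < INT64_MIN / A;
    return false;   // i == 0 can never overflow
}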

wilx
  • Detecting overflow before it occurs is actually the best thing to do, thanks for pointing this out. I did include this check in the solutions I tested, and the best ones do use it. – shrike Apr 27 '16 at 14:21
1

Some implementations provide __int128_t. Check whether your implementation supports it, so that you may use it as a placeholder instead of double. Refer to the post below:
Why isn't there int128_t?
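
A minimal sketch, assuming a compiler that provides __int128 (e.g. GCC or Clang; as noted in the comments, MS VC does not):

#include <cstdint>

// The 128-bit intermediate product cannot overflow for any int64_t inputs.
int64_t muldiv128(int64_t i, int64_t A, int64_t B)
{
    return (int64_t)((__int128)i * A / B);
}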


If you are not very concerned about "fast"-ness, then for good portability I would suggest using the header-only C++ library "InfInt".

It is pretty straightforward to use the library. Just create an instance of the InfInt class and start using it:

InfInt myint1 = "15432154865413186646848435184100510168404641560358";  // construct from a decimal string
InfInt myint2 = 156341300544608LL;                                      // or from a built-in integer

myint1 *= --myint2 - 3;            // the usual arithmetic operators are overloaded
std::cout << myint1 << std::endl;  // stream output is supported as well
iammilind
  • unfortunately, MS VC does not support __int128; it is defined somewhere in a header, but using it leads to the compile error: "__int128 not supported on this platform"; I used boost int128_t instead. I tried InfInt but performance was very bad, so I dropped it from the competitors. Thanks for these solutions, anyway. – shrike Apr 27 '16 at 14:25
0

Not sure about the value bounds, but will (i / B) * A + (i % B) * A / B help?

bipll