C++ int64 * double == off by one

Question

I've tested the code below in both 64-bit and 32-bit environments, and the result is off by exactly one every time. The expected result is 1180000000, but the actual result is 1179999999. I'm not sure why, and I was hoping someone could explain:

#include <stdint.h>
#include <iostream>

using namespace std;

int main() {
  double odds = 1.18;
  int64_t st = 1000000000;
  int64_t res = st * odds;
  cout << "result: " << res << endl;
  return 0;
}

I appreciate any feedback.

You are going to get this sort of problem with floating point. `double` is double-precision, not infinite-precision! Try `cout << setprecision(30) << st * odds << '\n'` to get more idea of what is going on. — M.M, Dec 21 '15 at 03:33
Classic educational material: [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) — Ivan Aksamentov - Drop, Dec 21 '15 at 03:37
M.M this is with setprecision() used: without precision: 1179999999 with precision: 1180000000 — jweaver, Dec 21 '15 at 03:37
Why don't you post the result you get? Maybe we can help you count zeros :) — smac89, Dec 21 '15 at 03:39
I posted the results above Smac89 :) Note, when I wrap the math in abs(), the proper result returns: int64_t res = abs(st * odds); The above yields: without precision: 1180000000 with precision: 1180000000 — jweaver, Dec 21 '15 at 03:41
Also see previous [`1.18` question](http://stackoverflow.com/questions/34187562/issue-with-floating-points-representation) and the pointers within. — dxiv, Dec 21 '15 at 03:44
@IInspectable - the expected result is 1180000000 and the actual result is: 1179999999 — jweaver, Dec 21 '15 at 03:58

roeland · Answer 1 · 2015-12-21T04:23:02.567

7

1.18, or 118 / 100, can't be represented exactly in binary: its binary expansion repeats forever. The same happens when you write 1 / 3 in decimal.

So let's go over a similar case in decimal, let's calculate (1 / 3) × 30000, which of course should be 10000:

odds = 1 / 3 and st = 30000

Since computers have only limited precision, we have to truncate this number to a limited number of digits, say 6, so:
odds = 0.333333
0.333333 × 30000 = 9999.99. The cast (which is implicit in your program) truncates this number to 9999.

There is no 100% reliable way to work around this. float and double simply have limited precision, and dealing with that is a hard problem.

Your program contains an implicit cast from double to an integer on the line int64_t res = st * odds;. Many compilers will warn you about this. It can be the source of bugs of the type you are describing. This cast, which can be explicitly written as (int64_t) some_double, rounds the number towards zero.

An alternative is rounding to the nearest integer with round(some_double);. That will—in this case—give the expected result.

edited Dec 21 '15 at 04:23

answered Dec 21 '15 at 04:10

roeland


Your last para isn't correct. The behaviour of `int64_t res = st * odds;` and `int64_t res = (int64_t)(st * odds);` is defined to be exactly the same - in both cases the expression `st * odds` is converted to `int64_t`, and the behaviour of this conversion is *implementation-defined* if the result of the multiplication is not exactly representable in `int64_t`. – M.M Dec 21 '15 at 04:14
"You should not implicitly cast double to an integer" is a matter of opinion - mine is that using a cast to hide a warning is not ideal either – M.M Dec 21 '15 at 04:15
@M.M yes, but sometimes you just need to round a double to an integer. I have the habit of always explicitly using a function like `floor` or `round`. – roeland Dec 21 '15 at 04:18
The `round` one is a good habit. There are some issues with `floor(x+0.5)` (it doesn't work for negative numbers, and it starts to behave strangely when you get to a certain magnitude of number) – M.M Dec 21 '15 at 04:19
@M.M Good point about large numbers. But for negative numbers `floor(-1.9999 + 0.5)` returns `-2.0` as expected, or doesn't it? – roeland Dec 21 '15 at 04:27
Thanks everyone, I'll try with round() instead and post the results – jweaver Dec 21 '15 at 04:30
I see → [Harder than it looks: rounding float to nearest integer, part 1](http://blog.frama-c.com/index.php?post/2013/05/02/nearbyintf1) – roeland Dec 21 '15 at 04:31

score 2 · Answer 2 · answered Dec 21 '15 at 04:40

First of all - 1.18 is not exactly representable in double. Mathematically the result of:

double odds = 1.18;

is 1.17999999999999993782751062099 (according to an online calculator).

So, mathematically, odds * st is 1179999999.99999993782751062099.

But in C++, odds * st is an expression with type double. So your compiler has two options for implementing this:

Do the computation in double precision
Do the computation in higher precision and then round the result to double

Apparently, doing the computation in double precision in IEEE754 results in exactly 1180000000.

However, doing it in long double precision produces something more like 1179999999.99999993782751062099

Converting this to double is now implementation-defined as to whether it selects the next-highest or next-lowest value, but I believe it is typical for the next-lowest to be selected.

Then converting this next-lowest result to integer will truncate the fractional part.

There is an interesting blog post here where the author describes the behaviour of GCC:

It uses long double intermediate precision for x86 code (due to the x87 FPU's long double registers)
It uses actual types for x64 code (because the SSE/SSE2 FPU supports this more naturally)

According to the C++11 standard you should be able to inspect which intermediate precision is being used by outputting FLT_EVAL_METHOD from <cfloat>. 0 would mean actual values, 2 would mean long double is being used.

I'm not sure my answer completely explains things; `double d = st * odds; int64_t res = d;` produces a different result to `int64_t res = (double)(st * odds);` (where the cast is redundant but illustrative). It seems like either (a) the compiler is going straight from intermediate result to `int64`, which I don't think the standard allows, or (b) the compiler is doing a compile-time optimization to compute `res1`, and using different precision than a runtime calculation of the same thing would give (again I'm not sure if the standard permits this) — M.M, Dec 21 '15 at 04:48
round() seems to have done it and I've modified the value of 'odds' a few times to ensure it was consistent. Thanks everyone for your input and help! — jweaver, Dec 21 '15 at 04:49
