I noticed that Visual Studio produces precision errors when a double is added to a long long. For example:

long long a = 44981600439878676;
double b = 234567890;
a += b;

The result in a is 44981600674446560, but it should be 44981600674446566. This happens in both 32-bit and 64-bit builds.

However, the following returns the correct value:

long long a = 44981600439878676;
double b = 234567890;
a += (long long)b;

In the disassembly of the first case, without the explicit cast, I see:

0116A892  call        __ltod3 (011619DDh)  
0116A897  addsd       xmm0,mmword ptr [b]  
0116A89C  call        __dtol3 (01161A05h) 

In the second case __ltod3 is not called. My explanation is that the VC++ compiler by default first converts the long long to double and then converts the double result back to long long, apparently because it treats double as a simpler type than long long. Precision is lost in __ltod3 because the int64 holds a value too large for a double to represent exactly. On the other hand, a is an lvalue, and since the compiler knows the result will be a long long, converting the left side to double and back again during the addition looks unnecessary. It is also very easy to make a mistake and omit the explicit cast, because the precision errors become visible only for certain numbers.

Is this conversion to double and back required by the C++ standard, or is it specific to the Visual Studio implementation?

Baj Mile
  • double has less precision than long long. – drescherjm Dec 09 '18 at 18:09
  • See [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) – 1201ProgramAlarm Dec 09 '18 at 18:10
  • You should be able to figure this out by yourself, after you try to wrap your brain around two facts: 1) `sizeof(double)` is the same as `sizeof(long long)`. 2) A `double` appears to be able to represent much larger values than a `long long`, such as `1e100`, or 1 followed by a hundred zeroes. Now, ask yourself how that's possible, if both a `long long` and a `double` take the same number of bytes, and you should be able to figure out the answer to your question. – Sam Varshavchik Dec 09 '18 at 18:12
  • I just wondered: if the final result is stored in a long long anyway, why is the left side converted to double by default and then back to long long? What if I do not want to lose precision by default? In complex code there may be too many places to check for missing explicit casts. – Baj Mile Dec 09 '18 at 18:24
  • This is how c++ arithmetic works, you _have to_ always keep it in mind. This is also why implicit conversions are evil. – Passer By Dec 09 '18 at 18:52
  • The compiler changing the types would change the program's observable behaviour. There are very few places where the compiler has that amount of latitude (copy elision comes to mind immediately). There is no way for the compiler to know that you really DID want that behaviour (at least until telepathic support is added in some future C++ Standard revision). You may need a fixed point library if you want fractions and precise. If you find yourself in a position where you need tonnes of casting, ensure that you haven't composed your problem poorly. – user4581301 Dec 09 '18 at 18:58

1 Answer

According to the standard [expr.ass/7]:

The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.

Therefore the usual arithmetic conversions apply even if the final result may once again need conversion back to the type of a (see [expr.ass/3]).
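
As a rough illustration of that rule, the sketch below spells out what a += b does when a is long long and b is double. The function names are made up for this example, and the values quoted afterwards assume an IEEE 754 double with round-to-nearest, as in the asker's Visual Studio builds.

```cpp
#include <type_traits>

// What `a += b` does when a is long long and b is double:
// the operands of + are brought to double, then the sum is converted back.
long long add_via_double(long long a, double b) {
    // The binary expression a + b has type double (usual arithmetic conversions).
    static_assert(std::is_same<decltype(a + b), double>::value,
                  "long long + double yields double");
    a += b;
    return a;
}

// The same steps written out explicitly.
long long add_via_double_explicit(long long a, double b) {
    return static_cast<long long>(static_cast<double>(a) + b);
}

// Integer addition: b is truncated to long long first, so a is never rounded.
long long add_as_integer(long long a, double b) {
    return a + static_cast<long long>(b);
}
```

With the values from the question, add_via_double and add_via_double_explicit should both give 44981600674446560, while add_as_integer gives 44981600674446566.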

For a += b in your example, a is converted to double by [expr.arith.conv/1.3], and the addition is performed in floating-point arithmetic. With your particular values, neither the value of a nor the exact sum is representable as a double, so the result is inexact.
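
To see why these values do not fit, one can measure the gap between neighbouring doubles at this magnitude. This is a minimal sketch, assuming an IEEE 754 binary64 double (53-bit significand); spacing_above and round_trip are hypothetical helper names, not part of any library.

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable double above it.
double spacing_above(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Convert an integer to the nearest double and back again.
// Only meant for values well inside the long long range.
long long round_trip(long long v) {
    return static_cast<long long>(static_cast<double>(v));
}
```

Under that assumption the gap near 4.5e16 is 8, so round_trip(44981600439878676) yields 44981600439878672: a is already off by 4 before the addition happens.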

With a += (long long)b both operands are long long and therefore no conversion is necessary. The addition is performed using integer arithmetic.

In your particular example, the value of b lies within the range that double represents exactly. The conversion from the integer literal to double and back to long long via (long long)b therefore returns the same value, and the sum is exact.
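
One simple way to check that claim is to compare against 2^53, below which every integer has an exact double representation. The helpers below are an illustrative sketch with invented names, and the bound assumes a double with a 53-bit significand.

```cpp
// True when |v| <= 2^53. Every such integer is exactly representable in a
// double with a 53-bit significand (a sufficient test, not a necessary one).
bool within_exact_integer_range(long long v) {
    const long long limit = 1LL << 53;  // 9007199254740992
    return v >= -limit && v <= limit;
}

// The cast (long long)b truncates toward zero, dropping any fractional part.
long long truncate_to_integer(double b) {
    return static_cast<long long>(b);
}
```

Here 234567890 passes the range check and 44981600439878676 does not. Note that the cast preserves b only because b holds a whole number; a value such as 2.9 would become 2.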