
I observe strange behaviour with MinGW when multiplying a uint64_t variable by a float. Is this a problem in the compiler?

Below is my code:

#include <cstdint>
#include <iostream>
using namespace std;

int main() {

    uint64_t myvar = 4293057;
    float mycrasher = 1000;
    myvar = myvar*mycrasher;
    cout << "value after mul is "<<myvar << endl; 
    return 0;
}

The output value displayed is 4293057024 and not 4293057000!!

Shreyas S

1 Answer


An IEEE754 single-precision value (such as your float) has only about seven decimal digits of precision. Beyond that, it's imprecise. You don't even have to multiply by 1000: multiplying 4293057 by 10 already gives you 42930568 rather than 42930570.

You can see what's happening with the following code:

#include <iostream>
int main () {
    float xyzzy = 42930570;
    std::cout.precision (5);
    std::cout << "xyzzy is " << std::fixed <<xyzzy << '\n';
    return 0;
}

which outputs the not-so-precise:

xyzzy is 42930568.00000

To explain more fully, IEEE754 floating point values are limited in precision simply because they have a limited number of bits available for that purpose. Single precision values are 32 bits in length: one sign bit, eight exponent bits, and 23 bits for the fraction. Together with the implicit leading bit, those 23 bits give a 24-bit significand, which equates to about seven decimal digits. You can find further analysis here.
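As a rough illustration of that limit, the sketch below checks whether an integer survives a trip through float. The helper name `survives_float_round_trip` is made up for this example, and the code assumes float is an IEEE754 single-precision type.

```cpp
#include <cstdint>
#include <limits>

// Assumption: float is IEEE754 single precision, so its significand has
// 24 bits (23 stored fraction bits plus one implicit leading bit).
static_assert(std::numeric_limits<float>::digits == 24,
              "this sketch assumes IEEE754 single precision");

// Hypothetical helper: true if n converts to float and back unchanged.
// Only meant for values well inside the uint64_t range; converting a float
// that is too large for uint64_t back to an integer is undefined.
bool survives_float_round_trip(std::uint64_t n) {
    float f = static_cast<float>(n);
    return static_cast<std::uint64_t>(f) == n;
}
```

Under that assumption, every integer up to 16777216 (2^24) survives, 16777217 is the first that does not, and 4293057000 from the question does not either.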

The number 42930570 cannot be represented exactly in a single precision value. You either get the bit pattern 0x4c23c462, which is 42930568, or the next higher one, 0x4c23c463, which is 42930572.
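If you want to inspect those bit patterns yourself, something along these lines should work. The helper names `float_bits` and `next_float_up` are invented for this sketch, and it assumes a 32-bit IEEE754 float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the raw 32-bit pattern of a float.
// memcpy is used to copy the bytes without violating aliasing rules.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Hypothetical helper: the next representable float above f.
float next_float_up(float f) {
    return std::nextafter(f, INFINITY);
}
```

With these, `float_bits(42930570.0f)` should come out as 0x4c23c462, and `next_float_up(42930570.0f)` should be 42930572, one bit pattern higher.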


The operands are converted to float rather than uint64_t because that's what the standard says. In C++03, the "multiplicative operators" section (5.6) says:

The usual arithmetic conversions are performed on the operands and determine the type of the result.

The usual arithmetic conversions are detailed in section 5, paragraph 9 and consist of:

  • If either operand is of type long double, the other shall be converted to long double.
  • Otherwise, if either operand is double, the other shall be converted to double.
  • Otherwise, if either operand is float, the other shall be converted to float.
  • More stuff dealing with integral types (it doesn't actually say that, I'm paraphrasing).

Since you have a float and a uint64_t, that's covered by the third bullet point above. In the "floating-integral conversion" section (4.9), we see:

An rvalue of an integer type or of an enumeration type can be converted to an rvalue of a floating point type. The result is exact if possible. Otherwise, it is an implementation-defined choice of either the next lower or higher representable value.

That's why you're seeing the loss of precision.
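You can ask the compiler to confirm the type of the product. This is only a sketch: it needs C++11 for decltype, and `multiply_via_float` is a made-up name that mirrors what the code in the question does.

```cpp
#include <cstdint>
#include <type_traits>

// The usual arithmetic conversions make uint64_t * float a float expression.
static_assert(std::is_same<decltype(std::uint64_t{} * float{}), float>::value,
              "uint64_t * float has type float");

// Hypothetical helper mirroring the question: n is converted to float, the
// product is computed as a float, and only then converted back to uint64_t.
// The cast happens after the rounding, so it cannot restore lost digits.
std::uint64_t multiply_via_float(std::uint64_t n, float factor) {
    return static_cast<std::uint64_t>(n * factor);
}
```

On an implementation that evaluates float arithmetic in single precision, `multiply_via_float(4293057, 1000.0f)` gives 4293057024, the same value the question reports.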

That hasn't changed in C++11. The wording is changed, and a little more verbose, but the sections still boil down to the same results.
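If you need the exact product, one way out is to keep float out of the multiplication altogether. The two functions below are only a sketch with made-up names; which one fits depends on whether your factor is always a whole number.

```cpp
#include <cstdint>

// Option 1 (assumes the factor is a whole number): stay in integer
// arithmetic, so no floating point rounding is involved at all.
std::uint64_t multiply_as_integer(std::uint64_t n, std::uint64_t factor) {
    return n * factor;
}

// Option 2: widen the float factor to double before multiplying, so the
// usual arithmetic conversions pick double instead of float. A double has a
// 53-bit significand, so this is exact only while n and the product stay
// below 2^53.
std::uint64_t multiply_via_double(std::uint64_t n, float factor) {
    return static_cast<std::uint64_t>(n * static_cast<double>(factor));
}
```

Both should return 4293057000 for the values in the question. Note that casting the finished float product to uint64_t does not help, because the rounding has already happened by then.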

paxdiablo
  • But why is float used to store the product? In arithmetic operations, won't the largest data type be used? In this case that should be uint64_t. Also, even typecasting the product to uint64_t does not solve the problem. – Shreyas S Dec 04 '12 at 12:28