C++ - Converting long to float or double rounds the value

Question

I want to convert long to float/double but casting rounds the value. How can I convert without it being rounded?

Here is my code:

#include <cmath>
#include <cstdint>
#include <iostream>
#include <string>

int main()
{
    const uint32_t UW = 1000;
    int txPower = 3012916;

    long x = std::stoi("3012916") * UW;
    std::cout << "x = " << x << std::endl;

    auto xFloat = (float)x;
    std::cout << "x float = " << xFloat << std::endl;

    auto xDouble = (double)x;
    std::cout << "xDouble = " << xDouble << std::endl;
    std::cout << "xDouble value cast = " << (double)3012916000 << std::endl;

    return 0;
}

Yields this output:

x = 3012916000
x float = 3.01292e+09
xDouble = 3.01292e+09
xDouble value cast = 3.01292e+09

Help!

Note: on a system where `int` is 32 bit you will get some truly strange results. — user4581301, Mar 06 '19 at 03:39
lots of duplicates: [Loss of precision - int -> float or double](https://stackoverflow.com/q/2781086/995714), [What's the difference when casting the float and double to int](https://stackoverflow.com/q/24099337/995714), [Can a IEEE 754 real number "cover" all integers within its range?](https://stackoverflow.com/q/12442685/995714), [Which is the first integer that an IEEE 754 float is incapable of representing exactly?](https://stackoverflow.com/q/3793838/995714), [Are all integer values perfectly represented as doubles?](https://stackoverflow.com/q/43655668/995714) — phuclv, Mar 06 '19 at 06:10
Possible duplicate of [What is going on with this int to float conversion, why is it innacurate?](https://stackoverflow.com/questions/52283831/what-is-going-on-with-this-int-to-float-conversion-why-is-it-innacurate) — phuclv, Mar 06 '19 at 06:10

score 4 · Accepted Answer · answered Mar 06 '19 at 05:09

As Mark Ransom already mentioned in the comments, the output you are seeing is just a shortened form of the value actually stored in your float or double: by default, std::cout prints floating-point values with only 6 significant digits. You can see more digits by using, e.g., std::setprecision(15) from <iomanip>.

Standard single-precision (IEEE 754) floating-point values have 32 bits, 23 of which are reserved for the mantissa. For normalized values the most significant bit of the mantissa is implicitly one and is not stored. That gives you a 24-bit significand, so every integer up to 2^24 = 16777216 can be represented exactly, and 16777217 is the first integer that cannot. In decimal terms you can store about 7 digits without losing precision (log10(2^24) is about 7.2). I say 'about' because not every decimal number can be expressed exactly in binary, so the number of decimal digits that survive a conversion is not fixed.

Here is an interesting experiment:

#include <iomanip>
#include <iostream>

int main()
{
    long n0 = 16777210;
    for (int i = 0; i < 10; i++)
    {
        long n = n0 + i;
        std::cout << "n=" << n << " / ((float)n)=" << std::setprecision(15) << ((float)n) << std::endl;
    }
}

The output is:

n=16777210 / ((float)n)=16777210
n=16777211 / ((float)n)=16777211
n=16777212 / ((float)n)=16777212
n=16777213 / ((float)n)=16777213
n=16777214 / ((float)n)=16777214
n=16777215 / ((float)n)=16777215
n=16777216 / ((float)n)=16777216
n=16777217 / ((float)n)=16777216
n=16777218 / ((float)n)=16777218
n=16777219 / ((float)n)=16777220

The number 3012916000 is too large to be held exactly in a single precision floating point value. When you output your number like so:

std::cout << "x float = " << std::setprecision(15) << xFloat << std::endl;

Then the output is:

x float = 3012915968

Double values have a 53-bit significand (52 stored bits plus the implicit leading one), so every integer up to 2^53 can be stored exactly, including yours:

std::cout << "xDouble = " << std::setprecision(15) << xDouble << std::endl;

Output:

xDouble = 3012916000

score -1 · Answer 2 · answered Mar 06 '19 at 03:40


"How can I convert without it being rounded?" Representing integer values as floating-point values always rounds them due to their technical representation. See here: Convert int to double

Jay-Pi

A 64-bit `double` can hold all possible 32-bit integers without rounding or loss of precision. – Mark Ransom Mar 06 '19 at 04:18

You are correct, my fault. http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)FloatingPoint.html Keep forgetting difference in long long int and long for other languages. – Jay-Pi Mar 06 '19 at 04:30
