Errors count increasing during the calculation of double value

Question

I am writing code that converts a double to binary. Something strange happens during the conversion and the final result is wrong, so I printed the fractional part at each step to see what was going on.

The code for the fractional part looks like this:

        while(float_part != (int)(float_part)){
            float_part -= (int)(float_part); //just leave fractional part
            float_part *= 2; //float_part is a double
            res = res + to_string(((int)(float_part))); //add to "res", which is a string
            cout << float_part << "+" << length << "\n"; //to figure out why
            length--;  //the length is initialized to 32
            if(length <= 0){
                return "ERROR"; //if too long
            }
        }

Then I input "28187281.525" (only the .525 matters for the code above) and got this strange output:

    1.05+32
    0.1+31
    0.2+30
    0.4+29
    0.8+28
    1.6+27
    1.2+26
    0.4+25
    0.799999+24
    1.6+23
    1.2+22
    0.399994+21
    0.799988+20
    1.59998+19
    1.19995+18
    0.399902+17
    0.799805+16
    1.59961+15
    1.19922+14
    0.398438+13
    0.796875+12
    1.59375+11
    1.1875+10
    0.375+9
    0.75+8
    1.5+7
    1+6
    1101011100001101010010001.100001100110011001100110011

At the beginning it is fine, but eventually the result becomes wrong!

And why does 0.4*2 become 0.799999?

Does anyone know the reason? Thanks in advance!

score 2 · Accepted Answer · answered Oct 18 '16 at 10:35


Floating-point values have limited precision, and operations on them can introduce small errors. The more operations you perform, the more the error can grow. In your case, you should split the floating-point variable into its integer components (sign, mantissa and exponent) and perform the operations on those integers. Floating-point values are normally stored in IEEE 754 format:

https://en.wikipedia.org/wiki/Floating_point#IEEE_754:_floating_point_in_modern_computers
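As a rough illustration of the approach described here, the sketch below copies a double into a 64-bit integer with `memcpy()` and pulls out the sign, exponent and mantissa with shifts and masks. It assumes `double` is an IEEE 754 binary64 value (1 sign bit, 11 exponent bits, 52 stored mantissa bits). The names `DoubleParts`, `decompose` and `to_binary` are made up for this example, and `to_binary` only handles normal values whose magnitude is at least 1 and below 2^53.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper type: the three raw fields of a binary64 double.
struct DoubleParts {
    bool negative;            // sign bit
    int biased_exponent;      // 11-bit exponent field (bias 1023)
    std::uint64_t fraction;   // 52 stored mantissa bits
};

DoubleParts decompose(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    DoubleParts p;
    p.negative = (bits >> 63) != 0;
    p.biased_exponent = static_cast<int>((bits >> 52) & 0x7FF);
    p.fraction = bits & ((std::uint64_t{1} << 52) - 1);
    return p;
}

// Prints the stored value in binary using integer operations only.
// Assumption: normal value with 1 <= |value| < 2^53; anything else
// returns "UNSUPPORTED" to keep the sketch short.
std::string to_binary(double value) {
    const DoubleParts p = decompose(value);
    const int e = p.biased_exponent - 1023;  // unbiased exponent
    if (p.biased_exponent == 0 || p.biased_exponent == 0x7FF || e < 0 || e > 52) {
        return "UNSUPPORTED";
    }
    // Restore the implicit leading 1 of a normal number.
    const std::uint64_t significand = (std::uint64_t{1} << 52) | p.fraction;
    const int frac_bits = 52 - e;  // bits to the right of the binary point
    const std::uint64_t int_part = significand >> frac_bits;

    std::string out = p.negative ? "-" : "";
    for (int i = e; i >= 0; --i) {
        out += ((int_part >> i) & 1) ? '1' : '0';
    }
    std::string frac;
    for (int i = frac_bits - 1; i >= 0; --i) {
        frac += ((significand >> i) & 1) ? '1' : '0';
    }
    while (!frac.empty() && frac.back() == '0') {
        frac.pop_back();  // drop trailing zeros
    }
    if (!frac.empty()) {
        out += "." + frac;
    }
    return out;
}
```

Calling `to_binary(5.75)` gives `"101.11"`. Because only shifts and masks are used, the digits are exactly the bits stored in the double, with no further rounding along the way.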

answered Oct 18 '16 at 10:35

G. Sliepen


So basically, converting the fractional part to an int or long is the only safe way? Are there any other tricks? – 彭浩翔 Oct 18 '16 at 10:43
The trick is to copy the floating point value into a suitably sized `int` (you can use a `union` for this or straight `memcpy()`), and then use bitwise operations and shifts to get the sign bit, exponent and mantissa. Once you have done this, things should be very easy, because basically you are just printing the mantissa in binary, and the only problem left is to put the binary point in the right place. – G. Sliepen Oct 18 '16 at 11:22
Thanks, I followed this approach and it seems to work well! – 彭浩翔 Oct 18 '16 at 12:35

score 1 · Answer 2 · answered Oct 18 '16 at 10:38


This is the nature of finite precision arithmetic when you manipulate values that can't be represented exactly.

0.4*2 becomes 0.799999 for the same reason 1/3 times 3 becomes 0.999999 -- the best you can do in decimal with six digits is represent 1/3 as 0.333333, and if you multiply that by 3, you get 0.999999. You would need an infinite number of digits to get the exact answer.
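One way to see this for yourself is to print a value with more digits than `std::cout` shows by default (6 significant digits), which can make a number that is slightly off look like exactly 0.4. The helper below is only a small sketch; `show_digits` is a made-up name, and the exact digits printed depend on the platform's `double` format, so no particular output is claimed here.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a double in fixed notation with the
// requested number of digits after the decimal point.
std::string show_digits(double value, int digits) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(digits) << value;
    return out.str();
}
```

Comparing `show_digits(0.5, 20)` with `show_digits(0.4, 20)` should show the difference: 0.5 is a power of two and is stored exactly, while 0.4 has no finite binary expansion, so the nearest representable value is stored instead.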

answered Oct 18 '16 at 10:38

David Schwartz


Hey, thanks man. So in practice, how should I do calculations on a float or double? Convert the fractional part to an int or long? That seems like a lot of trouble. – 彭浩翔 Oct 18 '16 at 10:45
It really depends on exactly what you're trying to do. But perhaps you're using the wrong tool for the job and should be using something else. For fractions, maybe ratios of integers. For floating point, maybe higher precision, maybe rounding, maybe floating decimal point. It depends on exactly what your requirements are. But you can't skip the step of identifying your requirements and choosing tools that can meet them. – David Schwartz Oct 18 '16 at 10:47
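To make the "ratios of integers" suggestion concrete, here is a minimal sketch that produces the binary digits of a fraction such as .525 by treating it as 525/1000 and doubling the numerator with integer arithmetic only. The function name `fraction_to_binary` and its parameters are invented for this example. It assumes `numerator < denominator` and a denominator small enough that doubling the numerator cannot overflow.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: binary digits of numerator/denominator (a value
// in [0, 1)), computed exactly with integers. Stops after max_bits
// digits or as soon as the remainder reaches zero.
// Assumes numerator < denominator and denominator < 2^63.
std::string fraction_to_binary(std::uint64_t numerator,
                               std::uint64_t denominator,
                               int max_bits) {
    std::string bits;
    for (int i = 0; i < max_bits && numerator != 0; ++i) {
        numerator *= 2;                  // same "multiply by 2" step as the question
        if (numerator >= denominator) {  // the integer part would be 1
            bits += '1';
            numerator -= denominator;    // keep only the fractional remainder
        } else {
            bits += '0';
        }
    }
    return bits;
}
```

For example, `fraction_to_binary(525, 1000, 8)` returns `"10000110"`. Since no floating-point value is involved, the digits follow the exact decimal input rather than the nearest double, and the `max_bits` limit is what stops the expansion for fractions that never terminate in binary.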
