0

Here is my code:

#include <stdio.h>

int main()
{
    
    float dt = 0.0002, t0;
    int it0;
    
    for ( t0 = 0.0512; t0 <= 0.0544; t0 += 0.0004 ) {
        
            it0 = (int)(t0/dt);
            printf("t0 = %f, it0 = %d\n",t0,it0);
    }

    return 0;
}

I run the above code, the result is:

t0 = 0.051200, it0 = 256                                                                                                                                                           
t0 = 0.051600, it0 = 258                                                                                                                                                           
t0 = 0.052000, it0 = 260                                                                                                                                                           
t0 = 0.052400, it0 = 262                                                                                                                                                           
t0 = 0.052800, it0 = 264                                                                                                                                                           
t0 = 0.053200, it0 = 265 --> Start to get wrong from here!                                                                                                                                                           
t0 = 0.053600, it0 = 267                                                                                                                                                           
t0 = 0.054000, it0 = 269                                                                                                                                                           
t0 = 0.054400, it0 = 271

When t0 = 0.053200, it0 should be 266 (0.0532/0.0002=266). But here it shows it0 = 265. To investigate the reason, I run t0 = 0.053200 independently as follows:

#include <stdio.h>

int main()
{
    
    float dt = 0.0002, t0 = 0.0532;
    int it0;
    
    it0 = (int)(t0/dt);
    printf("t0 = %f, it0 = %d\n",t0,it0);

    return 0;
}

The result is correct:

t0 = 0.053200, it0 = 266

I don't know why it happens. So I do another test which prints the float (before converting) and integer (after converting) using code below:

#include <stdio.h>

int main()
{

    float dt = 0.0002, t0, test;
    int it0;

    for ( t0 = 0.0512; t0 <= 0.0544; t0 += 0.0004 ) {

            it0 = (int)(t0/dt);
            test = t0/dt;
            printf("t0 = %f, it0 = %d / %f\n",t0,it0,test);
    }

    return 0;
}

The result is:

t0 = 0.051200, it0 = 256 / 256.000000                                                                                                                                              
t0 = 0.051600, it0 = 258 / 258.000000                                                                                                                                              
t0 = 0.052000, it0 = 260 / 260.000000                                                                                                                                              
t0 = 0.052400, it0 = 262 / 262.000000                                                                                                                                              
t0 = 0.052800, it0 = 264 / 264.000000                                                                                                                                              
t0 = 0.053200, it0 = 265 / 265.999969                                                                                                                                              
t0 = 0.053600, it0 = 267 / 267.999969                                                                                                                                              
t0 = 0.054000, it0 = 269 / 269.999969                                                                                                                                              
t0 = 0.054400, it0 = 271 / 271.999969

I notice that t0 = 0.053200, it0 = 265 / 265.999969. I guess the convert to integer makes 265.999969 a wrong number which is 265. Do you know how to solve it?

David
  • 31
  • 5

1 Answers1

1

The following assumes our C implementation uses the IEEE-754 binary32 format for float and the binary64 format for double, which is very likely

In for ( t0 = 0.0512; t0 <= 0.0544; t0 += 0.0004 ), t0 = 0.0512 puts the value 0.0511999987065792083740234375 in t0.

The t0 += 0.0004 converts the float t0 to a double, adds 0.0004000000000000000191686944095437183932517655193805694580078125, rounds the result to a double, and then rounds the result of that to float and stores it in t0. This contains as long as t0 <= 0.0544 is true, which is as long as t0 is less than or equal to 0.054399999999999996969091142773322644643485546112060546875.

Printed with full precision (%.99g), the output of your program is:

t0 = 0.0511999987065792083740234375, it0 = 256
t0 = 0.0515999980270862579345703125, it0 = 258
t0 = 0.0519999973475933074951171875, it0 = 260
t0 = 0.0523999966681003570556640625, it0 = 262
t0 = 0.0527999959886074066162109375, it0 = 264
t0 = 0.0531999953091144561767578125, it0 = 265
t0 = 0.0535999946296215057373046875, it0 = 267
t0 = 0.0539999939501285552978515625, it0 = 269
t0 = 0.0543999932706356048583984375, it0 = 271

In summary, your program is using binary-based approximations of the decimal numbers, and, whenever an arithmetic result is not representable in the floating-point format, it is rounded to the nearest number that is representable in the format.

While the long numerals above look complicated, we can see they are actually simple, all fractions of the form n/228, if we print them multiplied by 228:

t0*2^28 = 13743895, it0 = 256
t0*2^28 = 13851269, it0 = 258
t0*2^28 = 13958643, it0 = 260
t0*2^28 = 14066017, it0 = 262
t0*2^28 = 14173391, it0 = 264
t0*2^28 = 14280765, it0 = 265
t0*2^28 = 14388139, it0 = 267
t0*2^28 = 14495513, it0 = 269
t0*2^28 = 14602887, it0 = 271

So, when working with floating-point, you must keep in mind that you are working with fractions using powers of two, not decimal numerals, and operations with those numbers will introduce various errors.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312