
I was implementing mean and stddev calculation for an array of unsigned char (say, a grayscale image). To store the sum and the average I tried both float and double, and got different results. I have reduced the issue to this minimal reproducing example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    #define BASE 16777306

#if 0
    float a = BASE;
    unsigned char b = 179;
    float c = a + b; // float sum
    printf("float  type, a=%f, (float)b=%f, c=%f, a+b=%d\n", a, (float)b, c, BASE+b);
#else
    double a = BASE;
    unsigned char b = 179;
    double c = a + b; // double sum
    printf("double type, a=%lf, (double)b=%lf, c=%lf, a+b=%d\n", a, (double)b, c, BASE+b);
#endif

    return 0;
}

Result:

double type, a=16777306.000000, (double)b=179.000000, c=16777485.000000, a+b=16777485
float  type, a=16777306.000000, (float)b=179.000000, c=16777484.000000, a+b=16777485

As shown above, the value 16777484.000000 produced with the float type is 1 less than 16777485.000000, the correct value.

Why does using float for the sum lead to a wrong result, while double stays consistent with the int result?
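
A side check, separate from the program above: 16777485 exceeds 2^24 = 16777216, so it has no exact float representation. The nearest candidates are 16777484 and 16777486, and round-to-nearest-even picks 16777484.

#include <stdio.h>

int main(void) {
    // Above 2^24, consecutive floats are 2 apart, so only even integers
    // are representable. 16777485 lies exactly halfway between 16777484
    // and 16777486; round-to-nearest-even selects 16777484.
    float f = 16777485.0f;
    double d = 16777485.0;  // double has a 53-bit significand: exact
    printf("float : %.1f\n", f);  // prints 16777484.0
    printf("double: %.1f\n", d);  // prints 16777485.0
    return 0;
}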

ChrisZZ
  • `float` has only 24 bits of significand precision, so it can represent integer values exactly only up to 16,777,216. To make matters worse, when you add values of vastly different magnitude, you can lose precision too. What is the purpose of using `float` here when you only want integer arithmetic? – paddy Mar 19 '21 at 06:26
  • The rule of thumb is never use floating point in integer calculations where you expect exact results. – dxiv Mar 19 '21 at 06:29
  • @paddy I was developing image processing functions for an ARM platform, and was told "don't use double since double may be slow, blablabla". – ChrisZZ Mar 19 '21 at 06:31
  • FYI %f and %lf are identical format specifiers. – 273K Mar 19 '21 at 06:32
  • "blablabla" is based in good reasoning and you should listen to it but more importantly try to understand it and not just pass it off as people making noise. But if you _need_ double precision then you should _use_ it. I don't see any need here, because you are just doing integer arithmetic. – paddy Mar 19 '21 at 06:33
  • @paddy For this simple snippet, an integer is enough. But for a 4000x4000 grayscale image (unsigned char), accumulating every pixel value into an int32_t may overflow, and for images of about 4104x4104 and larger, a uint32_t may also overflow. I used float to avoid overflow in the worst case (all pixels equal to 255). – ChrisZZ Mar 19 '21 at 06:38
  • If you want exact integer values, a `float` is limited to 16777216. Above that value, multiple `int` values map to the exact same `float` value. See [this answer for an example](https://stackoverflow.com/questions/23420783/convert-int-max-to-float-and-then-back-to-integer/23423240#23423240). – user3386109 Mar 19 '21 at 06:38
  • Why not use `int64_t`? – paddy Mar 19 '21 at 06:38
  • @paddy Good! I just forgot about `int64_t`. – ChrisZZ Mar 19 '21 at 06:40
  • If I remember correctly, a good rule of thumb is that float has about 6 significant decimal digits of precision. If you need more, use integer types or double, depending on the use case. But floating point has some inherent imprecision that integers do not have. – Kami Kaze Mar 19 '21 at 08:23
  • `float`, as a 32-bit object, can represent at most about 2^32 different values, and 16777485.0 is not one of them; the nearest alternative is 16777484.0. `double`, as a 64-bit object, can represent about 2^64 different values, and 16777485.0 is one of them. The values a `float`/`double` can store are distributed logarithmically, not linearly: the "decimal" point "floats" rather than being fixed, as it is for integers. – chux - Reinstate Monica Mar 19 '21 at 10:11
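
A small probe of the point paddy, user3386109, and chux are making (a sketch, not code from the thread): the gap between consecutive float values grows with magnitude, and it reaches 2 at 2^24 = 16777216, which is exactly where integers stop being exactly representable.

#include <stdio.h>
#include <math.h>

int main(void) {
    // Distance from each value to the next representable float:
    // the gap doubles every time the magnitude crosses a power of two.
    float vals[] = { 1.0f, 1024.0f, 16777216.0f /* 2^24 */, 33554432.0f /* 2^25 */ };
    for (int i = 0; i < 4; ++i) {
        float gap = nextafterf(vals[i], INFINITY) - vals[i];
        printf("next float after %.1f is %.10g away\n", vals[i], gap);
    }
    return 0;
}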
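
Following paddy's `int64_t` suggestion, one possible shape for the original mean/stddev computation (a sketch with made-up names, not code from the thread): accumulate in 64-bit integers, which cannot overflow for any realistic image size, and switch to double only for the final division.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <math.h>

// Illustrative helper (hypothetical name). Exact integer accumulation,
// floating point only at the end. Worst case for W x H pixels of value 255
// is W*H*255, which fits easily in int64_t. Assumes n > 0.
static void mean_stddev_u8(const unsigned char *px, size_t n,
                           double *mean, double *stddev)
{
    int64_t sum = 0, sum_sq = 0;
    for (size_t i = 0; i < n; ++i) {
        sum    += px[i];
        sum_sq += (int64_t)px[i] * px[i];
    }
    *mean = (double)sum / (double)n;
    // variance = E[x^2] - (E[x])^2
    double var = (double)sum_sq / (double)n - (*mean) * (*mean);
    *stddev = var > 0.0 ? sqrt(var) : 0.0;
}

int main(void) {
    unsigned char img[] = { 10, 200, 255, 0, 37, 179 };
    double m, s;
    mean_stddev_u8(img, sizeof img / sizeof img[0], &m, &s);
    printf("mean = %f, stddev = %f\n", m, s);
    return 0;
}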

0 Answers