Sum of array of floats returns different results

Question

I have a function sum() that returns a float and takes a pointer t to float and an integer size. It returns the sum of all the elements in the array. I then fill an array and call sum() on it twice: once with the BIG value at the first index and once with the BIG value at the last index. The two calls return different sums.
This is my code:

#include <stdlib.h>
#include <stdio.h>

#define N     1024
#define SMALL 1.0
#define BIG   100000000.0

float sum(float* t, int size) {        // here I define the function sum()
  float s = 0.0;
  for (int i = 0; i < size; i++) {
    s += t[i];
  }
  return s;
}

int main() {
  float tab[N];
  for (int i = 0; i < N; i++) {
    tab[i] = SMALL;
  }

  tab[0] = BIG;
  float sum1 = sum(tab, N);            // initialize sum1 with the big value at index 0
  printf("sum1 = %f\n", sum1);

  tab[0] = SMALL;
  tab[N-1] = BIG;
  float sum2 = sum(tab, N);            // initialize sum2 with the big value at last index
  printf("sum2 = %f\n", sum2);

  return 0;
}

After compiling the code and running it I get the following output:

sum1 = 100000000.000000
sum2 = 100001024.000000

Why do I get different results even though the arrays have the same elements (but at different indexes)?

Do not use float, as this data type has very limited precision. Use double. — DYZ, Nov 30 '18 at 22:33
IMO [How best to sum up lots of floating point numbers?](https://stackoverflow.com/questions/394174/how-best-to-sum-up-lots-of-floating-point-numbers) is a bad choice for a duplicate. [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) is probably closer as it at least attempts to explain **why**. — Andrew Henle, Nov 30 '18 at 22:52

score 2 · Answer 1 · answered Nov 30 '18 at 22:49

Why do I get different results even though the arrays have the same elements

In float (single-precision) arithmetic, 100000000.0 + 1.0 equals 100000000.0 and not 100000001.0, but 100000000.0 + 1024.0 does equal 100001024.0. Next to 100000000.0, the value 1.0 is too small to show up in the bits available to represent the result.

So when you put 100000000.0 first, all the later + 1.0 operations have no effect.

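When you put 100000000.0 last, though, all the previous 1.0 + 1.0 + ... additions are exact and add up to 1023.0 (there are 1023 of them), and 1023.0 is "big enough" to make a difference given the available precision: 100000000.0 + 1023.0 rounds to the nearest representable float, which is 100001024.0.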

score 2 · Accepted Answer · answered Nov 30 '18 at 22:55

What you're experiencing is floating point imprecision. Here's a simple demonstration.

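#include <stdio.h>

int main(void) {
    float big = 100000000.0;
    float small = 1.0;

    // 1.0 is too small to register next to big
    printf("%f\n", big + small);

    // 19.0 does register, but the result is rounded
    printf("%f\n", big + (19 * small));

    return 0;
}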

You'd expect 100000001.0 and 100000019.0.

$ ./test
100000000.000000
100000016.000000

Why'd that happen? Because computers don't store numbers the way we write them, and floating point numbers even less so. A float is just 32 bits, yet it can store values up to about 3.4 × 10^38, far beyond the roughly 2^31 limit of a 32-bit signed integer. And it can store fractions. How? It cheats. What it really stores is the sign, an exponent, and a mantissa.

sign * 2^exponent * mantissa

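The mantissa is what determines precision, and a float has only 24 bits of it (23 stored plus one implicit leading bit). So large numbers lose precision: near 100000000.0, neighboring floats are 8 apart, which is why both results above are multiples of 8.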

You can read about exactly how and play around with the representation.

To solve this, either use a double, which has greater precision (a 53-bit mantissa), or use an accurate, but slower, arbitrary precision library such as GMP.
