
Possible Duplicate:
how to sum a large number of float number?

I have a matrix 'x' that is 10,000 elements by 10,000 elements.

In the first case I declare the matrix like:

int n = 10000;
unsigned int size_M = n*n;
unsigned int mem_size_M = sizeof(int)*size_M;
int* x = (int*)malloc(mem_size_M);

Step (1) The matrix is initialized:

for(i=0;i<n;i++)
    for(j=0;j<n;j++)
        x[i*n+j] = 1;

Step (2) Sum the elements of the matrix and print the total:

for(i=0;i<n;i++)
    for(j=0;j<n;j++)
        sum += x[i*n+j];

printf("sum: %d \n", sum);

As expected, the above code prints 'sum: 100000000 '.

However, if I declare the matrix like this:

int n = 10000;
float size_M = n * n;
float mem_size_M = sizeof(float) * size_M;
float* x = (float*)malloc(mem_size_M);

If I then perform steps 1 and 2 again, the correct answer is not printed; '16777216' is printed instead. Why is this?
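The stall can be reproduced without allocating the 400 MB matrix: since every element is 1, Step (2) amounts to adding 1.0f to an accumulator 10^8 times. The sketch below uses a hypothetical helper, `float_sum_of_ones` (not from the question), and assumes a float accumulator with single-precision evaluation (FLT_EVAL_METHOD == 0, as on x86-64 with SSE):

```c
/* Hypothetical helper, not from the question: adds 1.0f to a float
 * accumulator `count` times, mimicking Step (2) on an all-ones matrix.
 * Assumes float arithmetic is evaluated in single precision
 * (FLT_EVAL_METHOD == 0). */
float float_sum_of_ones(long count)
{
    float sum = 0.0f;
    for (long k = 0; k < count; k++)
        sum += 1.0f;   /* once sum reaches 16777216.0f, sum + 1.0f rounds back to it */
    return sum;
}
```

Under that assumption, calling it with 100000000L (n*n for n = 10000) returns 16777216.0f rather than 100000000.0f.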

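ANSWER: To get the correct answer, convert each element to int before adding it, so that (assuming sum is an int) the addition is done in integer arithmetic instead of float:

sum += (int)x[i*n+j];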
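A minimal sketch of that fix next to one common alternative, accumulating in double. Both function names are hypothetical. The integer version uses long long rather than int only to leave headroom, and truncating each element with a cast is correct only when the elements are whole numbers, as they are here.

```c
/* Asker's approach: integer accumulator, each element cast before adding,
 * so every addition is exact integer arithmetic. */
long long sum_as_int(const float *x, long count)
{
    long long sum = 0;
    for (long k = 0; k < count; k++)
        sum += (long long)x[k];   /* truncates any fractional part */
    return sum;
}

/* Alternative (not from the original post): accumulate in double, whose
 * 53-bit significand represents every integer up to 2^53 exactly. */
double sum_as_double(const float *x, long count)
{
    double sum = 0.0;
    for (long k = 0; k < count; k++)
        sum += x[k];              /* each float is widened to double first */
    return sum;
}
```

For the three values {16777216.0f, 1.0f, 1.0f}, both functions return 16777218, whereas adding them one by one into a float accumulator (with single-precision evaluation) stays at 16777216.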
t. fochtman
  • Are you trying to print a `float` with `%d` instead of `%f`? – hobbs Nov 24 '12 at 06:01
  • Looks like you're running into the machine epsilon for floating point numbers. Suggested reading: [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) – helloworld922 Nov 24 '12 at 06:10
  • http://stackoverflow.com/questions/2148149/how-to-sum-a-large-number-of-float-number – SomeWittyUsername Nov 24 '12 at 07:00
  • If you have a spare hour or two to kill, read [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html). It will literally make you rethink how floating point numbers "work" if you're not familiar already. – WhozCraig Nov 24 '12 at 08:39
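The linked duplicate asks how to sum many floats accurately. One widely used technique is Kahan (compensated) summation, which keeps the low-order bits lost by each addition in a separate correction term and feeds them back into the next one. A minimal sketch in float, assuming strict IEEE single-precision evaluation; `kahan_sum` is a hypothetical name, and the code must not be compiled with -ffast-math, which permits the compiler to simplify the correction away:

```c
/* Kahan (compensated) summation of `count` floats.
 * `c` holds the (negated) rounding error of the previous addition.
 * Do not compile with -ffast-math: reassociation lets the compiler
 * reduce (t - sum) - y to 0 and silently disable the compensation. */
float kahan_sum(const float *x, long count)
{
    float sum = 0.0f;
    float c = 0.0f;                 /* running compensation */
    for (long k = 0; k < count; k++) {
        float y = x[k] - c;         /* apply the correction to the next term */
        float t = sum + y;          /* low-order bits of y may be lost here */
        c = (t - sum) - y;          /* recover what was lost, with opposite sign */
        sum = t;
    }
    return sum;
}
```

For {16777216.0f, 1.0f, 1.0f} the first 1.0f is lost but recorded in `c`, so the second addition adds 2.0f and the result is 16777218.0f.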

1 Answer

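This happens because of the precision limits of the float type. Every integer up to 16777216 (2^24) can be stored exactly in a float, but between 2^24 and 2^25 only even integers can, so adding 1.0 to a float in that range gives a result that has to be rounded, possibly back to the starting value. Adding 2.0 still works there. Adding 0.1 appears to work in the example below only because 0.1 is a double constant, so f + 0.1 is computed in double precision: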

#include <stdio.h>

int main(void)
{
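    float f = 16777220;          /* 2^24 + 4: exactly representable */
    printf("f = %f\n", f + 1);   /* float arithmetic: 16777221 rounds to 16777220 */
    printf("f = %f\n", f + 2);   /* 16777222 is representable, so the sum is exact */
    printf("f = %f\n", f + 0.1); /* 0.1 is a double, so this sum is done in double */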
    return 0;
}
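The outputs of that program follow from the default rounding mode, round to nearest with ties to even. The helper below is a hypothetical sketch (not from the answer) that forces each sum to be stored as a 32-bit float via `volatile`, so the rounding can be checked directly:

```c
/* Hypothetical helper: returns a + b rounded to single precision.
 * Storing through a volatile float forces the result out to a 32-bit
 * float even on platforms that evaluate float expressions in higher
 * precision. */
float add_f(float a, float b)
{
    volatile float r = a + b;
    return r;
}
```

Between 2^24 and 2^25 adjacent floats are 2 apart, so an exact sum that is an odd integer is a tie and goes to the neighbor with the even significand: add_f(16777216.0f, 1.0f) and add_f(16777220.0f, 1.0f) round down, while add_f(16777218.0f, 1.0f) rounds up to 16777220.0f.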

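An IEEE 754 single-precision float occupies 4 bytes: a sign bit, an 8-bit exponent stored with a bias of 127 (excess-127), and a 23-bit mantissa. Because normalized numbers carry an implicit leading 1, that gives 24 bits of precision, which is where the 2^24 limit comes from. When the running sum reaches 16777216, the exact result of adding 1 is 16777217, which lies exactly halfway between the neighboring floats 16777216 and 16777218. The default rounding mode (round to nearest, ties to even) picks 16777216, so every further addition of 1 is lost and the sum never gets past that value.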
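To see the layout concretely, the sketch below pulls the three fields out of a float. It assumes float is the IEEE 754 binary32 format (true on mainstream platforms but not required by the C standard); `struct float_fields` and `split_float` are hypothetical names. For 16777216.0f it reports an unbiased exponent of 24 and a zero mantissa, so the lowest mantissa bit is worth 2^(24-23) = 2, which is why consecutive floats in that range are 2 apart.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision float
 * (assumes float is binary32). */
struct float_fields {
    unsigned sign;      /* 1 bit */
    int exponent;       /* unbiased: stored 8-bit value minus 127 */
    uint32_t mantissa;  /* 23 stored bits; the implicit leading 1 is not included */
};

struct float_fields split_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined way to read the raw bytes */

    struct float_fields r;
    r.sign = bits >> 31;
    r.exponent = (int)((bits >> 23) & 0xFFu) - 127;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

For 16777218.0f the exponent is still 24 and the mantissa is 1: the smallest step up from 2^24 is one unit in the last place, worth 2.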

Stanislav Mamontov