4

I have two integer vectors of nearly 1000 size, and what I am going to do is to check whether the sum of the square integer for these two vectors is the same or not. So I write the following codes:

std::vector<int> array1;
std::vector<int> array2;
... // initialize array1 and array2, and in the experiment all elements
    // in the two vectors are the same but the sequence of elements may be different.
    // For example: array1={1001, 2002, 3003, ....} 
   //               array2={2002, 3003, 1001, ....}
assert(array1.size() == array2.size());
float sum_array1 = 0;
float sum_array2 = 0;
for(int i=0; i<array1.size(); i++)
       sum_array1 +=array1[i]*array1[i];
for(int i=0; i<array2.size(); i++)
       sum_array2 +=array2[i]*array2[i];

I expect that sum_array1 should be equal to sum_array2, but in fact in my application I found they were different sum_array1 = 1.2868639e+009 while sum_array2 = 1.2868655e+009. What I have done next is to change the type of sum_array1 and sum_array2 to double type as the following codes show:

 double sum_array1 = 0;
    double sum_array2 = 0;
    for(int i=0; i<array1.size(); i++)
           sum_array1 +=array1[i]*array1[i];
    for(int i=0; i<array2.size(); i++)
           sum_array2 +=array2[i]*array2[i];

This time sum_array1 is equal to sum_array2 sum_array1=sum_array2=1286862225.0000000. My question is why it could happen. Thanks.

feelfree
  • 11,175
  • 20
  • 96
  • 167
  • 10
    [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html) – ppeterka Sep 06 '13 at 15:47
  • You haven't provided enough information. The two calculations are identical, so they should produce the same result. The problem must e somewhere else, in the code that you haven't shown. – Pete Becker Sep 06 '13 at 15:54
  • 1
    Never compare floats for equality! Either use arbitrary or fixed precision integer class, or compare floats within an acceptable level of error. – Neil Kirk Sep 06 '13 at 15:55
  • Are these vectors the same length? Can you sort them and compare element-wise? Can you exploit the identity sq(a) - sq(b) == (a + b) * (b - a)? Just some things to consider. – Adam Burry Sep 06 '13 at 16:04
  • @AdamBurry The two vectors are of the same size. I have sorted them and verify that each element in two vectors is identical. – feelfree Sep 06 '13 at 16:08
  • @PeteBecker The two vectors are too large to be posted here. In fact this is what confusing me most: the two calculations are identical, they should produce the same result. However, what I have found is contradictory to my thought. – feelfree Sep 06 '13 at 16:10
  • @feelfree, the behaviour you are seeing is completely expected as you must have gathered by now. The calculations are not identical because you are adding the numbers in different orders. As much as possible with floats you want to deal with numbers of similar magnitude. When you add a small number to a large one, the small number becomes insignificant noise past a certain point. – Adam Burry Sep 06 '13 at 16:14
  • 2
    I don't think this question should be closed. The question wasn't just about comparing floating point numbers. It was about why adding floating point numbers in different orders is coming up with different results. If there's a duplicate of that problem on SO, I'll vote to mark as duplicate, but this isn't a duplicate of the "comparing floats" common problem. – Timothy Shields Sep 06 '13 at 16:15
  • @TimothyShields - agreed. The question is "why are these results different", not "how can I pretend that the results are the same". – Pete Becker Sep 06 '13 at 19:19
  • @feelfree - as others have said, the problem comes up because the values in the vectors are in different orders, as indicated in the comments. So forget my request for more information... – Pete Becker Sep 06 '13 at 19:27

4 Answers4

5

Floating point values have a finite size, and can therefore only represent real values with a finite precision. This leads to rounding errors when you need more precision than they can store.

In particular, when adding a small number (such as those you're summing) to a much larger number (such as your accumulator), the loss of precision can be quite large compared with the small number, giving a significant error; and the errors will be different depending on the order.

Typically, float has 24 bits of precision, corresponding to about 7 decimal places. Your accumulator requires 10 decimal places (around 30 bits), so you will experience this loss of precision. Typically, double has 53 bits (about 16 decimal places), so your result can be represented exactly.

A 64-bit integer may be the best option here, since all the inputs are integers. Using an integer avoids loss of precision, but introduces a danger of overflow if the inputs are too many or too large.

To minimise the error if you can't use a wide enough accumulator, you could sort the input so that the smallest values are accumulated first; or you could use more complicated methods such as Kahan summation.

Mike Seymour
  • 249,747
  • 28
  • 448
  • 644
4

In the two loops you're adding the same numbers but in different orders. As soon as sums exceed the integer value that can be exactly represented by a float, you're going to start losing precision, and the sums may end up slightly different.

An experiment for you to try:

float n = 0;
while (n != n + 1)
    n = n + 1;
//Will this terminate? If so, what is n now?

If you run this, you'll find that the loop actually terminates - which seems completely counter-intuitive, but is correct behavior according to the definitions for IEEE single-precision floating point arithmetic.

You can try the same experiment, replacing float with double. You'll witness the same strange behavior, but this time the loop will terminate when n is much larger, because IEEE double-precision floating point numbers enable a much finer accuracy.

Timothy Shields
  • 75,459
  • 18
  • 120
  • 173
  • 1
    The loop may never terminate if you use (inadvertedly) aggressive optimization, which renders `n!=n+1` as `false` no matter what `n`, i.e. uses mathematically exact arithmetic for this reasoning (excluding `n` = infinity). – Walter Sep 06 '13 at 16:12
  • @Walter Important point to accompany this answer. – Timothy Shields Sep 06 '13 at 16:13
3

Floating-point representation (Commonly IEEE754) uses finite bits to represent decimals, so operations with floating-point numbers result in precision loss.

Normally, opposite to the common sense, comparisons like a == ((a+1)-1) results in false if a is a floating-point variable.

Solution:

To compare two floating points, you have to use a kind of "precision loss ranges". That is, if a number differs from other less than that precision-loss-range, you consider that numbers are equal:

//Supposing we can overload operator== for floats
bool operator==( float lhs , float rhs)
{
    float epsilon = std::numeric_limits<float>.epsilon();

    return std::abs(lhs-rhs) < epsilon;
}
Manu343726
  • 13,969
  • 4
  • 40
  • 75
  • But the two calculations are **identical**. Floating-point is not random, and the results **should** be the same. – Pete Becker Sep 06 '13 at 15:52
  • The edit is not correct. There are values for which that equality does not hold, but it is **not** true as a general statement. – Pete Becker Sep 06 '13 at 15:53
  • 1
    @PeteBecker, they aren't the same if they are added in different orders. – josh poley Sep 06 '13 at 15:53
  • 1
    @PeteBecker Depends which floats were put into CPU registers and when. CPU register may be larger than float type. – Neil Kirk Sep 06 '13 at 15:53
  • @joshpoley - you're right; I didn't notice that. – Pete Becker Sep 06 '13 at 15:55
  • @PeteBecker *should be the same* but **is not guaranteed**. Depends on the order of the operations. The point here is that kind of operations have undefined behaviour. – Manu343726 Sep 06 '13 at 15:56
  • @Manu343726 I don't understand that. What's the difference between "should be" and "guaranteed"? Where does the should come from? – Neil Kirk Sep 06 '13 at 15:57
  • @Manu343726 - no, there's no undefined behavior here. Josh got it right: the order of the values is different, and that's why the results are different. – Pete Becker Sep 06 '13 at 15:57
  • @NeilKirk - he's somewhat imprecisely describing the actual problem: that the values are in two different orders, and that affects the result. The results "should be" the same in the mathematical world, but they're not "guaranteed" to be the same in the computer world. – Pete Becker Sep 06 '13 at 15:59
  • @PeteBacker thanks, thats exatly wat I mean: should (Mathematics), not guaranteed (computers implementation) – Manu343726 Sep 06 '13 at 16:04
  • @NeilKirk That makes sense. Compilers often guarantee minimum precision on a calculation but not maximum precision. All registers are not equal. Thus the addition of a simple piece of logging can sometimes change the result of a calculation. – morechilli Sep 06 '13 at 16:11
2

A double has more bits and thus holds more information than a float. When you are adding values to the float, it will end up rounding the information at different times for sum_array1 vs sum_array2.

Depending on the input values, you can end up with the same issue when using a double as a float (if the values are large enough).

A web search for "everything you need to know about floating point numbers" will give you a good overview of the limitations, and how to best deal with them.

josh poley
  • 7,236
  • 1
  • 25
  • 25
  • @Pete: The calculations are **not identical**. They're in a different order, and this can affect the results. They're **identical** if you loop through `array1` twice. – Ken White Sep 06 '13 at 15:59