4

There are two similar matlab programs, one iterates 10 times while another iterates 11 times.

One:

i = 0;
x = 0.0;
h = 0.1;
while x < 1.0
    i = i + 1;
    x = i * h;
    disp([i,x]);
end

Another:

i = 0;
x = 0.0;
h = 0.1;
while x < 1.0
    i = i + 1;
    x = x + h;
    disp([i,x]);
end

I don't understand why there is difference between the floating point add operation and the multiple.

Yantao Xie
  • 12,300
  • 15
  • 49
  • 79
  • 1
    This might be due to fp representation within your variable x. Thus at the end x might be slightly smaller than one in one case since the error propagation is larger for the sum. What is the actual output? Which is the last x-value you see? – Howard Jun 25 '11 at 10:27
  • @Howard The x-value of the former is 1, 1.1 of the latter. – Yantao Xie Jun 25 '11 at 10:40
  • @Cook try outputting `x-1` instead of `x` itself. Then you should see a difference (see te example in my answer below). – Howard Jun 25 '11 at 10:42
  • 1
    I was going to link to (Goldberg 1991) and then found this: http://floating-point-gui.de/ – msw Jun 25 '11 at 10:49
  • @Howard In fact, I know why x-1=-1.1102230246251565E-16 in the latter. What I don't understand is the former. Since 0.1 cannot be represented exactly in a binary system, why 0.1 * 10 can equal to 1 exactly. – Yantao Xie Jun 25 '11 at 10:53
  • @msw Thanks. It seems to be a wonderful site. I will have a look. – Yantao Xie Jun 25 '11 at 10:57
  • 1
    possibly related questions: http://stackoverflow.com/q/686439, http://stackoverflow.com/q/3697234, http://stackoverflow.com/q/2202641 – Amro Jun 25 '11 at 14:40

3 Answers3

4

You should be very carefully when you do iterations with float counters. Just as an example I'll show you what happens in your case (it is a Java program but your case should be the same): click here to run it yourself

double h = 0.1;
System.out.println(10*h-1.0);
System.out.println(h+h+h+h+h+h+h+h+h+h-1.0);

It just prints the difference to one when doing a multiplication vs. seprarate additions.

Since the representation of floats is not exact the result looks like this:

0.0
-1.1102230246251565E-16

Thus if you use this as a looping condition in the latter case there will be an additional iteration (one is not yet reached).

Try to use the counter variable i which is an integer and you won't run into such issues.

Howard
  • 38,639
  • 9
  • 64
  • 83
  • Thanks. The matlab programs are copied from a exercises in the book of Elementary Numerical Analysis(K. Atkinson). I want to know why there is difference. – Yantao Xie Jun 25 '11 at 10:43
2

Compare the output of the following:

>> fprintf('%0.20f\n', 0.1.*(1:10))
0.10000000000000001000
0.20000000000000001000
0.30000000000000004000
0.40000000000000002000
0.50000000000000000000
0.60000000000000009000
0.70000000000000007000
0.80000000000000004000
0.90000000000000002000
1.00000000000000000000

>> fprintf('%0.20f\n', cumsum(repmat(0.1,1,10)))
0.10000000000000001000
0.20000000000000001000
0.30000000000000004000
0.40000000000000002000
0.50000000000000000000
0.59999999999999998000
0.69999999999999996000
0.79999999999999993000
0.89999999999999991000
0.99999999999999989000

Also compare against using MATLAB's COLON operator:

>> fprintf('%0.20f\n', 0.1:0.1:1)
0.10000000000000001000
0.20000000000000001000
0.30000000000000004000
0.40000000000000002000
0.50000000000000000000
0.59999999999999998000
0.69999999999999996000
0.80000000000000004000
0.90000000000000002000
1.00000000000000000000

If you want to see the 64-bit binary representation, use:

>> format hex
>> [(0.1:0.1:1)' (0.1.*(1:10))' cumsum(repmat(0.1,10,1))]
   3fb999999999999a   3fb999999999999a   3fb999999999999a
   3fc999999999999a   3fc999999999999a   3fc999999999999a
   3fd3333333333334   3fd3333333333334   3fd3333333333334
   3fd999999999999a   3fd999999999999a   3fd999999999999a
   3fe0000000000000   3fe0000000000000   3fe0000000000000
   3fe3333333333333   3fe3333333333334   3fe3333333333333
   3fe6666666666666   3fe6666666666667   3fe6666666666666
   3fe999999999999a   3fe999999999999a   3fe9999999999999
   3feccccccccccccd   3feccccccccccccd   3feccccccccccccc
   3ff0000000000000   3ff0000000000000   3fefffffffffffff

Some suggested readings (MATLAB related):

Amro
  • 123,847
  • 25
  • 243
  • 454
  • Thanks. In fact, I had tested what you've tested. I know these phenomenon but I am not able to explain why there is difference between accumulation and multiplication. – Yantao Xie Jun 26 '11 at 01:50
  • 1
    Repeated additions are accumulating round-off errors. Since floating point numbers have limited precision, the result is rounded off in order to fit into the finite number of bits. I strongly recommend you check out the links I provided. – Amro Jun 26 '11 at 02:21
0

The representation of floats IS exact, except that float arithmetic is in base 2, and decimals, such as 0.1, have infinite binary expansion. Since floats have a finite number of bits, the infinite expansion of 0.1 must be rounded off, and the roundoff error accumulates upon addition, leading to the discrepancy.

However, most floating-point operations are inexact: results USUALLY require more bits of precision than can be fit in a fixed number of bits, so the CPU automatically rounds the result to fit into the available precision. Such round-off errors accumulate in long computations chains, as you have noticed, and result in sometimes huge discrepancies between the actual and the "correct" result. ("correct" being defined as the result obtained in infinite-precision arithmetic.)

zvrba
  • 24,186
  • 3
  • 55
  • 65
  • Float representation is NOT exact. It is exact for a certain set of numbers, but not for any possible number. You've said it yourself: 0.1 (or 1/10) in binary expansion has an infinite number of digits which has to be truncated, hence there is no such thing as an exact representation of 1/10 in binary floats! You notice the same in decimal notation: 1/3 (0.3333...) cannot be represented exactly with a finite number of digits in decimal expansion. – Egon Jun 25 '11 at 11:50
  • True, floats cannot represent every conceivable real number, 0.1 among them, but the set of representable numbers is *exactly* representable. For example, 0.125=1/8 is exactly represented and there is no inexactness in the representation. 0.1 is not among exactly representable numbers, but sqrt(2) is not either, when restricted to finite radix-based representation. So what's your point? – zvrba Jun 25 '11 at 13:11
  • 1
    My point is that your statement that "floating point representation is exact" is incorrect. It is exact for a lot of numbers, but not for all numbers (for which the decimal expansion IS finite and limited to about 15 or 16 decimals (in the case of double precision)). So the representation is NOT exact (in general), since there is no mapping from decimal expansion (which is the usual way to write literals in) to binary expansion that results in an error of 0. Saying the representation is exact for representable numbers is like saying all red cars are red, which is true, but has little value. – Egon Jun 25 '11 at 21:13
  • So your reference for exactness are decimal numbers? Well, how do you represent 1/7 exactly in finite number of decimal digits? (Indeed, the *mapping* from decimal floats to binary floats is not exact, but that does not make float inexact.) – zvrba Jun 26 '11 at 05:28
  • 1
    Where am I saying that decimal floats are exact? They are not. The most common representation for literals is a decimal expansion (i.e. you write `x=0.1` or something like that), so you expect the binary representation to match that exactly, but it cannot. The representation `1/7` IS exact, but there is no decimal (nor binary) exact float representation of it. So floats in general are NOT exact. My reference of exactness is the representation the programmer uses to define literals (and there is no float in any base that can represent these exactly). – Egon Jun 26 '11 at 08:44
  • So, in the interest of easy use, and unlike many other features, programming language designers have chosen literal syntax that doesn't reflect the underlying machine architecture at all. That doesn't make floating-point numbers inexact. Also, C99 defines hex float syntax. Is THAT also inexact? – zvrba Jun 27 '11 at 13:58