
I have been reading about floating point and the rounding errors that occur during floating-point arithmetic.

I have read a lot of articles on the IEEE 754 single-precision and double-precision formats. I understand that there is a sign bit, 8 (or 11) bits of exponent, and 23 (or 52) bits of significand, along with an implicit leading bit.

I also know that real numbers whose denominator (in lowest terms) is not a power of 2 cannot be represented exactly. For example, 0.1 in binary is 0.0001100110011...

I understood that 0.1 + 0.1 + 0.1 is not equal to 0.3 because of the accumulation of rounding error.

Also, 0.5 is exactly representable in binary because it is 1/2. But given the above accumulation of rounding error, I don't understand why 0.1 + 0.1 + 0.1 + 0.1 + 0.1 == 0.5.
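In code, the puzzle looks like this (a minimal Java sketch, since the comments mention Java; the class name is just for illustration):

```java
// Sketch: the same inexact 0.1 summed five times compares equal to 0.5,
// yet summed three times does not compare equal to 0.3.
public class TenthSums {
    public static void main(String[] args) {
        double five = 0.1 + 0.1 + 0.1 + 0.1 + 0.1;
        double three = 0.1 + 0.1 + 0.1;
        System.out.println(five == 0.5);   // true: the errors cancel out
        System.out.println(three == 0.3);  // false: the error survives
        System.out.println(three);         // 0.30000000000000004
    }
}
```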

Rudy Velthuis
chebus
  • Are you saying you *don't* get a rounding error when accumulating `0.1` five times? – MooseBoys Apr 28 '16 at 22:00
  • I am confused about whether the accumulated error is discarded when 0.1 is added 5 times, or in any other arithmetic that leads to a number exactly representable in binary. If so, why? – chebus Apr 28 '16 at 22:05
  • Only if the new value results in the imprecision being lost in the inaccuracy. – Ignacio Vazquez-Abrams Apr 28 '16 at 23:25
  • Presumably you determined this by writing some code -- if you show the code we can help you better. – Rick Regan Apr 29 '16 at 01:12
  • Rick, it's in Java: `boolean b = 0.1+0.1+0.1+0.1+0.1 == 0.5; // true` and `b = 0.1+0.1+0.1 == 0.3; // false`. My question is why the reason given for the false case (round-off errors accumulated during arithmetic on numbers that are not exactly representable) does not apply to the true case as well. – chebus Apr 29 '16 at 05:44
  • In the end, this is probably a duplicate of Why does adding 0.1 multiple times remain lossless? http://stackoverflow.com/questions/26120311/why-does-adding-0-1-multiple-times-remain-lossless?lq=1 Had you set a java or javascript tag, it could have brought more reputation ;) – aka.nice Apr 30 '16 at 19:44

2 Answers


In IEEE 754 round-to-nearest-even mode, you have some nice properties.
First, for any finite float x and n < 54, (2^n - 1)*x + x == 2^n * x. See Is 3*x+x always exact?

Then you also have (2^n + 1)*x == 2^n * x + x
(as long as 2^n + 1 is exactly representable, i.e. n < 53).
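These two properties can be spot-checked directly (a Java sketch; x = 0.1 and n = 2 are arbitrary example values, not part of the original answer):

```java
// Spot-check of the two exactness properties for x = 0.1, n = 2:
//   (2^n - 1)*x + x == 2^n * x
//   (2^n + 1)*x == 2^n * x + x
public class ExactnessProps {
    public static void main(String[] args) {
        double x = 0.1;
        System.out.println(3 * x + x == 4 * x);  // (2^2 - 1)x + x == 2^2 x
        System.out.println(5 * x == 4 * x + x);  // (2^2 + 1)x == 2^2 x + x
    }
}
```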

With these properties, you have

  • 0.1+0.1==2*0.1
  • 0.1+0.1+0.1 == 3*0.1
  • 0.1+0.1+0.1+0.1 == 4*0.1
  • 0.1+0.1+0.1+0.1+0.1 == 5*0.1

This is not enough, because at this stage 0.1 is not exactly 1/10, so nothing proves that 5*0.1 == 0.5.
For example, 3*0.1 != 0.3, and 6*0.1 != 0.6.

So here it is just luck: the round-off errors happened to cancel instead of accumulating.
(n*0.1 == n/10.0) is true for 65 out of 100 of the integers n from 1 to 100 (and always true for the 7 powers of two in this interval, 1, 2, 4, 8, 16, 32, 64, since multiplying by a power of two is exact).
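A quick way to check the 65-out-of-100 figure is to loop over n (a Java sketch; `CountMatches` is a made-up name):

```java
// Count how many n in 1..100 satisfy n*0.1 == n/10.0.
public class CountMatches {
    public static void main(String[] args) {
        int matches = 0;
        for (int n = 1; n <= 100; n++) {
            if (n * 0.1 == n / 10.0) {
                matches++;
            }
        }
        // Prints the number of n for which the two expressions agree.
        System.out.println(matches);
    }
}
```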

aka.nice
  • So you mean to say that error accumulated for 5*0.1, but was discarded by rounding because 53 bits were not sufficient for the fraction, and it so happens that the rounded value is exactly 0.5. Am I correct? Also, I could not get what you mean by the 7 powers of two in this interval. – chebus May 01 '16 at 13:39
  • On the other hand, on the basis of this explanation I am not able to figure out why 0.3 + 0.3 equals 0.6 but 6*0.1 does not equal 0.6. – chebus May 01 '16 at 13:49

0.1 in double precision is 0.0001100110011001100110011001100110011001100110011001101 in binary. Let's step through the binary additions to see what's happening:

  0.0001100110011001100110011001100110011001100110011001101
+
  0.0001100110011001100110011001100110011001100110011001101
-----------------------------------------------------------
  0.001100110011001100110011001100110011001100110011001101   (52 sig bits -- OK)
+
  0.0001100110011001100110011001100110011001100110011001101
-----------------------------------------------------------
  0.0100110011001100110011001100110011001100110011001100111  (54 sig bits -- must round to 53)
  0.0100110011001100110011001100110011001100110011001101     (rounded up)
+
  0.0001100110011001100110011001100110011001100110011001101
-----------------------------------------------------------
  0.0110011001100110011001100110011001100110011001100110101  (54 sig bits -- must round to 53)
  0.01100110011001100110011001100110011001100110011001101    (rounded down)
+
  0.0001100110011001100110011001100110011001100110011001101
-----------------------------------------------------------
  0.1000000000000000000000000000000000000000000000000000001 (55 sig bits -- must round to 53)
  0.1                                                       (rounded down)

So, just due to how the roundings accumulated, 0.1 added five times became exactly 0.5.
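The same partial sums can be inspected exactly from Java (an assumption here, as in the question's comments), since `new BigDecimal(double)` displays a double's exact value:

```java
import java.math.BigDecimal;

// Print the exact decimal value of each partial sum, mirroring the
// binary walkthrough above. new BigDecimal(double) converts without
// any rounding, so it reveals the true value stored in the double.
public class ExactPartials {
    public static void main(String[] args) {
        double sum = 0.0;
        for (int i = 1; i <= 5; i++) {
            sum += 0.1;
            System.out.println(i + " terms: " + new BigDecimal(sum));
        }
        System.out.println(sum == 0.5);  // true: the last rounding lands on 0.5
    }
}
```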

(I got these values from my binary converter, binary calculator, and floating-point converter.)

Rick Regan