
I'm taking a few courses on machine learning and I'm trying to understand this computational issue:

variable = 1000000000  # 1 billion
for i in xrange(1000000):
    variable = variable + 1e-6  # 0.000001

variable = variable - 1000000000  # subtract 1 billion again
variable
#>> 0.95367431640625

It should be 1, but it turns out to be 0.95367431640625.

Can someone tell me why this happens?

Alvaro Silvino
  • Seems like the typical floating point issue - computers can't truly represent those numbers through bits, so it gets "close" but with some amount of error; 15 or 16 significant digits (9 or 10 before the decimal, 6 after) sounds about in line with `float`s from a C perspective (I'm talking generally and roughly; I don't know the specifics, otherwise I would've posted an answer, not a comment). – dwanderson Mar 10 '16 at 14:37
  • Also, why is this tagged with `performancecounter`? This seems not to have anything to do with `performance` but rather with accuracy. – dwanderson Mar 10 '16 at 14:37
  • Some details on how to prevent this problem can be found [here](http://stackoverflow.com/questions/11522933/python-floating-point-arbitrary-precision-available). – Rens van der Heijden Mar 10 '16 at 14:41
  • @dwanderson fixed the tagging – Alvaro Silvino Mar 10 '16 at 14:42

2 Answers


You are losing precision. This is because float in Python is implemented using double-precision floating point, which only guarantees precision up to the 15th/16th significant digit.

When you do:

1,000,000,000 + 0.000001
1,000,000,000.000001 + 0.000001
# and so on; note that you are adding at the 16th significant digit
# but 1,000,000,000.000001 is not actually stored as exactly 1,000,000,000.000001
# internally it is something more like 1,000,000,000.000001014819 or 1,000,000,000.000000999819

So you keep running into the precision limit: there are further digits after the last 1 in 0.000001 that cannot be stored, so the value is represented only approximately as 0.000001 and every addition gets rounded. Thus the error accumulates.
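You can see the size of that rounded step directly (a quick sketch, assuming standard 64-bit Python floats; the printed forms may differ slightly between Python versions):

from decimal import Decimal

step = (1e9 + 1e-6) - 1e9   # what one iteration of the loop really adds
Decimal(1e9 + 1e-6)         #>> Decimal('1000000000.00000095367431640625'), the exact stored sum
step                        #>> 9.5367431640625e-07, not 1e-06
step * 1000000              #>> 0.95367431640625, the value in the question

Every pass of the loop adds 9.5367431640625e-07 instead of 1e-06, and a million of those come out to 0.95367431640625.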

Things would have been different if, say, you had initialized your variable to 0. This is because then the computation is:

0.000000 + 0.000001
0.000001 + 0.000001
0.000002 + 0.000001
#and so on

Although the actual value of 0.000001 still isn't exactly 0.000001, the imprecision around the 16th digit is now far away from the significant digits:

0.000000 + 0.00000100000000000000011111
0.000001 + 0.00000100000000000000011111 #insignificant error
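You can check this version too (a quick sketch; the exact trailing digits will vary, but the error stays far out in the decimal places):

total = 0.0
for i in xrange(1000000):
    total = total + 1e-6

total   #>> approximately 1.0; any error shows up only many places after the decimal point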

You could also avoid the error by using a Decimal value instead of a double:

from decimal import Decimal

variable = Decimal(1000000000)
addition = Decimal(1e-6)
for i in xrange(1000000):
    variable = variable + addition  # 0.000001

variable = variable - Decimal(1000000000)  # subtract 1 billion again
variable
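As a side note, Decimal(1e-6) is still constructed from a float, so it carries the float's tiny representation error along with it; building the Decimal from a string avoids even that (a minimal sketch):

from decimal import Decimal

variable = Decimal(1000000000)
addition = Decimal('1e-6')  # constructed from a string, so it is exactly 0.000001
for i in xrange(1000000):
    variable = variable + addition

variable - Decimal(1000000000)
#>> Decimal('1.000000')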
Ian

Python's float math can't natively handle arbitrary precision. If you want more precise results, it looks like you need to work with the decimal module, and even then, be careful:

from decimal import Decimal

x = Decimal(1000000000)
y = Decimal(1e-6)
z = x + y
z
##>> Decimal('1000000000.000001000000000000')
w = z - x
w
##>> Decimal('0.000001000000000000')

## however, when I tried:
bad_x = Decimal(1000000000 + 1e-6)
bad_x
##>> Decimal('1000000000.00000095367431640625')

The reason bad_x ends up with the "wrong" value is that the regular Python float addition of 1000000000 and 1e-6 happens first, which runs into the floating point issue, and only then is that (wrong) value passed to Decimal - by that point the damage has already been done.

For your use case, it looks like you can convert the values to Decimal before adding/subtracting, so you should get the desired results without a problem.

dwanderson