
I'm taking a few courses on machine learning and I'm trying to understand this computational issue:

variable = 1000000000  # 1 billion
for i in xrange(1000000):
    variable = variable + 1e-6  # 0.000001

variable = variable - 1000000000  # subtract 1 billion again
variable
#>> 0.95367431640625

It should be 1, but it turns out to be 0.95367431640625.

Can someone tell me why this happens?

Alvaro Silvino
  • Seems like the typical floating point issue - computers can't truly represent those numbers through bits, so it gets "close" but with some amount of error; 15 or 16 significant digits (9 or 10 before the decimal, 6 after) sounds about in line with `float`s from a C perspective (I'm talking generally and roughly; I don't know the specifics, otherwise I would've posted an answer, not a comment). – dwanderson Mar 10 '16 at 14:37
  • Also, why is this tagged with `performancecounter`? This seems not to have anything to do with `performance` but rather with accuracy. – dwanderson Mar 10 '16 at 14:37
  • Some details on how to prevent this problem can be found [here](http://stackoverflow.com/questions/11522933/python-floating-point-arbitrary-precision-available). – Rens van der Heijden Mar 10 '16 at 14:41
  • @dwanderson fixed the tagging – Alvaro Silvino Mar 10 '16 at 14:42

2 Answers


You are losing precision. This is because float in Python is implemented using double-precision floating point, which only guarantees precision up to the 15th/16th significant digit.

When you do:

1,000,000,000 + 0.000001
1,000,000,000.000001 + 0.000001
# and so on; note that you are adding at the 16th significant digit
# but 1,000,000,000.000001 is not actually stored as exactly 1,000,000,000.000001
# internally it is something more like 1,000,000,000.000001014819 or 1,000,000,000.000000999819

So you keep running into the precision limit: there are further digits after the last 1 in 0.000001 that cannot be stored, so the value is represented only approximately as 0.000001 and every addition gets rounded. Thus the error accumulates.
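You can see the size of that rounded step directly (a quick sketch, assuming standard 64-bit Python floats; the printed forms may differ slightly between Python versions):

from decimal import Decimal

step = (1e9 + 1e-6) - 1e9   # what one iteration of the loop really adds
Decimal(1e9 + 1e-6)         #>> Decimal('1000000000.00000095367431640625'), the exact stored sum
step                        #>> 9.5367431640625e-07, not 1e-06
step * 1000000              #>> 0.95367431640625, the value in the question

Every pass of the loop adds 9.5367431640625e-07 instead of 1e-06, and a million of those come out to 0.95367431640625.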

Things would have been different if, say, you had initialized your variable to 0. This is because then the computation is:

0.000000 + 0.000001
0.000001 + 0.000001
0.000002 + 0.000001
#and so on

Although the actual value of 0.000001 still isn't exactly 0.000001, the imprecision around the 16th digit is now far away from the significant digits:

0.000000 + 0.00000100000000000000011111
0.000001 + 0.00000100000000000000011111 #insignificant error
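You can check this version too (a quick sketch; the exact trailing digits will vary, but the error stays far out in the decimal places):

total = 0.0
for i in xrange(1000000):
    total = total + 1e-6

total   #>> approximately 1.0; any error shows up only many places after the decimal point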

You could also avoid the error by using a Decimal value instead of a double:

from decimal import Decimal

variable = Decimal(1000000000)
addition = Decimal(1e-6)
for i in xrange(1000000):
    variable = variable + addition  # 0.000001

variable = variable - Decimal(1000000000)  # subtract 1 billion again
variable
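As a side note, Decimal(1e-6) is still constructed from a float, so it carries the float's tiny representation error along with it; building the Decimal from a string avoids even that (a minimal sketch):

from decimal import Decimal

variable = Decimal(1000000000)
addition = Decimal('1e-6')  # constructed from a string, so it is exactly 0.000001
for i in xrange(1000000):
    variable = variable + addition

variable - Decimal(1000000000)
#>> Decimal('1.000000')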
Ian

Python's float math can't natively handle arbitrary precision. If you want more precise results, it looks like you need to work with the decimal module, and even then, be careful:

from decimal import Decimal

x = Decimal(1000000000)
y = Decimal(1e-6)
z = x + y
z
##>> Decimal('1000000000.000001000000000000')
w = z - x
w
##>> Decimal('0.000001000000000000')

## however, when I tried:
bad_x = Decimal(1000000000 + 1e-6)
bad_x
##>> Decimal('1000000000.00000095367431640625')

The reason bad_x ends up with the "wrong" value is that the regular Python float addition of 1000000000 and 1e-6 happens first, which runs into the floating point issue, and only then is that (wrong) value passed to Decimal - by that point the damage has already been done.

For your use case, it looks like you can convert the values to Decimal before adding/subtracting, so you should get the desired results without a problem.

dwanderson