I have a fairly simple function that tries to return, for each element of the input list, its distance from the average of that list. The code almost works. Any thoughts as to why the results are slightly off?

def distances_from_average(test_list):
    average = [sum(test_list)/float(len(test_list))]*len(test_list)
    return [x-y for x,y in zip(test_list, average)]

Here are my example results: [-4.200000000000003, 35.8, 2.799999999999997, -23.200000000000003, -11.200000000000003] should equal [4.2, -35.8, -2.8, 23.2, 11.2]

martineau
lmeninato
  • Because only 53 bits are used for the float accuracy: https://docs.python.org/2/tutorial/floatingpoint.html - have a look at the [`decimal`](https://docs.python.org/2/library/decimal.html) module. – Jan Aug 10 '16 at 19:49
  • Welcome to the world of floating point arithmetic with finite precision. – karakfa Aug 10 '16 at 19:49
  • Consider using [fsum](https://docs.python.org/2/library/math.html#math.fsum) – dawg Aug 10 '16 at 19:55
  • I reopened since the linked duplicate was primarily JavaScript and did not have an applicable solution to this particular Python series summation question. – dawg Aug 10 '16 at 20:04

2 Answers

This is due to the way computers represent floating point numbers.

They are not always accurate in the way you expect, and thus should not be used to check equality, or represent things like amounts of money.
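To make the representation issue concrete, here is a small sketch (not part of the original answer) that uses the standard decimal and fractions modules to show what is actually stored when you write 0.1. The variable names are only illustrative.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts the stored binary value exactly,
# so printing it reveals the digits hidden behind "0.1".
stored = Decimal(0.1)
print(stored)  # a long decimal expansion slightly above 0.1

# The stored value is a ratio with a power-of-two denominator, not 1/10.
exact = Fraction(0.1)
print(exact == Fraction(1, 10))    # False
print(exact.denominator == 2**55)  # True
```

Every later operation starts from this slightly-off value, which is why results such as -4.200000000000003 can appear where -4.2 was expected.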

How are these numbers being used? If you need that kind of accuracy perhaps there are better ways to use the information, like checking for a range instead of checking equality.
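As a rough sketch of checking for a range instead of checking equality, the snippet below compares the values printed in the question against the intended ones using math.isclose (available in Python 3.5 and later). The helper name nearly_equal_lists and the tolerance of 1e-9 are assumptions chosen for illustration, and the signs of the intended values are aligned with the computed output.

```python
import math

def nearly_equal_lists(actual, expected, abs_tol=1e-9):
    """Return True if both lists have equal length and pairwise-close values."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, abs_tol=abs_tol) for a, e in zip(actual, expected)
    )

# Values as printed in the question.
computed = [-4.200000000000003, 35.8, 2.799999999999997,
            -23.200000000000003, -11.200000000000003]
intended = [-4.2, 35.8, 2.8, -23.2, -11.2]

print(computed == intended)                    # False: exact comparison fails
print(nearly_equal_lists(computed, intended))  # True: within tolerance
```

The right tolerance depends on the magnitude of your data, so treat 1e-9 as a placeholder rather than a recommendation.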

Here is some good reading material on the subject

rscarson
  • It's not that floating point numbers are less accurate than decimal numbers, it's that they are inaccurate in a different way, so rounding errors in floats look strange when represented as decimals. – Paul Aug 10 '16 at 20:16

Floating point can lead to surprising results if you do not fully consider how floating point numbers are represented in binary and the rounding errors that can result. Floating point rounding errors are exacerbated in series summation.

Examples:

>>> sum([.1]*10)
0.9999999999999999   # 1.0 expected in decimal
>>> sum([.1]*1000)
99.9999999999986     # 100.0 expected
>>> sum([1, 1e100, 1, -1e100] * 10000)
0.0                  # 20000 expected

There are numerous ways to get results that are exact (using the decimal module, using the fractions module, etc.). For summation specifically, the rounding error of each intermediate step can be tracked and carried forward instead of being lost. You can use fsum from the Python math library for more exact results with summation:

>>> import math
>>> math.fsum([.1]*10)
1.0
>>> math.fsum([.1]*1000)
100.0
>>> math.fsum([1, 1e100, 1, -1e100] * 10000)
20000.0

The fsum function is based on Raymond Hettinger's ActiveState recipes. It is not perfect, since it cannot undo the representation error already present in its inputs (try math.fsum([1.1,2.2]*1000)...), but it is pretty good.
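Applied to the question's function, the idea might look like the sketch below. The names distances_from_average_fsum and exact_distances_from_average, and the sample list, are hypothetical. The first variant only improves the summation step, and the subtraction can still round. The second swaps in exact rational arithmetic from the fractions module, which is a different technique from fsum.

```python
import math
from fractions import Fraction

def distances_from_average_fsum(test_list):
    """Return each element minus the mean, using fsum for the total."""
    average = math.fsum(test_list) / len(test_list)
    return [x - average for x in test_list]

def exact_distances_from_average(test_list):
    """Same computation in exact rational arithmetic (no rounding at all)."""
    values = [Fraction(x) for x in test_list]
    average = sum(values) / len(values)
    return [v - average for v in values]

sample = [1, 2, 4]  # hypothetical input, not the asker's data

print(distances_from_average_fsum(sample))
print(exact_distances_from_average(sample))
# [Fraction(-4, 3), Fraction(-1, 3), Fraction(5, 3)]
```

If the values only need to look right when displayed, rounding at output time (for example round(x, 1)) is usually simpler than switching number types.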

dawg