
Note: I already know that Python uses approximations for floating point numbers. However their behaviour is inconsistent even after a subtraction (which should not increase the representation error, should it?). Furthermore, I would like to know how to fix the problem.


Python sometimes seems to use an approximation for the actual value of a float, as you can see in the 5th and 6th elements of the this_spectrum variable in the image. All values were read from a text file and should have only 2 decimals.

[Image: python_approximation]

When you print them or use them in a Python calculation, they behave as if they have the intended value:

In: this_spectrum[4] == 1716.31
Out: True

However, after using them in a Cython script which calculates all pairwise distances between elements (i.e. performs a simple subtraction of values) and stores them in alphaMatrix, it seems the Python approximations were used instead of the actual values:

In: alphaMatrix[0][0]
Out: 161.08
In: alphaMatrix[0][0] == 161.08
Out: False
In: alphaMatrix[0][0] == 161.07999999999996
Out: True

Why is this happening, and what would be the proper way to fix this problem? Is this really a Cython problem/bug/feature(?) or is there something else going on?

PDiracDelta
  • Possible duplicate of [Why does the floating-point value of 4\*0.1 look nice in Python 3 but 3\*0.1 doesn't?](https://stackoverflow.com/questions/39618943/why-does-the-floating-point-value-of-40-1-look-nice-in-python-3-but-30-1-doesn) – Ken Y-N Nov 07 '17 at 09:10
  • Please understand that I am asking more than what is discussed in the possible duplicate you have provided. I understand that Python makes approximations for numbers, but I do not understand why they are 'masked' in the former case (see my post) but 'explicit' in the latter case. – PDiracDelta Nov 07 '17 at 09:17
  • It's possible that you're mixing floats and doubles in Cython and this is causing your issues, but without any code it's impossible to say. (However, I think it's more likely that you haven't really understood floating point representation errors and when/how they occur.) – DavidW Nov 07 '17 at 11:03
  • (if all your numbers will always have 2 decimal places then it might be best to multiply them all by 100 and store them as ints instead) – DavidW Nov 07 '17 at 11:07
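
DavidW's suggestion of storing hundredths as integers could be sketched roughly as follows. This is only an illustration: `to_hundredths` is a hypothetical helper, and 1555.23 is a made-up second reading (only 1716.31 appears in the question).

```python
def to_hundredths(value: float) -> int:
    """Convert a two-decimal value such as 1716.31 to an integer number of hundredths."""
    # round() absorbs the tiny representation error left after multiplying by 100.
    return round(value * 100)


# 1716.31 is taken from the question; 1555.23 is an assumed example value.
a = to_hundredths(1716.31)
b = to_hundredths(1555.23)

# Integer subtraction is exact, so the comparison behaves as expected.
distance = a - b
print(distance == 16108)

# Convert back to a float only at the very end, with a single rounding step.
print(distance / 100 == 161.08)
```

Because the subtraction happens on integers, no rounding error accumulates between reading the values and comparing the distances.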

1 Answer


Your problem is here: “a Cython script which calculates all pairwise distances…”

In essentially every floating-point operation (add, subtract, multiply, divide, or any “elementary” function such as square root, cosine, logarithm, and so on), some rounding error may be added. For the basic arithmetic operations, the result is calculated as if the exact mathematical result were calculated and then rounded to the nearest representable floating-point value. Ideally, the elementary functions would behave this way too, but most implementations have some additional error, because calculating these functions precisely is difficult.

As an example, consider calculating 1/3*18 using decimal floating-point with four digits. 1 and 3 in decimal floating-point have no error; they are representable as 1.000 and 3.000. But their quotient, ⅓, is not representable. When you do the division, the result is .3333. Then, when you multiply .3333 by 18, the exact result is 5.9994. Because our floating-point format has only four digits, we have to round this to some number we can represent. The nearest representable value is 5.999, so that is returned.
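
The four-digit decimal example can be reproduced with Python's standard `decimal` module. This is a minimal sketch, assuming a context precision of four significant digits stands in for the hypothetical format described above.

```python
from decimal import Decimal, localcontext

# Work in a temporary context limited to four significant decimal digits.
with localcontext() as ctx:
    ctx.prec = 4

    # 1 and 3 are exact, but their quotient must be rounded to four digits.
    third = Decimal(1) / Decimal(3)

    # The exact product 5.9994 does not fit in four digits and is rounded again.
    product = third * 18

print(third)    # 0.3333
print(product)  # 5.999
```

The mathematically exact answer is 6, so the two rounding steps together leave an error of 0.001.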

When you compare a calculated value to a constant such as 161.08, you are comparing a value with several accumulated rounding errors to a value with only one rounding error (when the decimal numeral “161.08” in the source is converted to binary floating-point, there is a rounding error). So the values are different.
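
The same effect shows up with a single subtraction in binary floating-point. The sketch below uses 0.3 and 0.1 rather than the values from the question, and it shows `math.isclose` as one common way to compare with a tolerance. The tolerance of `1e-9` is an assumption and should be chosen to suit the data.

```python
import math

# Each literal is rounded once on conversion to binary; the subtraction rounds again.
computed = 0.3 - 0.1

# Exact comparison fails because the two sides carry different rounding errors.
print(computed == 0.2)  # False

# Comparing within a tolerance treats the values as equal.
print(math.isclose(computed, 0.2, rel_tol=1e-9))  # True
```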

You often will not see this difference when numbers are printed with default precision because only a few digits are shown. This means the full internal value is rounded to a few digits for display, and that rounding conceals the differences. If you print the numbers with more precision, you will see the differences.
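
As a small illustration of how display rounding hides the difference, the sketch below formats one computed value with 8 and then with 17 significant digits. The value 0.3 - 0.1 is an assumed example, not data from the question.

```python
computed = 0.3 - 0.1

# With only 8 significant digits the value looks exactly like 0.2.
short = f"{computed:.8g}"
print(short)  # 0.2

# With 17 significant digits the accumulated rounding error becomes visible.
full = f"{computed:.17g}"
print(full)   # 0.19999999999999998

# The literal 0.2 is itself only an approximation, with a different error.
print(f"{0.2:.17g}")  # 0.20000000000000001
```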

Eric Postpischil
  • Thanks for the info. Now how would people usually solve this problem? Using np.around to get rid of erroneous decimals decreases performance by a factor of 30 ... – PDiracDelta Nov 08 '17 at 12:21
  • @PDiracDelta: We do not know what the problem is; you have not stated why you are comparing calculated values to certain other values. In order to contemplate a solution, we would need to know the actual problem you are trying to solve, such as what the application is ultimately trying to do. – Eric Postpischil Nov 08 '17 at 16:15