11

When comparing two floats in Python, I often see code like the following, which checks the difference against a small value epsilon. What is the best practice for selecting the right epsilon value, and what is the reasoning behind it? Thanks.

epsilon = 0.000001
abs(a - b) < epsilon
Colonel Thirty Two
Lin Ma
  • 6
    in python 3.5 this was added: https://docs.python.org/3.5/library/math.html#math.isclose . according to the doc it returns more or less: `abs(a-b) <= max( rel_tol * max(abs(a), abs(b)), abs_tol )` – hiro protagonist Oct 08 '15 at 19:25
  • 1
    Are you looking for the smallest possible epsilon, or would you rather choose your epsilon dynamically based on your a and b? – user 12321 Oct 08 '15 at 19:27
  • 1
    See http://stackoverflow.com/questions/6837007/comparing-float-double-values-using-operator for one way to do it. The question was for Java, but the answer is universal. – Mark Ransom Oct 08 '15 at 19:31
  • @hiroprotagonist, thanks for the information, what is rel_tol? – Lin Ma Oct 09 '15 at 00:12
  • @MarkRansom, thanks for the reference. My question is more about Python's internal representation of float and double: what precision can they keep? If we know this, we can choose a good epsilon value. – Lin Ma Oct 09 '15 at 00:15
  • 1
    Python only has one floating point type `float`, and on every implementation I'm familiar with it's 64 bit IEEE. – Mark Ransom Oct 09 '15 at 00:47
  • @MarkRansom, please feel free to correct me if I am wrong. I think Python should have its own precision boundary for float and why not just use it as epsilon? Thanks. – Lin Ma Oct 09 '15 at 01:24
  • 1
    The precision boundary selected for Python 3.5 is in an answer I left at the other question. I gave you sufficient information to make an informed decision if you feel differently. – Mark Ransom Oct 09 '15 at 01:52
  • 1
    @LinMa `rel_tol` is the relative tolerance. You might want your numbers to be within, say, 1% of each other to qualify as equal; `rel_tol` would check that. But for very small values this becomes almost pointless, and that's where `abs_tol` decides. – hiro protagonist Oct 09 '15 at 07:24
  • @MarkRansom, do you mean this constant provided by Python? sys.float_info.epsilon? Thanks. – Lin Ma Oct 09 '15 at 21:46
  • @hiroprotagonist, thanks for the information. I am confused: why not use Python's built-in epsilon directly? See the reply from user 12321. And what is the benefit of choosing the tolerance dynamically based on the magnitude of the numbers? Thanks. – Lin Ma Oct 09 '15 at 21:48
  • 1
    `sys.float_info.epsilon` is the *absolute minimum* difference that is detectable from a value of 1.0. It isn't going to be useful in many contexts. The value I was referring to is `1e-09` that is used by the new `isclose` function. That might be a little too loose depending on your application, but you need to analyze your own situation. – Mark Ransom Oct 09 '15 at 21:53
  • @MarkRansom, why did you say "It isn't going to be useful in many contexts"? An example would be appreciated. I may have the wrong impression, but my understanding is that sys.float_info.epsilon is universally good for all cases? – Lin Ma Oct 11 '15 at 07:00

4 Answers

6

NumPy provides an assertion function for this purpose; by default it checks agreement to seven decimal places.

>>> from numpy.testing import assert_almost_equal
>>> a = 0.000000001
>>> b = 0.0000000001
>>> assert_almost_equal(a, b)
# Nothing returned.

>>> b = 1
>>> assert_almost_equal(a, b)
AssertionError: 
Arrays are not almost equal to 7 decimals
 ACTUAL: 1e-09
 DESIRED: 1
Alexander
  • Hi Alexander, very neat, but how do you choose the value of decimal in practice? – Lin Ma Oct 09 '15 at 00:08
  • 2
    It is very subjective and entirely depends on what you're working on. A question like that is a programming question, per se, and isn't really suitable for SO in my opinion. – Alexander Oct 09 '15 at 00:21
  • 1
    ** NOT a programming question – Alexander Oct 09 '15 at 01:07
  • Thanks Alexander, I think Python should have its own precision boundary for float and why not just use it as epsilon? Thanks. – Lin Ma Oct 09 '15 at 01:24
5

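If you are looking for the smallest meaningful epsilon, you can use the machine epsilon that Python exposes in sys.float_info. Note that it is the gap between 1.0 and the next representable float, so used as a fixed tolerance it only suits values of magnitude around 1: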

>>> import sys
>>> sys.float_info.epsilon
2.220446049250313e-16

But if you would rather have the epsilon scale dynamically with your a and b, I would suggest:

abs(f1-f2) < tol*max(abs(f1),abs(f2))

or

abs(a-b) <= max( rel_tol * max(abs(a), abs(b)), abs_tol )
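To make the difference between a fixed and a scaled epsilon concrete, here is a minimal sketch. The helper names `close_abs` and `close_rel` are hypothetical, chosen only for this illustration; `close_rel` follows the second formula above, and `math.isclose` (Python 3.5+) is the standard-library function that implements the same idea.

```python
import math
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next representable float


def close_abs(a, b, tol=eps):
    """Hypothetical helper: fixed (absolute) tolerance."""
    return abs(a - b) < tol


def close_rel(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper: tolerance scaled by the larger operand."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


# Near 1.0 the machine epsilon works as an absolute tolerance.
x = 0.1 + 0.2
print(x == 0.3)            # False
print(close_abs(x, 0.3))   # True

# Near 1e10 adjacent floats are already much further apart than eps,
# so two values that differ only by rounding fail the fixed test.
big = 1e10
neighbour = big * (1 + eps)
print(close_abs(big, neighbour))  # False
print(close_rel(big, neighbour))  # True

# Near zero a purely relative test is too strict; abs_tol takes over.
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```

The values `1e-9` for `rel_tol` and `abs_tol` are only placeholders here; the right tolerances depend on how much error your own computation accumulates.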
user 12321
  • Thanks user 12321, I am a bit lost: what is the benefit of choosing epsilon dynamically? An example would be appreciated. – Lin Ma Oct 09 '15 at 00:09
4

The answer is quite complex, since you need to know how single- or double-precision floats are stored (Wikipedia). As a rule of thumb, you can use this table on Wikipedia as a reference for choosing epsilon. But there may be exceptions, especially if you don't know exactly whether you have float32 or float64 (on Linux/Mac there are also float96 and float128 around).

But I guess the best practice would be to use a predefined function such as numpy.testing.assert_array_almost_equal (NumPy required).

I guess everyone handles it somewhat differently, and every method has its pros and cons, as long as you can trust your results. Always keep in mind that floats can go totally haywire with the wrong kind of arithmetic, e.g. when small differences of big values are calculated. In the end, the value of epsilon depends on the precision you need, and it should be tested against that requirement.
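As a rough illustration of the "small differences of big values" problem, the following sketch uses arbitrarily chosen numbers (they are assumptions for the demonstration, not values from any real application) and assumes the usual 64-bit IEEE floats:

```python
import math

# Adding 1.0 to 1e16 is lost entirely: floats near 1e16 are 2.0 apart.
big = 1e16
lost = (big + 1.0) - big
print(lost)  # 0.0

# A small difference of two big values keeps only a few correct digits.
a = 1e8 + 0.1
b = 1e8
diff = a - b
print(diff == 0.1)                            # False
print(math.isclose(diff, 0.1, rel_tol=1e-9))  # False: the error is far above 1e-9
print(math.isclose(diff, 0.1, rel_tol=1e-6))  # True: only about 7 digits survive
```

An epsilon that is reasonable for the inputs can therefore be far too tight for a result computed from them.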

wallyk
MSeifert
  • Thanks MSeifert, I read through this document (https://en.wikipedia.org/wiki/Machine_epsilon#Values_for_standard_hardware_floating_point_arithmetics) and it is really very informative. I am confused by one thing: in the table there are two columns called "Machine epsilon". What is the difference between them? – Lin Ma Oct 09 '15 at 00:05
  • 1
    @LinMa The difference is how you want to define it: the first one is just the second one divided by two. So the first one is probably the one you want, because you compare the *absolute* value of the difference of two floats. I guess the first one is like a `+/-` error and the second is more like the absolute error on any float. But I'm not exactly sure about that. – MSeifert Oct 09 '15 at 09:09
  • Thanks MSeifert, do you think we could leverage some Python built-in, like sys.float_info.epsilon? – Lin Ma Oct 11 '15 at 07:01
  • 1
    @LinMa Unfortunately there is no easy way to determine such an epsilon, because the result of a computation has a different error than its inputs. Just suppose you have a variable `A` with floating point error `epsilon`. If you look at the error of `B`, which is `A+A`, its floating point error is now `2*epsilon`. For addition that might be reconstructible, but suppose you use `A+B*C/D**E % math.exp(F)`: what will the error of this result be? – MSeifert Oct 11 '15 at 12:37
  • Thanks MSeifert, what exactly do you mean by floating point error? An example would be appreciated. I am also confused by your comment "A+A it's floating point error is now 2*epsilon", since I think as long as we are using float, the floating point error is fixed, as Python uses a unified scheme for floating point numbers. – Lin Ma Oct 12 '15 at 20:53
  • 1
    Maybe I am wrong but try ``a=0`` then try ``for i in range(1000): a+=0.1`` and ``print(a-100)`` then again the ``for loop`` and ``print(a-200)`` you can try this again and again and at least on my computer the difference is getting bigger. – MSeifert Oct 12 '15 at 23:17
  • Thanks MSeifert, I tested the code and you are correct. What does it prove? – Lin Ma Oct 13 '15 at 08:22
  • 1
    It proves that you cannot define a universal ``epsilon`` for arbitrary float comparisons, because you may or may not know how the float was processed. – MSeifert Oct 13 '15 at 08:57
  • Thanks MSeifert, I think float always uses 4 bytes, so why does a universal epsilon not work? A few more details would be appreciated. :) – Lin Ma Oct 27 '15 at 23:47
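The accumulation experiment described in the comments above can be written out as a short script. This is only a sketch of that experiment, assuming the usual 64-bit IEEE floats; the variable names are illustrative:

```python
import math
import sys

total = 0.0
for _ in range(1000):
    total += 0.1   # each addition can introduce a small rounding error

error = abs(total - 100.0)
print(error)                           # small, but not zero
print(error < sys.float_info.epsilon)  # False: the machine epsilon is too tight here
print(math.isclose(total, 100.0))      # True with the default rel_tol of 1e-09
```

The error in `total` depends on how many operations produced it, which is why a tolerance has to be chosen from the computation rather than from the float format alone.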
2

What is the best practice for selecting the right epsilon value?

It depends on the requirements of the application.

If it is planning an Earth-bound trajectory for the reentry of a spacecraft I am in, I would choose a very small value, like epsilon = (a+b) * 1e-15.

If it is projecting the U.S. federal deficit (which inherently has great uncertainties), a much larger epsilon is likely suitable: epsilon = (a+b) * 0.002.
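A minimal sketch of such a scaled epsilon follows. The helper `nearly_equal` is hypothetical, and it uses `abs(a) + abs(b)` instead of `(a+b)`, an assumption made here so that the tolerance stays non-negative when the operands have opposite signs:

```python
def nearly_equal(a, b, rel):
    """Hypothetical helper: epsilon scaled by the operands' combined magnitude."""
    epsilon = (abs(a) + abs(b)) * rel
    return abs(a - b) <= epsilon


# The same pair of values under a strict and a loose requirement.
print(nearly_equal(1000.0, 1000.0 + 1e-9, 1e-15))  # False: too far apart for 1e-15
print(nearly_equal(1000.0, 1000.0 + 1e-9, 0.002))  # True: well within 0.2%

# Opposite signs: the magnitudes, not the sum, set the scale.
print(nearly_equal(-1.0, 1.0, 0.002))  # False
```

Only the factor `rel` changes between applications; the comparison itself stays the same.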

wallyk
  • Thanks wallyk for the samples, and I agree that different precision suits different cases. My question is more about Python's internal representation: what is the precision boundary? As your example showed, in some cases you can use 1e-15, but how do I know whether Python internally has such precision for floats or doubles? Your insights are appreciated. Thanks. – Lin Ma Oct 09 '15 at 00:11