I want to judge if two floating-point numbers are equal.
No, you do not. What you are actually trying to do is test whether two real numbers a and b are equal when all you have is two floating-point numbers a and b, where the floating-point a and b are the results of floating-point operations but the real a and b are the results of real-number mathematics.
Two floating-point objects compare equal if and only if they represent equal numbers. So, if you were trying to judge whether two floating-point numbers were equal, all that would be necessary is to evaluate a == b. That evaluates to true if and only if a and b are equal. So “comparing floating-point numbers” is easy. But you want to “compare the two real numbers I would have if I were using real-number arithmetic, but I only have floating-point numbers,” and that is not easy.
The normal operation should be fabs(a - b) < DBL_EPSILON,…
No, that is not the normal operation. There is no general solution for comparing floating-point numbers that contain errors from previous operations. I have written about this previously here, here, and here.
The definitions in the wiki:
DBL_MIN – minimum normalized positive value of double;
DBL_EPSILON – difference between 1.0 and the next representable value of double.
According to the definitions above, DBL_EPSILON is the minimum precision of a double value, so why is there DBL_MIN? What is the relationship between DBL_MIN and DBL_EPSILON?
Your question does not state what programming language or implementation you are using, so we do not know precisely what is used for the double type you are using. However, IEEE-754 64-bit binary floating-point is ubiquitous. In this format, numbers are represented with a sign, a 53-bit significand, and an exponent of two from −1022 to +1023. (The significand is encoded using both a 52-bit field and some information from the exponent field, so many people refer to it as a 52-bit significand, but this is incorrect. Only the primary field for encoding it is 52 bits. The actual significand is 53 bits.) This information about the significand width and exponent range is enough to understand DBL_MIN and DBL_EPSILON, so I will not discuss the encoding format much in this answer.

However, I will point out there are normal significands and subnormal significands. For normal significands, the significand value is given by the binary numeral “1.” followed by 52 bits after the radix point (the 52 bits in the significand field). For subnormal significands, the significand value is given by “0.” followed by 52 bits. Normal and subnormal significands are distinguished by the value in the exponent field.
DBL_MIN is the minimum normal positive value. So it has the smallest normal significand value, given by “1.0000000000000000000000000000000000000000000000000000”, which is 1, and the lowest exponent, −1022. So it is +1•2^−1022, which is about 2.2•10^−308.

DBL_EPSILON is the difference between one and the next value representable in the floating-point format. That next value is given by a significand with binary “1.0000000000000000000000000000000000000000000000000001”, which is 1+2^−52. So DBL_EPSILON is 2^−52.
Which of these should you use for a tolerance in comparison? Neither. To get a and b, presumably you did some floating-point operations. In each of those operations, there may have been some error. Floating-point arithmetic approximates real arithmetic. For each elementary operation, floating-point arithmetic gives you the representable value that is nearest the real-number result. (Usually, this is the nearest in either direction, but directed rounding modes may be available to choose a preferred direction.) When this representable result differs from the real-number result, the difference is called rounding error. In round-to-nearest mode, the rounding error may, in general, be up to 1/2 the distance between representable numbers in that vicinity.
When you do more than one floating-point operation, these rounding errors compound. They may accumulate or happen to cancel. Each error is small relative to the immediate result, but, as that number is used in further calculations, the final result of the calculations may be small, so errors that occurred during the calculations may be large compared to the final result.
Understanding what the final error may be is a difficult problem in general. There is an entire field of study for it, numerical analysis. What this means is there cannot be any general recommendation about what tolerance to use when attempting to compare floating-point numbers the way you want. It requires study particular to each problem. Furthermore, if you figure out that the floating-point results a and b might be some distance d apart even though the real-number results a and b would be equal, that does not mean comparing a and b with a tolerance of d is the right thing to do. That would ensure you get no false negatives: every time the real a and b are equal, your comparison of the floating-point a and b returns true. However, it would allow you to get false positives: sometimes when the real a and b are not equal, your comparison of the floating-point a and b returns true.
This is another reason there can be no general advice for comparing floating-point numbers. The first is that the errors that occur are particular to each computation. The second is that eliminating false negatives requires allowing false positives, and whether that is acceptable or not depends on the application. So it cannot be given as general advice.