Frame the question this way:
- We have two numbers,
a
and b
, that have been computed with floating-point arithmetic.
- If they had been computed exactly with real-number mathematics, we would have a and b.
- We want to compare
a
and b
and get an answer that tells us whether a equals b.
In other words, you are trying to correct for errors that occurred while computing a
and b
. In general, that is impossible, of course, because we do not know what a and b are. We only have the approximations a
and b
.
The code you propose falls back to another strategy:
- If
a
and b
are close to each other, we will accept that a equals b. (In other words: If a
is close to b
, it is possible that a equals b, and the differences we have are only because of calculation errors, so we will accept that a equals b without further evidence.)
There are two problems with this strategy:
- This strategy will incorrectly accept that a equals b even when it is not true, just because
a
and b
are close.
- We need to decide how close to require
a
and b
to be.
Your code attempts to address the latter: It is establishing some tests about whether a
and b
are close enough. As others have pointed out, it is severely flawed:
- It treats numbers as different if they have different signs, but floating-point arithmetic can cause
a
to be negative even if a is positive, and vice versa.
- It treats numbers as different if they have different exponents, but floating-point arithmetic can cause
a
to have a different exponent from a.
- It treats numbers as different if they differ by more than a fixed number of ULP (units of least precision), but floating-point arithmetic can, in general, cause
a
to differ from a by any amount.
- It assumes an IEEE-754 format and needlessly uses aliasing with behavior not defined by the C++ standard.
The approach is fundamentally flawed because it needlessly fiddles with the floating-point representation. The actual way to determine from a
and b
whether a and b might be equal is to figure out, given a
and b
, what sets of values a and b have and whether there is any value in common in those sets.
In other words, given a
, the value of a might be in some interval, (a
−eal, a
+ear) (that is, all the numbers from a
minus some error on the left to a
plus some error on the right), and, given b
, the value of b might be in some interval, (b
−ebl, b
+ebr). If so, what you want to test is not some floating-point representation properties but whether the two intervals (a
−eal, a
+ear) and (b
−ebl, b
+ebr) overlap.
To do that, you need to know, or at least have bounds on, the errors eal, ear, ebl, and ebr. But those errors are not fixed by the floating-point format. They are not 2 ULP or 1 ULP or any number of ULP scaled by the exponent. They depend on how a
and b
were computed. In general, the errors can range from 0 to infinity, and they can also be NaN.
So, to test whether a and b might be equal, you need to analyze the floating-point arithmetic errors that could have occurred. In general, this is difficult. There is an entire field of mathematics for it, numerical analysis.
If you have computed bounds on the errors, then you can just compare the intervals using ordinary arithmetic. There is no need to take apart the floating-point representation and work with the bits. Just use the normal add, subtract, and comparison operations.
(The problem is actually more complicated than I allowed above. Given a computed value a
, the potential values of a do not always lie in a single interval. They could be an arbitrary set of points.)
As I have written previously, there is no general solution for comparing numbers containing arithmetic errors: 0 1 2 3.
Once you figure out error bounds and write a test that returns true if a and b might be equal, you still have the problem that the test also accepts false negatives: It will return true even in cases where a and b are not equal. In other words, you have just replaced a program that is wrong because it rejects equality even though a and b would be equal with a program that is wrong in other cases because it accepts equality in cases where a and b are not equal. This is another reason there is no general solution: In some applications, accepting as equal numbers that are not equal is okay, at least for some situations. In other applications, that is not okay, and using a test like this will break the program.