My answer to the main question "Is there a floating point value of x, for which x - x == 0 is false?" is: at least the floating point implementation on Intel processors produces NO arithmetic underflow in "+" and "-" operations, so you will not be able to find an x for which x - x == 0 is false. The same is true for all processors which support IEEE 754-2008 (see the references below).
My short answer to your other question: if (x-y == 0) is exactly as safe as if (x == y), so assert(x-x == 0) is OK, because no arithmetic underflow is produced in x-x or (x-y).
The reason is the following. A float/double number is held in memory as a mantissa and a binary exponent. In the standard case the mantissa is normalized: it is >= 0.5 and < 1. In <float.h> you can find some constants from the IEEE floating point standard. The ones interesting for us now are only the following:
#define DBL_MIN 2.2250738585072014e-308 /* min positive value */
#define DBL_MIN_10_EXP (-307) /* min decimal exponent */
#define DBL_MIN_EXP (-1021) /* min binary exponent */
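To see this normalized (mantissa, exponent) form directly, here is a minimal sketch (not part of the program below) using the standard frexp() function, which returns the significand in [0.5, 1) and the binary exponent:

#include <stdio.h>
#include <math.h>

int main(void)
{
    int exp;
    /* frexp() returns m in [0.5, 1) and exp such that 6.0 == m * 2^exp */
    double m = frexp(6.0, &exp);
    printf("6.0 = %g * 2^%d\n", m, exp);  /* prints: 6.0 = 0.75 * 2^3 */
    return 0;
}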
But not everybody knows that you can have double numbers less than DBL_MIN. If you do arithmetic operations with numbers below DBL_MIN, these numbers are NOT normalized, and so you work with them like with integers (operations on the mantissa only) without any "round errors".
Remark: I personally try not to use the words "round errors", because there are no errors in computer arithmetic operations. These operations are just not the same as the +, -, * and / operations on real numbers. They are deterministic operations on the subset of floating point numbers which can be stored in the form (mantissa, exponent) with a well defined number of bits for each. Let us call this subset the computer floating point numbers. So the result of the classical floating point operation is projected back onto the computer floating point number set. This projection operation is deterministic and has a lot of nice properties, for example: if x1 >= x2 and y > 0, then x1*y >= x2*y.
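As a quick empirical spot check of that monotonicity property (not a proof, and not part of the program below):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i, violations = 0;
    srand(12345);
    for (i = 0; i < 1000000; i++) {
        double x1 = (double)rand() / RAND_MAX;
        double x2 = (double)rand() / RAND_MAX;
        double y  = (double)rand() / RAND_MAX + 1e-9;    /* keep y > 0 */
        if (x1 < x2) { double t = x1; x1 = x2; x2 = t; } /* ensure x1 >= x2 */
        if (x1 * y < x2 * y)
            violations++;
    }
    /* rounding to the nearest double is monotonic, so this prints 0 */
    printf("violations: %d\n", violations);
    return 0;
}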
Sorry for the long remark, and back to our subject.
To show exactly what happens when we operate with numbers less than DBL_MIN, I wrote a small program in C:
#include <stdio.h>
#include <float.h>
#include <math.h>

void DumpDouble(double d)
{
    unsigned char *b = (unsigned char *)&d;
    int i;
    for (i = 1; i <= sizeof(d); i++) {
        printf("%02X", b[sizeof(d) - i]);
    }
    printf("\n");
}

int main()
{
    double x, m, y, z;
    int exp;

    printf("DBL_MAX=%.16e\n", DBL_MAX);
    printf("DBL_MAX in binary form: ");
    DumpDouble(DBL_MAX);

    printf("DBL_MIN=%.16e\n", DBL_MIN);
    printf("DBL_MIN in binary form: ");
    DumpDouble(DBL_MIN);

    // frexp() breaks the floating point number x into its binary significand
    // (a floating point value between 0.5 (included) and 1.0 (excluded))
    // and an integral exponent for 2
    x = DBL_MIN;
    m = frexp(x, &exp);
    printf("DBL_MIN has mantissa=%.16e and exponent=%d\n", m, exp);
    printf("mantissa of DBL_MIN in binary form: ");
    DumpDouble(m);

    // ldexp() returns the resulting floating point value from
    // multiplying x (the significand) by 2
    // raised to the power of exp (the exponent).
    x = ldexp(0.5, DBL_MIN_EXP); // -1021
    printf("the number (x) constructed from mantissa 0.5 and exponent=DBL_MIN_EXP (%d) in binary form: ", DBL_MIN_EXP);
    DumpDouble(x);

    y = ldexp(0.5000000000000001, DBL_MIN_EXP);
    m = frexp(y, &exp);
    printf("the number (y) constructed from mantissa 0.5000000000000001 and exponent=DBL_MIN_EXP (%d) in binary form: ", DBL_MIN_EXP);
    DumpDouble(y);
    printf("mantissa of this number saved as double will be displayed by printf(%%.16e) as %.16e and exponent=%d\n", m, exp);

    y = ldexp((1 + DBL_EPSILON) / 2, DBL_MIN_EXP);
    m = frexp(y, &exp);
    printf("the number (y) constructed from mantissa (1+DBL_EPSILON)/2 and exponent=DBL_MIN_EXP (%d) in binary form: ", DBL_MIN_EXP);
    DumpDouble(y);
    printf("mantissa of this number saved as double will be displayed by printf(%%.16e) as %.16e and exponent=%d\n", m, exp);

    z = y - x;
    m = frexp(z, &exp);
    printf("z=y-x in binary form: ");
    DumpDouble(z);
    printf("z will be displayed by printf(%%.16e) as %.16e\n", z);
    printf("z has mantissa=%.16e and exponent=%d\n", m, exp);

    if (x == y)
        printf("\"if (x == y)\" say x == y\n");
    else
        printf("\"if (x == y)\" say x != y\n");

    if ((x - y) == 0)
        printf("\"if ((x-y) == 0)\" say \"(x-y) == 0\"\n");
    else
        printf("\"if ((x-y) == 0)\" say \"(x-y) != 0\"\n");

    return 0;
}
This code produced the following output:
DBL_MAX=1.7976931348623157e+308
DBL_MAX in binary form: 7FEFFFFFFFFFFFFF
DBL_MIN=2.2250738585072014e-308
DBL_MIN in binary form: 0010000000000000
DBL_MIN has mantissa=5.0000000000000000e-001 and exponent=-1021
mantissa of DBL_MIN in binary form: 3FE0000000000000
the number (x) constructed from mantissa 0.5 and exponent=DBL_MIN_EXP (-1021) in binary form: 0010000000000000
the number (y) constructed from mantissa 0.5000000000000001 and exponent=DBL_MIN_EXP (-1021) in binary form: 0010000000000001
mantissa of this number saved as double will be displayed by printf(%.16e) as 5.0000000000000011e-001 and exponent=-1021
the number (y) constructed from mantissa (1+DBL_EPSILON)/2 and exponent=DBL_MIN_EXP (-1021) in binary form: 0010000000000001
mantissa of this number saved as double will be displayed by printf(%.16e) as 5.0000000000000011e-001 and exponent=-1021
z=y-x in binary form: 0000000000000001
z will be displayed by printf(%.16e) as 4.9406564584124654e-324
z has mantissa=5.0000000000000000e-001 and exponent=-1073
"if (x == y)" say x != y
"if ((x-y) == 0)" say "(x-y) != 0"
So we can see that if we work with numbers less than DBL_MIN, they are not normalized (see 0000000000000001). We are working with these numbers like with integers and without any "errors". Thus if we assign y=x, then if (x-y == 0) is exactly as safe as if (x == y), and assert(x-x == 0) works OK. In this example, z = 0.5 * 2^(-1073) = 1 * 2^(-1074). This number is really the smallest number which can be saved in a double. All arithmetic operations with numbers less than DBL_MIN work like with integers multiplied by 2^(-1074).
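If you want to verify this on your own machine, here is a minimal sketch (separate from the program above) using the standard ldexp() and nextafter() functions; it assumes an IEEE 754 double:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double smallest = ldexp(1.0, -1074);    /* 1 * 2^(-1074) */
    double next     = nextafter(0.0, 1.0);  /* smallest positive double */

    printf("ldexp(1.0, -1074)   = %.16e\n", smallest);
    printf("nextafter(0.0, 1.0) = %.16e\n", next);
    printf("equal: %s\n", smallest == next ? "yes" : "no");

    /* every subnormal is an exact integer multiple of 2^(-1074) */
    double z = ldexp(1.0, -1072);           /* 4 * 2^(-1074) */
    printf("ldexp(1.0, -1072) / smallest = %.0f\n", z / smallest);

    return 0;
}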
So I have no underflow problems on my Windows 7 computer with an Intel processor. If somebody has another processor, it would be interesting to compare our results.
Does somebody have an idea how one can produce arithmetic underflow with - or + operations? My experiments suggest that it is impossible.
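One way to look for underflow explicitly (not part of my original experiments) is to query the floating point exception flags via the C99 <fenv.h> interface. Note that #pragma STDC FENV_ACCESS is not supported by every compiler, so treat this as a sketch:

#include <stdio.h>
#include <float.h>
#include <math.h>
#include <fenv.h>

#pragma STDC FENV_ACCESS ON

int main(void)
{
    /* x and y are the same tiny numbers as in the program above */
    volatile double x = ldexp(0.5, DBL_MIN_EXP);
    volatile double y = ldexp((1 + DBL_EPSILON) / 2, DBL_MIN_EXP);

    feclearexcept(FE_ALL_EXCEPT);
    volatile double d = y - x;              /* exact subnormal result */
    printf("y - x = %.16e, FE_UNDERFLOW %s\n", d,
           fetestexcept(FE_UNDERFLOW) ? "raised" : "not raised");

    feclearexcept(FE_ALL_EXCEPT);
    volatile double p = DBL_MIN * DBL_MIN;  /* tiny AND inexact */
    printf("DBL_MIN * DBL_MIN = %.16e, FE_UNDERFLOW %s\n", p,
           fetestexcept(FE_UNDERFLOW) ? "raised" : "not raised");

    return 0;
}

My understanding of IEEE 754 default exception handling is that the subtraction never sets FE_UNDERFLOW, because its subnormal result is exact, while the multiplication does, because its result is both tiny and inexact.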
EDITED: I modified the code a little for better readability of the code and the messages.
ADDED LINKS: My experiments show that http://grouper.ieee.org/groups/754/faq.html#underflow is absolutely correct on my Intel Core 2 CPU. The way it is calculated produces no underflow in "+" and "-" floating point operations. My results are independent of the Strict (/fp:strict) or Precise (/fp:precise) Microsoft Visual C compiler switches (see http://msdn.microsoft.com/en-us/library/e7s85ffb%28VS.80%29.aspx and http://msdn.microsoft.com/en-us/library/Aa289157).
ONE MORE (PROBABLY THE LAST) LINK AND MY FINAL REMARK: I found a good reference, http://en.wikipedia.org/wiki/Subnormal_numbers, which describes the same thing I wrote above. The inclusion of denormal (denormalized) numbers, now usually called subnormal numbers (for example in IEEE 754-2008), leads to the following statement:
“Denormal numbers provide the guarantee that addition and subtraction of floating-point numbers never underflows; two nearby floating-point numbers always have a representable non-zero difference. Without gradual underflow, the subtraction a−b can underflow and produce zero even though the values are not equal.”
So all my results must be correct on any processor which supports IEEE 754-2008.