I learned from the book Computer Systems: A Programmer's Perspective that the IEEE standard requires a double-precision floating-point number to be represented in the following 64-bit binary format:
- s: 1 bit for sign
- exp: 11 bits for exponent
- frac: 52 bits for fraction
+infinity is represented as a special value with the following bit pattern:
- s = 0
- all exp bits are 1
- all fraction bits are 0
I think the full 64 bits of a double are laid out in the following order:
(s)(exp)(frac)
So I wrote the following C code to verify it:
//Check the infinity
double x1 = (double)0x7ff0000000000000; // This should be the +infinity
double x2 = (double)0x7ff0000000000001; // Note the extra ending 1, x2 should be NaN
printf("\nx1 = %f, x2 = %f sizeof(double) = %d", x1, x2, (int)sizeof(x2)); // %d expects an int, so cast the size_t
if (x1 == x2)
    printf("\nx1 == x2");
else
    printf("\nx1 != x2");
But the result is:
x1 = 9218868437227405300.000000, x2 = 9218868437227405300.000000 sizeof(double) = 8
x1 == x2
Why is x1 an ordinary finite number rather than infinity?
And why does x1 == x2?
(I am using the MinGW GCC compiler.)
ADD 1
I modified the code as below and successfully validated infinity and NaN.
//Check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 = *((double *)(&x1));
double y2 = *((double *)(&x2));
double y3 = *((double *)(&x3));
printf("\nsizeof(long long) = %d", (int)sizeof(x1));
printf("\nx1 = %f, x2 = %f, x3 = %f", x1, x2, x3); // %f is good enough for output
printf("\ny1 = %f, y2 = %f, y3 = %f", y1, y2, y3);
The result is:
sizeof(long long) = 8
x1 = 1.#INF00, x2 = -1.#INF00, x3 = 1.#SNAN0
y1 = 1.#INF00, y2 = -1.#INF00, y3 = 1.#QNAN0
The detailed output looks a bit strange, but I think the point is clear.
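PS: It seems the pointer conversion is not necessary: just use %f to tell the printf function to interpret the unsigned long long variable in double format. (Strictly speaking, though, passing an unsigned long long where %f expects a double is undefined behavior in C, so this may only happen to work with this compiler and platform.)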
ADD 2
Out of curiosity, I checked the bit representation of the variables with the following code.
typedef unsigned char *byte_pointer;
void show_bytes(byte_pointer start, int len)
{
    int i;
    for (i = len-1; i>=0; i--)
    {
        printf("%.2x", start[i]);
    }
    printf("\n");
}
And I tried the code below:
//check the infinity and NaN
unsigned long long x1 = 0x7ff0000000000000ULL; // +infinity as double
unsigned long long x2 = 0xfff0000000000000ULL; // -infinity as double
unsigned long long x3 = 0x7ff0000000000001ULL; // NaN as double
double y1 = *((double *)(&x1));
double y2 = *((double *)(&x2));
double y3 = *((double *)(&x3));
unsigned long long x4 = x1 + x2; // I want to check (+infinity)+(-infinity)
double y4 = y1 + y2; // I want to check (+infinity)+(-infinity)
printf("\nx1: ");
show_bytes((byte_pointer)&x1, sizeof(x1));
printf("\nx2: ");
show_bytes((byte_pointer)&x2, sizeof(x2));
printf("\nx3: ");
show_bytes((byte_pointer)&x3, sizeof(x3));
printf("\nx4: ");
show_bytes((byte_pointer)&x4, sizeof(x4));
printf("\ny1: ");
show_bytes((byte_pointer)&y1, sizeof(y1));
printf("\ny2: ");
show_bytes((byte_pointer)&y2, sizeof(y2));
printf("\ny3: ");
show_bytes((byte_pointer)&y3, sizeof(y3));
printf("\ny4: ");
show_bytes((byte_pointer)&y4, sizeof(y4));
The output is:
x1: 7ff0000000000000
x2: fff0000000000000
x3: 7ff0000000000001
x4: 7fe0000000000000
y1: 7ff0000000000000
y2: fff0000000000000
y3: 7ff8000000000001
y4: fff8000000000000 // <== Different from x4
The strange part is that, although x1 and x2 have the same bit patterns as y1 and y2, the sum x4 differs from y4.
Also, this statement:
printf("\ny4=%f", y4);
prints:
y4=-1.#IND00 // What does this mean?
Why are they different? And how is y4 obtained?