I am trying to discern whether it is possible to decompose a double precision IEEE floating point value into two integers and recompose them later with full fidelity. Imagine something like this:
double foo = <inputValue>;
double ipart = 0;
double fpart = modf(foo, &ipart);
int64_t intIPart = (int64_t)ipart;
int64_t intFPart = (int64_t)(fpart * <someConstant>);
double bar = (double)intIPart + ((double)intFPart) / <someConstant>;
assert(foo == bar);
It's logically obvious that any 64-bit quantity can be stored in 128 bits (i.e., just store the literal bits). The goal here is to decompose the integer part and the fractional part of the double into integer representations (to interface with an API whose storage format I don't control) and get back a bit-exact double when recomposing the two 64-bit integers.
I have a conceptual understanding of IEEE floating point, and I get that doubles are stored base-2. I observe, empirically, that with the above approach, sometimes foo != bar, even for very large values of <someConstant>. I've been out of school a while, and I can't quite close the loop in my head in terms of understanding whether this is possible or not, given the different bases (or some other factor).
EDIT:
I guess this was implied/understood in my brain but not captured here: in this situation, I'm guaranteed that the overall magnitude of the double in question will always be within +/- 2^63 (and > 2^-64). With that understanding, the integer part is guaranteed to fit in a 64-bit int type, so my expectation is that, with ~16 significant decimal digits of precision, the fractional part should be easily representable in a 64-bit int type as well.