I'm using half floats as implemented in the SoftFloat library (read: 100% IEEE 754 compliant), and, for the sake of completeness, I wish to provide my code with definitions equivalent to those available in <float.h> for float, double, and long double.
I know there are different flavours of half floats, but I'm only interested in the one standardized by IEEE 754, known as binary16.
From my research and my own tests, I'm confident in defining some of the constants as follows:
#define HALF_MANT_DIG 11
#define HALF_DIG 3
#define HALF_DECIMAL_DIG 5
#define HALF_EPSILON UINT16_C(0x1400) /* 0.0009765625 (2^-10) */
#define HALF_MIN UINT16_C(0x0400) /* 0.00006103515625 */
#define HALF_MAX UINT16_C(0x7BFF) /* 65504.0 */
NOTE: epsilon, min, and max are defined as the raw hexadecimal representation of the 16 bits occupied by the type. The proper way of assigning a raw value to the type depends on the half float library used.
However, for the exponent-related definitions, I wasn't able to find a consensus. I have looked at the Wikipedia page for binary16, at this other SO question, at the Half library, and at several other code bases on GitHub and elsewhere.
The proposal linked from that other SO question sounds reputable to me, as does the Half library, and the good news is that they match. However, I found disagreement with the FP16.java implementation, with this implementation, with the Zig language implementation, and with Sargon for D.
#define HALF_MIN_EXP The article and Half say (-13) but FP16.java and sargon say (-14)
#define HALF_MAX_EXP The article and Half say 16 but others say 14 or 15
#define HALF_MIN_10_EXP The article and Half say (-4) but sargon says (-5)
#define HALF_MAX_10_EXP The article and Half say 4 but sargon says 5
I'd suppose the article and Half are the sources most likely to be right, but how can I know for sure which values are correct for IEEE 754 binary16?