UPDATE : I've now incorporated that into a single unified function that perfectly lines up with the official formula while incorporating the proper exponents for 16- and 32-bit formats, and how the bits are split between sign-bit, exponent bits, and mantissa bits.
inputs can be in # of bits, e.g. 64
, a ratio like "2x"
, or even case-insensitive single letters :
"S"
for 1x single, - "D"
for 2x double,
"Q"
for 4x quadruple, - "O"
for 8x "octuple",
"X"
for 16x he"X", - "T"
for 32x "T"hirty-two,
-— all other inputs, missing, or invalid, defaults to 0.5x half-precision
gcat <( jot 20 | mawk '$!(_=NF)=(_+_)^($_)' ) \
<( jot - -1 8 | mawk '$!NF =(++_+_)^$(_--)"x"' ) |
{m,g}awk '
function _754(__,_,___) {
return \
(__=(__==___)*(_+=_+=_^=_<_) ? _--^_++ : ">"<__ ? \
(_+_)*(_*_/(_+_))^index("SDQOXT", toupper(__)) : \
__==(+__ "") ? +__ : _*int(__+__)*_)<(_+_) \
\
? "_ERR_754_INVALID_INPUT_" \
: "IEEE-754-fp:" (___=__) "-bit:" (_^(_<_)) "_s:"(__=int(\
log((_^--_)^(_+(__=(log(__)/log(--_))-_*_)/_-_)*_^(\
-((++_-__)^(--_<__) ) ) )/log(_))) "_e:" (___-++__) "_m"
}
function round(__,_) {
return \
int((++_+_)^-_+__)
}
function _4xlog2(_) {
return (log(_)/log(_+=_^=_<_))*_*_
}
BEGIN { CONVFMT = OFMT = "%.250g"
}
( $++NF = _754(_=$!__) ) \
( $++NF = "official-expn:" \
+(_=round(_4xlog2(_=_*32^(_~"[0-9.]+[Xx]")))-13) < 11 ? "n/a" :_) |
column -s':' -t | column -t | lgp3 5
.
2 _ERR_754_INVALID_INPUT_ n/a
4 _ERR_754_INVALID_INPUT_ n/a
8 IEEE-754-fp 8-bit 1_s 2_e 5_m n/a
16 IEEE-754-fp 16-bit 1_s 5_e 10_m n/a
32 IEEE-754-fp 32-bit 1_s 8_e 23_m n/a
64 IEEE-754-fp 64-bit 1_s 11_e 52_m 11
128 IEEE-754-fp 128-bit 1_s 15_e 112_m 15
256 IEEE-754-fp 256-bit 1_s 19_e 236_m 19
512 IEEE-754-fp 512-bit 1_s 23_e 488_m 23
1024 IEEE-754-fp 1024-bit 1_s 27_e 996_m 27
2048 IEEE-754-fp 2048-bit 1_s 31_e 2016_m 31
4096 IEEE-754-fp 4096-bit 1_s 35_e 4060_m 35
8192 IEEE-754-fp 8192-bit 1_s 39_e 8152_m 39
16384 IEEE-754-fp 16384-bit 1_s 43_e 16340_m 43
32768 IEEE-754-fp 32768-bit 1_s 47_e 32720_m 47
65536 IEEE-754-fp 65536-bit 1_s 51_e 65484_m 51
131072 IEEE-754-fp 131072-bit 1_s 55_e 131016_m 55
262144 IEEE-754-fp 262144-bit 1_s 59_e 262084_m 59
524288 IEEE-754-fp 524288-bit 1_s 63_e 524224_m 63
1048576 IEEE-754-fp 1048576-bit 1_s 67_e 1048508_m 67
0.5x IEEE-754-fp 16-bit 1_s 5_e 10_m n/a
1x IEEE-754-fp 32-bit 1_s 8_e 23_m n/a
2x IEEE-754-fp 64-bit 1_s 11_e 52_m 11
4x IEEE-754-fp 128-bit 1_s 15_e 112_m 15
8x IEEE-754-fp 256-bit 1_s 19_e 236_m 19
16x IEEE-754-fp 512-bit 1_s 23_e 488_m 23
32x IEEE-754-fp 1024-bit 1_s 27_e 996_m 27
64x IEEE-754-fp 2048-bit 1_s 31_e 2016_m 31
128x IEEE-754-fp 4096-bit 1_s 35_e 4060_m 35
256x IEEE-754-fp 8192-bit 1_s 39_e 8152_m 39
===============================================
Similar to the concept mentioned above, here's an alternative formula (just re-arranging some terms) that will calculate the unsigned integer range of the exponent ([32,256,2048,32768,524288]
, corresponding to [5,8,11,15,19
]-powers-of-2) without needing to call the round function :
uint_range = ( 64 ** ( 1 + (k=log2(bits)-4)/2) )
*
( 2 ** -( (3-k)**(2<k) ) )
(a) x ** y
means x-to-y-power
(b) 2 < k
is a boolean condition that should just return 0 or 1.
The function shall be accurate from 16-bit to 256-bit, at least. Beyond that, this formula yields exponent sizes of
– 512-bit : 23
– 1024-bit : 27
– 2048-bit : 31
– 4096-bit : 35
(beyond-256 may be inaccurate. even 27-bit-wide exponent allows exponents that are +/- 67 million, and over 40-million decimal digits once you calculate 2-to-that-power.)
from there to IEEE 754 exponent is just a matter of log2(uint_range)