I need to use the minimum value of float16 in my program, but I don't want to write it out explicitly in decimal format. I want to know how to represent it in hex format.

float FP16_MIN = 5.96e-8;

Based on the top answer I received, the hex code for the FP16 minimum, counting denormals, is 0x0001.

I want a function to do:

float min = fp16_min(0x1); 

I found a similar function on line 185 of https://eigen.tuxfamily.org/dox/Half_8h_source.html, but I don't understand the implementation.

Zack

1 Answer

For FP16, the minimum positive normal value is:

                  1       0
                  5 43210 9876543210
                  S -E5-- ---F10----
          Binary: 0 00001 0000000000
             Hex: 0400
       Precision: HP
            Sign: Positive
        Exponent: -14 (Stored: 1, Bias: 15)
       Hex-float: +0x1p-14
           Value: +6.1035156e-5 (NORMAL)

The minimum positive subnormal value is:

                  1       0
                  5 43210 9876543210
                  S -E5-- ---F10----
          Binary: 0 00000 0000000001
             Hex: 0001
       Precision: HP
            Sign: Positive
        Exponent: -14 (Stored: 0, Bias: 14)
       Hex-float: +0x1p-24
           Value: +5.9604645e-8 (DENORMAL)

You can write the former as 0x1p-14 and the latter as 0x1p-24 in your program.
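As a small sketch of what that looks like in source, assuming a C++17 (or later) compiler, since that is when hexadecimal floating-point literals entered the C++ language (C has had them since C99). The constant names here are made up for illustration:

```cpp
// Assumption: C++17 or later, so hex-float literals are accepted.
// FP16_MIN_NORMAL and FP16_MIN_SUBNORMAL are illustrative names only.
constexpr float FP16_MIN_NORMAL    = 0x1p-14f;  // 2^-14, smallest positive normal FP16 value
constexpr float FP16_MIN_SUBNORMAL = 0x1p-24f;  // 2^-24, smallest positive subnormal FP16 value

// Both powers of two are exactly representable in a 32-bit float,
// so they compare equal to their exact decimal expansions.
static_assert(FP16_MIN_NORMAL == 6.103515625e-5f, "2^-14");
static_assert(FP16_MIN_SUBNORMAL == 5.9604644775390625e-8f, "2^-24");
```

Note that these are ordinary 32-bit floats that happen to hold the FP16 limits; nothing here is stored in 16 bits.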

If you want to convert from the underlying hexadecimal representation, then a common trick is to use a union in C and a memcpy in C++. See this answer for details: How is 1 encoded in C/C++ as a float (assuming IEEE 754 single precision representation)?

Of course, to do this properly you'd need an underlying 16-bit float type, which is typically not available. So you'll first have to figure out the corresponding hexadecimal representation in the 32-bit single-precision format. For 0x1p-24 that's easy to compute:

                  3  2          1         0
                  1 09876543 21098765432109876543210
                  S ---E8--- ----------F23----------
          Binary: 0 01100111 00000000000000000000000
             Hex: 3380 0000
       Precision: SP
            Sign: Positive
        Exponent: -24 (Stored: 103, Bias: 127)
       Hex-float: +0x1p-24
           Value: +5.9604645e-8 (NORMAL)

So the corresponding representation as a single-precision float would be 0x33800000. (This is not hard to see: the bias for a 32-bit float is 127, so you'd just put 103 in the exponent field to get -24, and leave the fraction bits at zero. I trust you can do that easily yourself; if not, ask away.)
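The arithmetic in that parenthetical can be spelled out in a few lines. This is only a sketch: `float32_bits_of_pow2` is a hypothetical helper name, and it only covers exact powers of two in the normal single-precision range (exponents -126 through 127):

```cpp
#include <cstdint>

// Hypothetical helper: the IEEE 754 single-precision bit pattern of 2^e.
// Assumes -126 <= e <= 127 (normal range). Sign bit and fraction bits are zero;
// the stored exponent is e plus the bias of 127, placed in bits 30..23.
constexpr std::uint32_t float32_bits_of_pow2(int e) {
    return static_cast<std::uint32_t>(e + 127) << 23;
}

// -24 + 127 = 103 = 0x67, and 0x67 << 23 = 0x33800000.
static_assert(float32_bits_of_pow2(-24) == 0x33800000u, "2^-24");
// Sanity check against the well-known pattern for 1.0f.
static_assert(float32_bits_of_pow2(0) == 0x3F800000u, "2^0");
```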

Now you can write:

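#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    // Bit pattern of 2^-24 as an IEEE 754 single-precision float.
    std::uint32_t abc = 0x33800000;
    float i;
    // Copy the raw bits into the float; both types are 4 bytes wide here.
    std::memcpy(&i, &abc, sizeof i);
    std::cout << i << std::endl;
    return 0;
}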

which prints:

5.96046e-08
alias
  • Could you take a look at my updated question and give me some suggestions? – Zack Jun 24 '19 at 16:33
  • Sure; see the link I put in. – alias Jun 24 '19 at 17:25
  • Thanks a lot for your help. I tried: uint32_t abc = 0x1; float i; std::memcpy(&i, &abc, 2); std::cout<< i << std::endl; but the result doesn't look right. I got 1.4013e-45 – Zack Jun 24 '19 at 17:56
  • Since you're using a 32-bit float to do the conversion, you'd have to first figure out the corresponding hexadecimal representation for that. See the updated explanation. – alias Jun 24 '19 at 18:21
  • Thanks a lot for your explanation. Is there any way to store 0x1 and convert it to 5.96e-8? I really want to have code similar to line 185 of https://eigen.tuxfamily.org/dox/Half_8h_source.html. – Zack Jun 25 '19 at 15:46
  • Not unless you use a 16-bit FP implementation, which is what that library is doing. Why not just use that? They implemented it for you right there. – alias Jun 25 '19 at 15:57
  • I see. I am not allowed to add a dependency on that library to my project. I really appreciate your time, help, and patience. – Zack Jun 25 '19 at 16:37
  • You can probably get away with implementing a minimal subset of the library, dealing only with conversions and no arithmetic. But (i) I doubt you can find that code freely anywhere; it's quite a specialized thing. (ii) While it wouldn't be too hard to write, it isn't trivial either. There are many cases to consider: normals, denormals, NaN, infinities, etc. But it's definitely doable if you invest the time in studying the IEEE 754 format. The Wikipedia article is actually pretty decent on the exact format: https://en.wikipedia.org/wiki/IEEE_754#Basic_and_interchange_formats – alias Jun 25 '19 at 17:10
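
For what it's worth, the conversion-only subset described in the last comment might look roughly like the sketch below. It is not the Eigen implementation and is not taken from any library: `bits_to_float` and `half_bits_to_float` are hypothetical names, and the code assumes that `float` is a 32-bit IEEE 754 type and that the compiler accepts hex-float literals (C++17 or later). It handles one direction only, from a 16-bit FP16 pattern to a `float`, covering the cases listed above: zeros, denormals, normals, infinities, and NaN.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
// Assumes float is the 32-bit IEEE 754 single-precision format.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    static_assert(sizeof f == sizeof bits, "sketch assumes a 32-bit float");
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: widen an IEEE 754 half-precision bit pattern to float.
inline float half_bits_to_float(std::uint16_t h) {
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;  // 5-bit stored exponent
    const std::uint32_t frac = h & 0x3FFu;         // 10-bit fraction

    if (exp == 0x1Fu) {
        // Infinity (frac == 0) or NaN (frac != 0): all-ones exponent in
        // single precision, fraction payload moved to the top of the field.
        return bits_to_float(sign | 0x7F800000u | (frac << 13));
    }
    if (exp == 0) {
        // Zero or denormal: the value is frac * 2^-24, which a float holds
        // exactly. Negating 0.0f gives -0.0f, so signed zero also works.
        const float mag = static_cast<float>(frac) * 0x1p-24f;
        return (h & 0x8000u) ? -mag : mag;
    }
    // Normal: rebias the exponent (127 - 15 = 112) and widen the fraction
    // from 10 to 23 bits.
    return bits_to_float(sign | ((exp + 112u) << 23) | (frac << 13));
}
```

With this, `half_bits_to_float(0x0001)` gives 2^-24 (about 5.96e-8) and `half_bits_to_float(0x0400)` gives 2^-14. The reverse direction, float to FP16, is the harder part because it needs rounding and overflow handling, and is not attempted here.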