Half-precision multiplication seems to produce a wrong result

Question

First of all, an IEEE 754 half-precision floating-point number uses 16 bits: 1 sign bit, 5 exponent bits, and 10 mantissa bits. For normalized numbers the value is (-1)^sign * 2^(exponent-15) * (1 + mantissa/1024).
I'm trying to run an image detection program in half precision. The original program uses single precision (float). I'm using the half-precision class from http://half.sourceforge.net/. With the class half I can at least run the same program (by using half instead of float, compiling with g++ instead of gcc, and adding many, many type casts).
I found a case where multiplication seems to give the wrong result.

Here is sample code that shows the problem. (To print a half-precision number I have to cast it to float, and there is no automatic conversion in operations mixing half and int, so I added some casts.)

#include <stdio.h>
#include "half.h"
using half_float::half;
typedef half Dtype;

int main()
{
#if 0 // method 0 : this makes sx 600, which is wrong.

int c = 325;
Dtype w_scale = (Dtype)1.847656;
Dtype sx = Dtype(c*w_scale);
printf("sx = %f\n", (float)sx);  // <== shows 600.000 which is wrong.

#else  // method 1, which also produces wrong result..

int c = 325;
Dtype w_scale = (Dtype)1.847656;
Dtype sx = (Dtype)((Dtype)c*w_scale);
printf("sx = %f\n", (float)sx);
printf("w_scale specified as 1.847656 was 0x%x\n", *(unsigned short *)&w_scale);

#endif
}

The result looks like this :

w_scale = 0x3f63
sx = 600
sx = 0x60b0

But sx should be 325 * 1.847656 = 600.4882. What can be wrong?

ADD: When I first posted this question, I didn't expect the value to be exactly 600.4882, just somewhere close to it. I later found that, because half precision can express only 3 to 4 significant digits, the closest value the multiplication could produce turned out to be 600.00. Everybody knows floating point has this kind of limitation, but some people will make the same mistake I did and overlook the fact that half precision has only 3 to 4 significant digits, so I think this question is worth a look for future askers. (On Stack Overflow, I think some people treat every question as the same old question when it is actually a slightly different case, and it doesn't hurt to have a couple of similar questions.)

*"compiling with g++ instead of gcc"* that's because you are using a C++ header only library. Also your `main` definition is wrong, it's supposed to be `int main`. And it produces wrong output because you are casting to `float` - try using `std::cout` as described in the examples on the website of the library, or use `half_cast` (also provided by the library) — UnholySheep, Jul 25 '17 at 06:20
With all due respect, you should learn programming first, before doing research on image detection or floating point numbers. You can't produce reliable results without basic programming skills and basic knowledge in your tools. — , Jul 25 '17 at 06:27
@NickyC I forgot `int main()` and `return 0;` when I posted the question. I've sometimes worked with C++ code, but these days I work in C, and I don't think I should 'learn' programming again. It's just that I switch between different languages at work: C, C++, Python, Verilog, VHDL, etc. And because the original source is in C, I'm trying to keep the code as close to the original as possible. It's a huge C program with many files. — Chan Kim, Jul 25 '17 at 06:36
@UnholySheep Hi, when I put `cout << "sx = " << half_cast(sx) << endl;` I got error: no matching function for call to 'half_cast(Dtype&)'. What should I do? I tried `using detail;`, `using namespace detail;`, etc. in vain. Sorry, I haven't looked into the details of half.h and am not so good at C++. — Chan Kim, Jul 25 '17 at 06:41
Please (learn to) read the documentation of libraries you are trying to use. Just randomly writing code and hoping it will work is **not** a good way to create software. — UnholySheep, Jul 25 '17 at 06:43
I had only a couple of days to finish it and hoped somebody could help me. Maybe I'll have to look into it. — Chan Kim, Jul 25 '17 at 07:05
Why do you think the result is wrong? What did you expect it to be? (A ten-bit mantissa gives you very limited precision, and w_scale is somewhat less than 1.847656. You're also rounding the output.) — molbdnilo, Jul 25 '17 at 07:09
Why do you `half_cast` your `half`? Have you even read the very first example on that website? Reading document is arguably a part of programming too. You may need to learn programming again in some sense. — , Jul 25 '17 at 07:21
I posted an answer. There was no problem with the printing. It was just that half precision is too limited to express the number I wanted. — Chan Kim, Jul 25 '17 at 07:30
Possible duplicate of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — , Jul 25 '17 at 12:34

Chan Kim · Accepted Answer · 2017-07-25T14:18:38.920

I figured out why. Half precision has an effective precision of about log10(2^10) ≈ 3 decimal digits (3 to 4 in practice). I wanted sx to be printed as 600.488 or something close, but this cannot be represented in half precision: between 512 and 1024, adjacent half-precision values are 0.5 apart. This computation happens during image preprocessing, which does not have to run in 16-bit precision (the precision of our tentative hardware), so I can just use float operations for this stage.
ADD: This anomaly came up in an image dimension calculation, and we have no reason to use 16-bit floats there. Only the image data (pixels or feature map data) should use 16-bit floats. Having written it down, this is really a general rule.
