
I am seeing precision errors when converting a 16-bit half-precision floating-point value to decimal. The conversion is accurate for some numbers but not for others.

The code was originally written to convert 32-bit single-precision floating-point values to decimal, and I edited it to handle the 16-bit half-precision format. After that change, the final value came out as half the expected value.

E.g., where the expected value is 1800, the result is 900.

So I added a `* 2` to the final operation. I am unsure how to fix the remaining precision error, and why the result was half the expected value in the first place.

Below is the code I edited, followed by its outcome.

#include <stdio.h> 
//#include <bits/stdc++.h> 
#include <string>
#include <iostream>
#include <sstream>
#include <math.h>
#include <limits.h>
#include <bitset>
using namespace std;

// Convert the 16-bit binary encoding into hexadecimal
int Binary2Hex( std::string Binary )
{
    std::bitset<16> set(Binary);      
    int hex = set.to_ulong();

    return hex;
}

// Convert the 16-bit binary into the decimal
float GetFloat16( std::string Binary )
{
    int HexNumber = Binary2Hex( Binary );
    printf("Test: %d\n", HexNumber);

    bool negative  = !!(HexNumber & 0x8000);
    int  exponent  =   (HexNumber & 0xf800) >> 10;    
    int sign = negative ? -1 : 1;

    // Subtract 15 from the exponent
    exponent -= 15;

    // Convert the mantissa into decimal using the
    // last 10 bits
    int power = -1;
    float total = 0.0;
    for ( int i = 0; i < 10; i++ )
    {
        int c = Binary[ i + 6 ] - '0';
        total += (float) c * (float) pow( 2.0, power );
        power--;
    }
    total += 1.0;

    float value = sign * (float) pow( 2.0, exponent ) * total * 2;

 }


The 16-bit floating-point value that I am using is `0101010100010011`.

Expected outcome: 81.2. Actual outcome: 81.1875.
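To check which fields this bit pattern actually carries, a minimal sketch could split it as follows. It assumes the IEEE-754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits); `Half16Fields` and `SplitHalf16` are hypothetical names that are not part of the code above.

```cpp
#include <bitset>
#include <string>

// Hypothetical helper type: the three raw fields of a binary16 pattern.
struct Half16Fields {
    unsigned sign;      // 1 bit
    unsigned exponent;  // 5 bits, still biased by 15
    unsigned fraction;  // 10 bits, without the implicit leading 1
};

// Split a 16-character string of '0'/'1' into its raw fields.
// Assumes the IEEE-754 binary16 layout: sign | exponent | fraction.
Half16Fields SplitHalf16(const std::string& bits)
{
    const unsigned raw = static_cast<unsigned>(std::bitset<16>(bits).to_ulong());

    Half16Fields fields;
    fields.sign     = (raw >> 15) & 0x1u;   // bit 15
    fields.exponent = (raw >> 10) & 0x1fu;  // bits 14..10
    fields.fraction = raw & 0x3ffu;         // bits 9..0
    return fields;
}
```

For `0101010100010011` this gives a sign of 0, a biased exponent of 21, and a fraction of 275 (out of 1024).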

Kai
  • Can `81.2` be exactly represented in 16-bit floating point format? Are there no rounding errors from all the math you do (and remember that rounding errors compound)? – Some programmer dude Aug 05 '19 at 08:40
  • @Someprogrammerdude I have checked with the following link [link](http://weitz.de/ieee/) using the above mentioned floating point value `0101010100010011` and it converts to 81.2 – Kai Aug 05 '19 at 08:56
  • Then I suggest you take your time to first of all break all complex expressions into simpler expressions, preferably doing only a single operation each, and storing that in temporary variables that are then combined for the more complex expressions. When you've done that, use a debugger to step through your code, statement by statement, checking the results of each little sub-expression to make sure it matches what you already have on paper for the calculations. – Some programmer dude Aug 05 '19 at 09:06
  • Your function `GetFloat16` does not return a value because it does not contain a `return` statement. – Eric Postpischil Aug 05 '19 at 11:45
  • The mask for the exponent field in IEEE-754 binary16 format should be `0x7c00`, not `0xf800`. – Eric Postpischil Aug 05 '19 at 11:48
  • It is inadvisable to use `pow` for exact exponentiation because some implementations are notoriously bad in that they fail to return correct results when the mathematical result is exactly representable. It may also perform poorly. Usually, the necessary arithmetic for this sort of work can be done with simple multiplication and division. If it cannot, `ldexp` is a better alternative to `pow`. In fact, the loop and the use of `pow` are completely unnecessary. The loop may be replaced with `total = (HexNumber & 0x3ff) / 1024.;`. – Eric Postpischil Aug 05 '19 at 11:50
  • Remove the `* 2` in `float value = … * total * 2;`. I presume you added that to make the result approximately correct, but that was only necessary due to the incorrect exponent mask mentioned above. – Eric Postpischil Aug 05 '19 at 11:59
  • Why would you expect `81.2` from `0101010100010011`? It has a sign of `0`, an exponent of `10101`, which is 21 biased, 6 natural, and a significand of 1.0100010011, which is 1.2685546875, so the number represented is +1 • 2^6 • 1.2685546875 = 81.1875. – Eric Postpischil Aug 05 '19 at 12:01
  • @user207421: Please do not promiscuously close questions as duplicates of [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken). This question concerns errors in the code, not behaviors of floating-point arithmetic that are unexpected to some people. – Eric Postpischil Aug 05 '19 at 12:03
  • @EricPostpischil Thanks for the tips! Will correct these mistakes and see how it goes. To answer some of your questions, yes I added the `* 2` to get a better result as I was unsure of what other fields I needed to edit. I expected `81.2` from `0101010100010011` as the online calculator provided that answer. – Kai Aug 06 '19 at 06:34
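
The corrections suggested in the comments can be collected into a short sketch. It is one possible reading of those suggestions, not a definitive implementation: `DecodeHalf16` is a hypothetical name, and the handling of zero, subnormal, infinite, and NaN encodings is an addition based on the IEEE-754 binary16 format that the thread itself does not discuss. Note that `0xf800 >> 10` drops the lowest exponent bit, which would explain a result of half the expected value whenever that bit is set.

```cpp
#include <bitset>
#include <cmath>
#include <string>

// Sketch of a binary16 decoder following the comments in this thread.
// DecodeHalf16 is a hypothetical name, not part of the original code.
float DecodeHalf16(const std::string& bits)
{
    const unsigned raw = static_cast<unsigned>(std::bitset<16>(bits).to_ulong());

    const bool     negative = (raw & 0x8000u) != 0;
    const int      exponent = static_cast<int>((raw & 0x7c00u) >> 10);  // 5 bits, biased by 15
    const unsigned fraction = raw & 0x03ffu;                            // 10 bits

    float magnitude;
    if (exponent == 0x1f) {
        // Assumption beyond the thread: all-ones exponent means infinity or NaN.
        magnitude = (fraction == 0) ? INFINITY : NAN;
    } else if (exponent == 0) {
        // Assumption beyond the thread: zero or subnormal,
        // (fraction / 1024) * 2^-14 == fraction * 2^-24.
        magnitude = std::ldexp(static_cast<float>(fraction), -24);
    } else {
        // Normal number: (1 + fraction / 1024) * 2^(exponent - 15).
        // ldexp scales by a power of two exactly, so no pow and no "* 2" are needed.
        magnitude = std::ldexp(1.0f + fraction / 1024.0f, exponent - 15);
    }

    return negative ? -magnitude : magnitude;
}
```

With this sketch, `DecodeHalf16("0101010100010011")` returns 81.1875, which matches the value worked out by hand in the comments.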
