Convert integer to half-precision floating point format using round-to-even

Question

I already know how to implement conversion to half-precision floating point using truncation (thanks to this answer). But how can I do the same conversion with rounding to the nearest representable value? For example, I want 65519 to round to 0x7bff (which is 65504), not to infinity. Another example: the linked solution represents 8199 as 8192, but the nearest representable value to 8199 is 8200.

UPD: More example cases: I want integers between 32768 and 65519 to round to a multiple of 32, integers between 16384 and 32768 to round to a multiple of 16, and so on.
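To make the "multiple of 32, multiple of 16, and so on" pattern concrete, here is a minimal sketch, assuming the usual half-precision layout with an 11-bit significand (10 stored bits plus the hidden bit). The helper names `bit_length` and `half_spacing` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Number of significant bits in x (0 for x == 0). */
static int bit_length(uint32_t x)
{
    int n = 0;
    while (x) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Gap between neighbouring half-precision values around the integer x.
 * Assumes 0 < x < 65536; only 11 significant bits survive, so every
 * bit beyond the 11th doubles the gap. */
static uint32_t half_spacing(uint32_t x)
{
    int n = bit_length(x);
    return n > 11 ? (uint32_t)1 << (n - 11) : 1;
}
```

With this, 8199 has 14 significant bits, so the gap is 8 and the nearest multiple of 8 is 8200, which matches the example in the question.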

65519 is an integer. Round to what? Are you just setting the last *N* bits to zero? – tadman Dec 17 '19 at 21:11
@tadman This is half-precision floating point, so OP must mean that 65519.0 cannot be represented. – Fiddling Bits Dec 17 '19 at 21:20
It cannot be represented. For example, I want to round integers between 32768 and 65519 to a multiple of 32, integers between 16384 and 32768 to a multiple of 16, and so on. I'm looking for the best way to implement this in C. – envy grunt Dec 17 '19 at 21:24

hko · Accepted Answer · 2019-12-18T19:50:39.280

3

You need two changes to achieve what you want.

1. Add rounding before you do the conversion

by adding:

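  // Round the number if necessary before we do the conversion.
  // This assumes manbits already holds the bit length of absx, so that
  // 2 << (manbits - 13) is half the gap between adjacent half values.
  if (manbits > 13)
    absx += (2<<(manbits-13));

  // The addition may have carried into a new top bit, so count the bits again.
  manbits = 0;
  tmp = absx;
  while (tmp)
  {
    tmp >>= 1;
    manbits++;
  }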

before you do the conversion.
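Adding half a step and then truncating rounds exact ties upward. If you want ties to go to the even significand instead, one option is to look at the discarded bits directly. The following is only a sketch under that assumption, not the code from the linked solution; `round_to_11_bits_even` is a hypothetical name.

```c
#include <stdint.h>

/* Round x to 11 significant bits, with ties going to the even significand.
 * Returns the rounded integer value. Assumes x < 2^31 so the result
 * still fits in 32 bits after a carry. */
static uint32_t round_to_11_bits_even(uint32_t x)
{
    int n = 0;
    for (uint32_t t = x; t; t >>= 1)
        n++;                              /* n = bit length of x */

    if (n <= 11)
        return x;                         /* already exactly representable */

    int drop = n - 11;                    /* number of low bits to discard */
    uint32_t half = (uint32_t)1 << (drop - 1);
    uint32_t rem  = x & (((uint32_t)1 << drop) - 1);
    uint32_t kept = x >> drop;

    /* Round up when above the midpoint, or exactly at the midpoint
     * with an odd kept significand. */
    if (rem > half || (rem == half && (kept & 1)))
        kept++;

    return kept << drop;
}
```

Because `drop` is only computed when `n > 11`, no shift count here can be negative. Note that 65520 is a tie that rounds up to 65536, which is out of range for half precision and has to become infinity in the later clipping step.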

2. Change the clipping to infinity from > 15 to > 16

by changing

  if (exp + truncated > 15)

to:

  if (exp + truncated > 16)

I updated the original code: https://ideone.com/mWqgSP
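For reference, here is a self-contained sketch that goes from a 32-bit integer straight to the 16-bit pattern with round-to-nearest-even and overflow to infinity. It is written from the format description (1 sign bit, 5 exponent bits with bias 15, 10 stored significand bits) and is not the code behind the ideone link; the function name `int_to_half_rne` is made up for this example.

```c
#include <stdint.h>

/* Convert a 32-bit signed integer to an IEEE 754 half-precision bit
 * pattern, rounding to nearest with ties to even. Values that round
 * beyond 65504 become infinity (0x7C00 plus the sign bit). */
static uint16_t int_to_half_rne(int32_t x)
{
    uint16_t sign = x < 0 ? 0x8000u : 0;
    /* |x| computed in unsigned arithmetic so INT32_MIN is handled too. */
    uint32_t a = x < 0 ? 0u - (uint32_t)x : (uint32_t)x;

    if (a == 0)
        return 0;

    int n = 0;                            /* bit length of a */
    for (uint32_t t = a; t; t >>= 1)
        n++;

    uint32_t sig;                         /* 11-bit significand with hidden bit */
    if (n <= 11) {
        sig = a << (11 - n);              /* exact, just normalise */
    } else {
        int drop = n - 11;
        uint32_t half = (uint32_t)1 << (drop - 1);
        uint32_t rem  = a & (((uint32_t)1 << drop) - 1);
        sig = a >> drop;
        if (rem > half || (rem == half && (sig & 1)))
            sig++;
        if (sig == 0x800) {               /* rounding carried into a 12th bit */
            sig = 0x400;
            n++;
        }
    }

    int e = n - 1;                        /* unbiased exponent */
    if (e > 15)
        return (uint16_t)(sign | 0x7C00u);    /* overflow to infinity */

    return (uint16_t)(sign | (uint32_t)((e + 15) << 10) | (sig & 0x3FFu));
}
```

The carry check plays the same role as recounting `manbits` after rounding: when the significand overflows, the exponent goes up by one, and only then is the result compared against the largest finite exponent.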

edited Dec 18 '19 at 19:50

answered Dec 17 '19 at 21:24


But this way other values will still be truncated. For example, 8199 will be represented as 8192, but the nearest representable value to 8199 is 8200. – envy grunt Dec 17 '19 at 21:27
My edit should fix your issue, so a downvote shouldn't be necessary anymore :) If I missed something else, please let me know. – hko Dec 17 '19 at 23:53
It doesn't work correctly for some test cases (for example, 65539 should be represented as infinity). I want to implement the precision limitations exactly as they are described on the Wiki page (https://en.wikipedia.org/wiki/Half-precision_floating-point_format). – envy grunt Dec 18 '19 at 01:26
Actually just read your title again, it mentions round-to-even. Now I understand what you want. – hko Dec 18 '19 at 01:43
I want round-to-even. I think in this case it is the same as rounding to the nearest representable value. Am I wrong? – envy grunt Dec 18 '19 at 01:45
I updated the code again. I think this time it should round to even. – hko Dec 18 '19 at 02:40
Can the value (manbits - 13) become negative in this implementation? Can you explain why it's correct? – envy grunt Dec 18 '19 at 14:13
There are still problems with representing some integers. I'm trying to find suitable test cases for them. – envy grunt Dec 18 '19 at 15:20
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/204498/discussion-between-hko-and-envy-grunt). – hko Dec 18 '19 at 17:58
