
I already know how to implement conversion to half-precision floating point using truncation (thanks to this answer). But how can I do the same conversion using rounding to the nearest representable value? For example, I want 65519 to round to 0x7bff (which is 65504), not to infinity. Another example: in the linked solution, 8199 is represented by 8192, but the nearest representable value to 8199 is 8200.

Update: for more example cases, I want to round integers between 32768 and 65519 to a multiple of 32, integers between 16384 and 32768 to a multiple of 16, and so on.
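
For reference, here is a minimal from-scratch sketch of the requested behavior. It is not the linked solution's code: the helper name `int_to_half_rne` is hypothetical, `int` is assumed to be 32 bits, and the result is the raw binary16 bit pattern. It keeps the top 11 significant bits (1 implicit + 10 stored) and inspects the dropped bits to round to nearest. Exact ties go to the even significand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert an int to IEEE 754 binary16 bits,
   rounding to nearest, ties to even. Assumes int is 32 bits.
   Nonzero integers are always normal, so no subnormal path is needed. */
static uint16_t int_to_half_rne(int x)
{
    uint16_t sign = 0;
    uint32_t mag = (uint32_t)x;
    if (x < 0) {
        sign = 0x8000;
        mag = 0u - mag;                /* well defined even for INT_MIN */
    }
    if (mag == 0)
        return sign;                   /* +0 or -0 */

    int e = -1;                        /* unbiased exponent = bit length - 1 */
    for (uint32_t t = mag; t; t >>= 1)
        e++;

    uint32_t m;                        /* 11-bit significand, 0x400..0x7FF */
    if (e > 10) {
        int shift = e - 10;                           /* bits that do not fit */
        uint32_t rem  = mag & ((1u << shift) - 1);    /* the dropped bits */
        uint32_t half = 1u << (shift - 1);            /* half an ULP */
        m = mag >> shift;
        if (rem > half || (rem == half && (m & 1u)))
            m++;                                      /* nearest, ties to even */
        if (m == 0x800) {                             /* carry into a new bit */
            m = 0x400;
            e++;
        }
    } else {
        m = mag << (10 - e);                          /* exact: fits in 11 bits */
    }
    assert(m >= 0x400 && m <= 0x7FF);

    if (e > 15)
        return sign | 0x7C00;                         /* too large: infinity */
    return (uint16_t)(sign | ((uint32_t)(e + 15) << 10) | (m & 0x3FF));
}
```

With this sketch, `int_to_half_rne(8199)` gives `0x7001` (8200), `int_to_half_rne(65519)` gives `0x7BFF` (65504), and `int_to_half_rne(65520)` gives `0x7C00` (infinity), because 65520 is exactly halfway between 65504 and 65536 and the tie goes to the even side. Looking at the remainder, rather than adding a fixed constant, is what makes ties-to-even possible.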

envy grunt
  • 65519 is an integer. Round to what? Are you just setting the last *N* bits to zero? – tadman Dec 17 '19 at 21:11
  • 2
    @tadman This is half-precision floating point so OP must mean 65519.0 cannot be represented. – Fiddling Bits Dec 17 '19 at 21:20
  • It cannot be represented. For example, I want to round integers between 32768 and 65519 to a multiple of 32, integers between 16384 and 32768 to a multiple of 16, and so on. Looking for the best way to implement this in C. – envy grunt Dec 17 '19 at 21:24
  • @FiddlingBits That'd make more sense, sure. – tadman Dec 17 '19 at 21:36
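
The rule described in the update and in the comment above can be written out as follows. It follows from binary16's 10 stored fraction bits plus the implicit leading bit, with e the position of x's leading bit:

```latex
2^{e} \le x < 2^{e+1},\quad 10 \le e \le 15
\quad\Longrightarrow\quad
\text{representable values: } k \cdot 2^{e-10},\quad 1024 \le k \le 2047
```

So the spacing is 32 for e = 15 (32768 to 65504), 16 for e = 14 and 8 for e = 13. That puts 8199 between 8192 = 1024·8 and 8200 = 1025·8, and 8200 is the nearest. The largest finite value is 2047·2^5 = 65504. The next step up would be 2^16 = 65536, so under round-to-nearest-even everything from the midpoint 65520 upward becomes infinity. Integers below 2048 are exact.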

1 Answer


You need two pieces to achieve what you want.

1. Add rounding before you do the conversion by adding:

  // round the number if necessary before we do the conversion
  if (manbits > 13)
    absx += (2<<(manbits-13));

  manbits = 0;
  tmp = absx;
  while (tmp)
  {
    tmp >>= 1;
    manbits++;
  }


2. Change the clipping-to-infinity threshold to > 16

  by changing

  if (exp + truncated > 15)

  to:

  if (exp + truncated > 16)

I updated the original code here: https://ideone.com/mWqgSP
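
The two steps can be sketched in isolation as follows. This illustrates the idea under stated assumptions and is not a patch for the linked ideone code. The names `bit_length`, `round_half_up_11` and `overflows_half` are hypothetical. The sketch also assumes `manbits` is the bit length of `absx`, which is what the loop in step 1 computes. Under that assumption, `2 << (manbits - 13)` equals `1 << (manbits - 12)`, which is half a unit in the last kept place. The guard `manbits > 13`, however, skips values with 12 or 13 significant bits (2048 to 8191), and those values also lose bits. The sketch therefore rounds whenever more than 11 bits are present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint32_t v)
{
    int n = 0;
    while (v) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Step 1: round v to binary16's 11 significant bits by adding half an ULP
   and then clearing the dropped bits. Ties round up (away from zero),
   which is NOT IEEE round-to-nearest-even. */
static uint32_t round_half_up_11(uint32_t v)
{
    assert(v <= 0x7FFFFFFFu);              /* keeps the addition from wrapping */
    int n = bit_length(v);
    if (n > 11) {
        v += 1u << (n - 12);               /* half ULP; same as 2 << (n - 13) for n >= 13 */
        n = bit_length(v);                 /* a carry can add a new leading bit */
        v &= ~((1u << (n - 11)) - 1);      /* keep only the top 11 bits */
    }
    return v;
}

/* Step 2: test overflow on the exponent AFTER rounding. binary16's largest
   exponent is 15, so any rounded magnitude >= 2^16 must become infinity. */
static bool overflows_half(uint32_t rounded)
{
    return bit_length(rounded) - 1 > 15;
}
```

Adding half an ULP rounds ties upward: `round_half_up_11(2049)` gives 2050, whereas round-to-nearest-even gives 2048. So "round to nearest representable" and "round-to-even" agree only when the input is not exactly halfway between two representable values.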

hko
  • But this way other values will still be truncated. For example, 8199 will be represented by 8192, but the nearest representable value to 8199 is 8200. – envy grunt Dec 17 '19 at 21:27
  • My edit should fix your issue, so the downvote shouldn't be necessary anymore :) If I missed something else, please let me know. – hko Dec 17 '19 at 23:53
  • It doesn't work correctly for some test cases (for example, 65539 should be represented by infinity). I want to implement the precision limitations exactly as they are described on the Wikipedia page (https://en.wikipedia.org/wiki/Half-precision_floating-point_format). – envy grunt Dec 18 '19 at 01:26
  • Actually, I just read your title again, and it mentions round-to-even. Now I understand what you want. – hko Dec 18 '19 at 01:43
  • I want round-to-even. I think in this case it will be equal to rounding to the nearest representable value. Am I wrong? – envy grunt Dec 18 '19 at 01:45
  • I updated the code again. I think this time it should round-to-even. – hko Dec 18 '19 at 02:40
  • Can the value (manbits - 13) become negative in this implementation? Can you explain why it's correct? – envy grunt Dec 18 '19 at 14:13
  • There are still problems with representing some integers. I'm trying to find suitable test cases for them. – envy grunt Dec 18 '19 at 15:20
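
To find test cases like the ones discussed in the comments, a brute-force oracle is easy to trust. For a non-negative integer, it scans every positive finite binary16 value, keeps the closest, and prefers the even bit pattern on ties. This is a sketch: `half_bits_scaled` and `nearest_half_reference` are hypothetical names. To avoid any floating-point doubts, values are compared as exact 64-bit integers scaled by 2^24.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a finite, non-negative binary16 bit pattern, multiplied by 2^24
   so that every value (including subnormals) is an exact integer. */
static uint64_t half_bits_scaled(uint16_t h)
{
    uint64_t e = (h >> 10) & 0x1F;
    uint64_t m = h & 0x3FF;
    assert(e != 0x1F);                       /* infinity/NaN not allowed here */
    if (e == 0)
        return m;                            /* subnormal: m * 2^-24 */
    return (m | 0x400) << (e - 1);           /* normal: (1024 + m) * 2^(e - 25) */
}

/* Brute-force oracle: binary16 bits nearest to x, ties to the even bit
   pattern. 65520 is halfway between 65504 and 2^16, and ties-to-even sends
   it (and anything larger) to +infinity, 0x7C00. */
static uint16_t nearest_half_reference(uint32_t x)
{
    if (x >= 65520u)
        return 0x7C00;
    uint64_t target = (uint64_t)x << 24;     /* x on the same 2^24 scale */
    uint16_t best = 0;
    uint64_t best_err = target;              /* distance to +0 */
    for (uint32_t h = 1; h <= 0x7BFF; h++) { /* every positive finite half */
        uint64_t v = half_bits_scaled((uint16_t)h);
        uint64_t err = v > target ? v - target : target - v;
        if (err < best_err || (err == best_err && (h & 1u) == 0)) {
            best = (uint16_t)h;
            best_err = err;
        }
    }
    return best;
}
```

Checking a converter against `nearest_half_reference(x)` for every x from 0 to 65535, plus a few larger values, covers every binade and both sides of the 65520 overflow boundary. Each call scans about 31,700 candidates, so the oracle is slow but simple.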