I'm trying to understand why offset K in binary offset notation is calculated as 2^{n-1}-1 instead of 2^{n-1} for floating point exponent representation. Here is my reasoning for 2^{n-1}.

Four bits can represent values in the range [-8;7], so 0000 represents -8. The offset from zero here is 8 and can be calculated as 2^{n-1}. Using this offset we can define the representation of any number, for example the number 3.

What number do we need to add to -8 to get 3? It's 11, so 3 in offset binary is represented as 1011. And the formula seems to be number to represent + offset.

However, the real formula is number to represent + offset - 1, so the correct representation is 1010. Can someone please explain why we subtract an additional one?
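To make the two conventions concrete, here is a minimal Python sketch (the helper name `encode_offset` is mine) that encodes the number 3 in 4 bits with each of the two biases:

```python
def encode_offset(value, n, bias):
    """Encode `value` as an n-bit offset-binary string with the given bias."""
    return format(value + bias, '0{}b'.format(n))

n = 4
print(encode_offset(3, n, 2**(n - 1)))      # bias 8 -> '1011'
print(encode_offset(3, n, 2**(n - 1) - 1))  # bias 7 -> '1010'
```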

Max Koretskyi
  • It's calculated that way because that's how the floating point format is defined. It's quite arbitrary, but that's how it is defined. – gnasher729 Aug 16 '16 at 18:26
  • @gnasher729, since it's different from the format specified [here](https://en.wikipedia.org/wiki/Offset_binary), I assume they had some reasoning on their own to define it differently, and I'd like to know that reasoning :) – Max Koretskyi Aug 16 '16 at 18:31
  • I don't know if it matters, but calculating the exponent this way, the maximum exponent has a greater (by one) absolute value than the minimum one. This makes bigger numbers representable, while the smaller ones are covered by the denormal representation. – Bob__ Aug 16 '16 at 19:02
  • @Bob__, do you mean the range becomes `[-7;8]` instead of `[-8;7]`? Can you please elaborate with examples, maybe in a separate answer? – Max Koretskyi Aug 17 '16 at 05:29

1 Answer

I am posting this as an answer to better explain my thoughts, but even though I'll quote the standard a few times, I haven't found an explicitly stated reason.

In the following, I'll refer to the IEEE 754 standard (and successive revisions) for floating-point representation, even though the OP doesn't mention it (if I'm wrong, please let me know).

The question is about the particular representation of the exponent in a floating point number.

In subclause 3.3, Sets of floating-point data, it is said (emphasis mine):

The set of finite floating-point numbers representable within a particular format is determined by the following integer parameters:

b = the radix, 2 or 10
p = the number of digits in the significand (precision)
emax = the maximum exponent e
emin = the minimum exponent e

emin shall be 1 − emax for all formats.

Later it specifies:

The smallest positive normal floating-point number is b^emin and the largest is b^emax × (b − b^(1−p)). The non-zero floating-point numbers for a format with magnitude less than b^emin are called subnormal because their magnitudes lie between zero and the smallest normal magnitude.
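For binary32 (b = 2, p = 24, emin = −126, emax = 127, see the parameter table below), these formulas can be verified with a quick Python sketch:

```python
b, p, emin, emax = 2, 24, -126, 127
smallest_normal = b ** emin                      # 2^-126 ~ 1.1754944e-38
largest_normal = b ** emax * (b - b ** (1 - p))  # ~ 3.4028235e+38
print(smallest_normal, largest_normal)
```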

In 3.4 Binary interchange format encodings:

Representations of floating-point data in the binary interchange formats are encoded in k bits in the following three fields (...):
a) 1-bit sign S
b) w-bit biased exponent E = e + bias
c) (t = p − 1)-bit trailing significand field digit string T = d_1 d_2 ... d_(p−1); the leading bit of the significand, d_0, is implicitly encoded in the biased exponent E
(...)
The range of the encoding’s biased exponent E shall include:
― every integer between 1 and 2^w − 2, inclusive, to encode normal numbers
― the reserved value 0 to encode ±0 and subnormal numbers
― the reserved value 2^w − 1 to encode ±∞ and NaNs.
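A minimal sketch of how these three fields can be extracted from a binary32 bit pattern, using Python's struct module to reinterpret the bits:

```python
import struct

def fields(x):
    """Return (sign S, biased exponent E, trailing significand T) of x as binary32."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF  # 1, w=8, t=23 bits

print(fields(1.0))   # (0, 127, 0): e = 0, so E = e + bias = 127
print(fields(-2.0))  # (1, 128, 0): e = 1, so E = 128
```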

For example, a 32-bit floating-point number has these parameters:

k, storage width in bits                      32
p, precision in bits                          24
emax, maximum exponent e                     127
emin, minimum exponent e                    -126
bias, E − e                                  127
w, exponent field width in bits                8
t, trailing significand field width in bits   23

In this Q&A it is pointed out that: "The purpose of the bias is so that the exponent is stored in unsigned form, making it easier to do comparisons."

Considering the above-mentioned 32-bit floating-point representation, a normal (not subnormal) number has an encoded biased exponent E between 1 and 254.
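The unsigned-comparison property is easy to demonstrate: for positive floats, the raw bit patterns, compared as unsigned integers, order the same way as the values themselves. A small sketch:

```python
import struct

def bits(x):
    """Raw binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack('>I', struct.pack('>f', x))[0]

a, b = 0.75, 3.5
print((bits(a) < bits(b)) == (a < b))  # True: same ordering for positive values
```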

The reason behind the particular choice of the range [−126, 127] for the exponent could be, in my opinion, to extend the range of representable numbers: very small numbers are covered by subnormals, so a maximum exponent that is bigger (even if only by one) takes care of the very big ones.
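For binary32 this trade-off is visible at the extremes: the smallest positive subnormal reaches down to 2^−149, while the largest finite number needs emax = 127. A quick check:

```python
smallest_subnormal = 2.0 ** -149              # ~ 1.4e-45, covered by subnormals
largest_normal = 2.0 ** 127 * (2 - 2 ** -23)  # ~ 3.4e+38, needs emax = 127
print(smallest_subnormal, largest_normal)
```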

Bob__
  • thanks, that's what I thought, we just moved the range from `[-8;7]` to `[-7;8]` for a 4-bit binary number. And it's applicable only to floating point, since usually the offset is calculated as `2^(n-1)` to enable convenient conversion to `two's complement` as explained [here](https://en.wikipedia.org/wiki/Offset_binary) – Max Koretskyi Aug 17 '16 at 11:22
  • @Maximus You're welcome. Please, note that in the very same page you linked it's said _"There is no standard for offset binary, but most often..."_. It surely depends on the final intent of the transformation. – Bob__ Aug 17 '16 at 11:43
  • yes, exactly, the guys from IEEE decided to move the range one number up. I was curious why – Max Koretskyi Aug 17 '16 at 12:13