2

I'm trying to understand what minimum number I need to add to get Infinity because of overflow. I've read this answer already. So let me just clarify my understanding here. To simplify, I'll be working with 1 byte floating point with 4 bits for exponent and 3 bits for mantissa:

0 0000 000

The maximum positive number I can store in it is this:

0 1110 111

which is when converted to scientific notation:

   1.111 x 2^{7} = 11110000

Is my understanding correct that the minimum number I should add to get Infinity is 00010000:

       11110000
+      00010000
        --------
     1 00000000

As I understand anything less than 00010000 will not cause overflow and the result will be rounded to 11110000. But the 00010000 is 0 0000 001 in floating point format, and it's the number 1. So is adding just 1 enough to cause overflow?

Community
  • 1
  • 1
Max Koretskyi
  • 101,079
  • 60
  • 333
  • 488

1 Answers1

2

The answer is given in the other answer to the question you link to. The smallest value which will round to infinity is:

c = 27 × ( 2 − ½ × 21-4 ) = 1.9375 × 27 = 1.11112 × 27

So the smallest value that you can add to get infinity is

cfmax = 1.11112 × 27 − 1.1112 × 27 = 0.00012 × 27 = 23

which, if I understand correctly, would have bit pattern 0 1010 000 in your proposed format.

UPDATE: so why is it this particular cutoff?

Suppose that there was another binade above this one, then the next floating point number would be

x = 1.0002 × 28

Note that c is the value that is exactly halfway between x and fmax. In other words, the values which would round up to x are instead rounded to infinity, but the values which would round down to fmax still round to the same value.

Community
  • 1
  • 1
Simon Byrne
  • 7,694
  • 1
  • 26
  • 50
  • thanks, but the number you calculated `2^3=1000` if added to `1111 0000` will produce `1111 1000`, which doesn't seem to cause overflow, simply the fifth bit is set to 1 and the number will be rounded to max_value. I would expect that overflow to turn entire mantissa to zeros with `1` going over the available bits. Just like for example if you add two 1 byte binary numbers `1111 1111` and `0000 0001` and you get an overflow `1 0000 0000` (9 bits can't be put, so the first 1 is discarded). Is it not so with overflow in floating point? – Max Koretskyi Oct 20 '16 at 05:11
  • what is max_value? – Simon Byrne Oct 20 '16 at 08:34
  • in my 1 byte floating point it's `0 1110 111`. I'm assuming that when floating point adds two numbers, it converts it from scientific from into plain representation, like `0.001 x 2^2 + 0.011 x 2^1 = 0.1+0.11` and adds them like this, then it converts back to scientific form and to floating point representation. Maybe this is where my confusion mentioned in the previous comment comes from – Max Koretskyi Oct 20 '16 at 09:52
  • I think you're trying to think of it in terms of integer operations: floating point overflow is different than integer overflow. If you want to learn how floating point addition is performed, look at something like this http://pages.cs.wisc.edu/~markhill/cs354/Fall2008/notes/flpt.apprec.html – Simon Byrne Oct 20 '16 at 12:06
  • yeah, mostly I'm trying to understand the rationale behind the formula specified in the answer your linked and the phrase 'you need to increase MAX_VALUE enough to actually affect the mantissa' in the answer I linked to. I was thinking that affecting is mantissa is actually causing a integer like overflow. What's your take on `affecting the mantissa`? – Max Koretskyi Oct 20 '16 at 13:11
  • I think that answer is wrong (or, at the very least, unhelpful). Another way to think about it is how it would round if there was another binade above this one. – Simon Byrne Oct 20 '16 at 13:28
  • So you would suggest just remember the formula and not bother why it's such? – Max Koretskyi Oct 20 '16 at 13:46
  • I've updated the answer, hopefully that explains it further. – Simon Byrne Oct 20 '16 at 14:04
  • thanks a lot, I'll go though your explanation and I'll probably get back to you with question :) – Max Koretskyi Oct 20 '16 at 14:48
  • _Suppose that there was another binade above this one_ - can you elaborate a bit? there's not much on _binade_ term on the web – Max Koretskyi Oct 27 '16 at 07:28
  • A binade is a set of all floating point numbers with same exponent, see https://en.wikipedia.org/wiki/Binade – Simon Byrne Oct 27 '16 at 20:32
  • yeah, I've seen that one sentence definition, didn't much help. _another binade above this one_ - what are these two numbers? – Max Koretskyi Oct 28 '16 at 05:48
  • In this case, it's the set of floating point numbers with exponent of 2^8 – Simon Byrne Oct 28 '16 at 12:25