I'm trying to understand IEEE 754 floating point addition at a binary level. I have followed some example algorithms that I have found online, and a good number of test cases match against a proven software implementation. My algorithm is only dealing with positive numbers at the moment. However, I am not getting a match with this test case:

00001000111100110110010010011100 (1.46487e-33)
00000000000011000111111010000100 (1.14741e-39)

I split each value into sign bit, exponent, and mantissa, and add the implicit 1 back in to the mantissa:

0 00010001 1.11100110110010010011100
0 00000000 1.00011000111111010000100

I subtract the smaller exponent from the larger to determine the realignment shift amount:

 00010001 (17)
-00000000 (0)
 =============
           17

I tack on a Guard bit, Round Bit, and Sticky Bit to the mantissas:

1.11100110110010010011100 000
1.00011000111111010000100 000

I shift the lesser value's mantissa to the right 17 times, with the LSb "sticking" once it receives a 1:

0.00000000000000001000110 001
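The sticky shift described here can be sketched in a few lines of Python. This is only an illustration of the shifting step under the question's own inputs; the helper name `shift_right_sticky` is hypothetical, and the mantissa is held as an integer with the three extra bits appended on the right.

```python
def shift_right_sticky(value, count):
    """Shift right by `count` bits, OR-ing any lost 1 bits into the LSB."""
    if count <= 0:
        return value
    lost = value & ((1 << count) - 1)      # bits that fall off the right end
    return (value >> count) | (1 if lost else 0)


# 1.00011000111111010000100 with guard, round and sticky bits appended.
extended = 0b100011000111111010000100 << 3
shifted = shift_right_sticky(extended, 17)

# Print the 24-bit mantissa and the three extra bits separately.
print(f"{shifted >> 3:024b} {shifted & 0b111:03b}")
# 000000000000000001000110 001
```

Collecting the lost bits with a mask gives the same result as shifting one place at a time and letting the LSb stick.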

I add the greater mantissa to the shifted lesser mantissa:

1.11100110110010010011100 000 +
0.00000000000000001000110 001
================================
1.11100110110010011100010 001

Since there was no overflow, and the guard bit is 0, I can use the summation-mantissa and greater-exponent directly (re-removing the implicit '1'):

0 00010001 11100110110010011100010

Giving a final value of:

00001000111100110110010011100010 (1.46487e-33)

But according to my verification implementation, I should be getting:

00001000111100110110010010101000 (1.46487e-33)

So very close but not exact. Is there a mistake in my algorithm?

user2913869
  • Zero exponent means subnormal number. There is no implicit one bit. – Patricia Shanahan Aug 02 '18 at 20:08
  • The subnormal error accounts for a one bit difference in the final result, 00001000111100110110010010100010. It does not explain the different location of the least significant one bit in the two answers. – Patricia Shanahan Aug 03 '18 at 02:54

1 Answer

There appear to be two problems in the calculation, both related to treating a subnormal number as though it were normal:

  1. Incorrect shift calculation. A subnormal's exponent is -126, not -127, so the shift is 16 bits, not 17.
  2. Incorrectly inserting a one bit before the binary point. A subnormal has no implicit one.
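Both points can be illustrated with a small decoding sketch in Python. The function name `decode` and the constants are hypothetical, and the sketch assumes finite binary32 bit patterns only (no infinities or NaNs).

```python
EXPONENT_BIAS = 127
FRACTION_BITS = 23


def decode(bits):
    """Return (unbiased exponent, 24-bit significand) for a finite binary32 pattern."""
    biased = (bits >> FRACTION_BITS) & 0xFF
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    if biased == 0:
        # Subnormal (or zero): the leading bit is 0 and the exponent is -126.
        return 1 - EXPONENT_BIAS, fraction
    # Normal: restore the implicit leading 1.
    return biased - EXPONENT_BIAS, fraction | (1 << FRACTION_BITS)


exp_a, sig_a = decode(0b00001000111100110110010010011100)
exp_b, sig_b = decode(0b00000000000011000111111010000100)
print(exp_a, exp_b, exp_a - exp_b)   # -110 -126 16
print(f"{sig_b:024b}")               # 000011000111111010000100
```

The exponent difference is the alignment shift, and the subnormal operand's significand starts with 0 rather than 1.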

Here is the revised calculation:

0 00010001 1.11100110110010010011100
0 00000000 0.00011000111111010000100

Tack on a Guard bit, Round Bit, and Sticky Bit to the mantissas:

1.11100110110010010011100 000
0.00011000111111010000100 000

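Shift the smaller number's mantissa right by 16 bits (exponent -110 minus exponent -126), with the sticky bit collecting the bits shifted out:

0.00000000000000000001100 011

Add the greater mantissa to the shifted lesser mantissa:

1.11100110110010010011100 000 +
0.00000000000000000001100 011
================================
1.11100110110010010101000 011

The guard bit is 0, so the sum rounds down. Dropping the implicit 1 again and keeping the larger exponent gives 00001000111100110110010010101000, which matches the expected result.
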
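
The revised calculation can be checked end to end with a Python sketch. It is a minimal illustration, not a full implementation: it assumes positive, finite binary32 inputs and round-to-nearest-even (the rounding mode is an assumption, not something stated in the question), and every function name in it is hypothetical.

```python
FRACTION_BITS = 23
HIDDEN_BIT = 1 << FRACTION_BITS  # the bit just before the binary point


def unpack_fields(bits):
    """Return (biased exponent used for alignment, 24-bit significand)."""
    exponent = (bits >> FRACTION_BITS) & 0xFF
    fraction = bits & (HIDDEN_BIT - 1)
    if exponent == 0:
        # Subnormal: no implicit one, and the scale equals biased exponent 1.
        return 1, fraction
    return exponent, fraction | HIDDEN_BIT


def shift_right_sticky(value, count):
    """Shift right, OR-ing any lost 1 bits into the LSB."""
    lost = value & ((1 << count) - 1)
    return (value >> count) | (1 if lost else 0)


def add_positive_f32(a_bits, b_bits):
    """Add two positive, finite binary32 bit patterns and return the result bits."""
    exp_a, sig_a = unpack_fields(a_bits)
    exp_b, sig_b = unpack_fields(b_bits)
    if exp_a < exp_b:
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a

    # Append guard, round and sticky bits, then align the smaller operand.
    total = (sig_a << 3) + shift_right_sticky(sig_b << 3, exp_a - exp_b)
    exponent = exp_a

    # A carry out of the 24-bit significand needs one normalising shift.
    if total >> (FRACTION_BITS + 4):
        total = shift_right_sticky(total, 1)
        exponent += 1

    # Round to nearest, ties to even, using the three extra bits.
    significand, grs = total >> 3, total & 0b111
    if grs > 0b100 or (grs == 0b100 and significand & 1):
        significand += 1
        if significand >> (FRACTION_BITS + 1):  # rounding carried out
            significand >>= 1
            exponent += 1

    if exponent >= 0xFF:
        return 0xFF << FRACTION_BITS  # overflow to +infinity
    if not significand & HIDDEN_BIT:
        exponent = 0  # result is still subnormal (or zero)
    return (exponent << FRACTION_BITS) | (significand & (HIDDEN_BIT - 1))


result = add_positive_f32(0b00001000111100110110010010011100,
                          0b00000000000011000111111010000100)
print(f"{result:032b}")  # 00001000111100110110010010101000
```

Treating a subnormal as having biased exponent 1 with no implicit one is what makes both the 16-bit shift and the leading 0 fall out of the same code path.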
chux - Reinstate Monica
Patricia Shanahan