
I've been trying to implement floating point addition.

I believe I understand the principle:

  1. extract the sign, exponent, and mantissa from each float
  2. compare which float has the higher exponent
  3. right-shift the mantissa of the float with the lower exponent by the difference between the exponents
  4. add that mantissa to the mantissa of the float with the higher exponent. If a carry happens (the sum spills past the top bit of the mantissa), shift the result right by one and increase the exponent of the result by 1.

If a number isn't 0, also take into account the hidden 1 that precedes the mantissa when shifting.
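The four steps, restricted to positive inputs, can be sketched in C as follows. This is a minimal sketch under stated assumptions, not a complete implementation. It assumes both operands are positive normal floats (no zero, denormals, NaN or infinity) and that the result does not overflow. It also truncates the bits shifted out in step 3 instead of rounding them, so the last bit can differ from what the FPU returns. The names `float_to_bits`, `bits_to_float` and `add_positive` are illustrative helpers, not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret float bits without violating strict aliasing. */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical helper: adds two positive, normal floats.
 * Assumptions: no zero/denormal/NaN/infinity inputs, no overflow,
 * shifted-out bits are truncated (no rounding). */
float add_positive(float a, float b)
{
    uint32_t ua = float_to_bits(a), ub = float_to_bits(b);

    /* Step 1: extract exponent and mantissa; OR in the hidden 1 at bit 23. */
    int32_t  ea = (int32_t)((ua >> 23) & 0xFF);
    int32_t  eb = (int32_t)((ub >> 23) & 0xFF);
    uint32_t ma = (ua & 0x7FFFFF) | 0x800000;
    uint32_t mb = (ub & 0x7FFFFF) | 0x800000;

    /* Step 2: make (ea, ma) the operand with the higher exponent. */
    if (eb > ea) {
        int32_t te = ea; ea = eb; eb = te;
        uint32_t tm = ma; ma = mb; mb = tm;
    }

    /* Step 3: align the smaller operand. A shift of 24 or more discards
     * every bit, so clamp to avoid an undefined shift by >= 32. */
    int32_t diff = ea - eb;
    mb = (diff > 24) ? 0 : (mb >> diff);

    /* Step 4: add; a carry shows up as bit 24 being set. */
    uint32_t m = ma + mb;
    if (m & 0x1000000) {
        m >>= 1;   /* put the leading 1 back at bit 23 */
        ea += 1;
    }

    /* Drop the hidden 1 and reassemble; sign bit 0 means positive. */
    return bits_to_float(((uint32_t)ea << 23) | (m & 0x7FFFFF));
}
```

For example, `add_positive(1.5f, 0.25f)` returns `1.75f`, and `add_positive(1.0f, 1.0f)` takes the carry path and returns `2.0f`. `memcpy` is used for the bit reinterpretation because casting a `float *` to a `uint32_t *` breaks C's aliasing rules.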

This does work for strictly positive numbers. But what about negatives?

Does adding 1 and -1 (1 + -1) require a subtraction? What about (-1 + -1) or (10 + -1)?

How do I take the sign into consideration when working out the algorithm?
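One common way to bring the sign in is to treat each operand as sign plus magnitude. If the signs agree, add the magnitudes and keep the common sign, so -1 + -1 is just -(1 + 1). If they differ, subtract the smaller magnitude from the larger and give the result the sign of the larger one, so 10 + -1 becomes 10 - 1 and 1 + -1 becomes 0. Subtraction can cancel leading bits, so afterwards the mantissa has to be shifted left until the hidden 1 is back at bit 23. The following minimal sketch uses a hypothetical function `add_signed`. It assumes normal, nonzero, finite inputs and no overflow or underflow. It also truncates instead of using guard, round and sticky bits, so it is not bit-exact with hardware in general.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical sign-magnitude adder.
 * Assumptions: normal, nonzero, finite inputs; result neither overflows
 * nor leaves the normal range; shifted-out bits are truncated. */
float add_signed(float a, float b)
{
    uint32_t ua = float_to_bits(a), ub = float_to_bits(b);

    /* With the sign bit cleared, the bit patterns order like the
     * magnitudes. Swap so that ua holds the larger magnitude. */
    if ((ub & 0x7FFFFFFF) > (ua & 0x7FFFFFFF)) {
        uint32_t t = ua; ua = ub; ub = t;
    }

    uint32_t sa = ua >> 31, sb = ub >> 31;
    int32_t  ea = (int32_t)((ua >> 23) & 0xFF);
    int32_t  eb = (int32_t)((ub >> 23) & 0xFF);
    uint32_t ma = (ua & 0x7FFFFF) | 0x800000;   /* hidden 1 at bit 23 */
    uint32_t mb = (ub & 0x7FFFFF) | 0x800000;

    int32_t diff = ea - eb;                     /* >= 0 after the swap */
    mb = (diff > 24) ? 0 : (mb >> diff);

    uint32_t m;
    if (sa == sb) {
        /* Same signs: add magnitudes, keep the common sign. */
        m = ma + mb;
        if (m & 0x1000000) { m >>= 1; ea += 1; }
    } else {
        /* Different signs: larger magnitude minus smaller one; the
         * result takes the sign of the larger operand (sa). */
        m = ma - mb;
        if (m == 0)
            return 0.0f;                        /* e.g. 1 + -1 gives +0 */
        /* Renormalize: shift left until the leading 1 is at bit 23. */
        while (!(m & 0x800000)) { m <<= 1; ea -= 1; }
    }

    return bits_to_float((sa << 31) | ((uint32_t)ea << 23) | (m & 0x7FFFFF));
}
```

Under these assumptions, `add_signed(1.0f, -1.0f)` returns `0.0f`, `add_signed(-1.0f, -1.0f)` returns `-2.0f`, and `add_signed(10.0f, -1.0f)` returns `9.0f`. Comparing the bit patterns with bit 31 cleared works as a magnitude comparison because the exponent field sits above the mantissa field, so a larger exponent always gives a larger integer.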

zubergu
armorlord
  • As this is tagged C: `float a=1.0f, b=-1.0f; float c = a - b;` But I assume you want to do the bit twiddling yourself? :) – Michael Dorgan Jun 01 '15 at 22:27
  • yes I'm interested in implementing the bit "twiddling" :) – armorlord Jun 01 '15 at 22:45
  • It sounds self-evident that `a + -b` is equivalent to `a - b`. You have worked out or found most of the details already, the negative case should not be too much harder! – PJTraill Jun 01 '15 at 22:56
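PJTraill's point maps directly onto the bit layout. In IEEE 754 single precision, negating a value only flips the sign bit (bit 31), so a subtraction routine can flip the sign of `b` and pass both operands to a signed addition. A minimal sketch of that sign flip, using a hypothetical helper `negate_bits`:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: negate a float by toggling bit 31.
 * The exponent and mantissa fields are left untouched. */
float negate_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    u ^= 0x80000000u;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

For instance, `10.0f + negate_bits(1.0f)` equals `10.0f - 1.0f`. Because only bit 31 changes, `a - b` needs no separate algorithm once `a + b` handles mixed signs.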
