
Suppose I want to divide many numbers by the same divisor.

a /= x;
b /= x;
c /= x;
...

Since multiplication is faster, the temptation is to do this

tmp = 1.0f / x;
a *= tmp;
b *= tmp;
c *= tmp;
...

1) Is this guaranteed to produce identical answers? I suspect not but some confirmation would be nice.

2) If x is extremely large or extremely small I expect this could cause a significant loss of accuracy. Is there a formula which will tell me how much accuracy I will sacrifice?

3) Perhaps there's no convenient formula, but can we at least state a rule of thumb for when numerical instabilities will be an issue? Is it to do with the magnitudes of the operands, or the difference between the magnitudes of the operands perhaps?
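
A minimal, self-contained way to observe whether the two approaches differ on a given value (the numbers below are arbitrary placeholders, chosen only for illustration):

#include <stdio.h>

int main(void) {
    /* Arbitrary illustration values; any data will do. */
    double a = 10.0;
    double x = 3.0;

    double by_division   = a / x;
    double by_reciprocal = a * (1.0 / x); /* two roundings: 1/x, then the multiply */

    printf("a / x       = %.17g\n", by_division);
    printf("a * (1 / x) = %.17g\n", by_reciprocal);
    printf("identical   = %s\n", by_division == by_reciprocal ? "yes" : "no");
    return 0;
}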

spraff
    Multiplication and division with different exponents do not cause more loss than usual. Addition and subtraction with a large difference of exponents, however, do cause great loss of precision. – dmg Feb 02 '15 at 14:47

4 Answers


1) No, not guaranteed to produce identical answers. Even with IEEE, subtle rounding effects may result in a difference of 1 or 2 ULP between a/x and a*(1/x).

2) If x is extremely small (that is, a bit smaller than DBL_MIN, the minimum normalized positive floating-point number, as in the case of sub-normals), 1/x can overflow to INF, a total loss of precision. Potentially significant loss of precision also occurs when x is very large, such as when the FP model does not support sub-normals.
By testing |x| against the largest finite number <= 1/DBL_MIN and the smallest non-zero number >= 1/DBL_MAX, code can determine when significant loss of accuracy begins. A formula would likely depend on the FP model used, the exponent of x and the model's limits. In this range of binary64, the difference between the binary exponent of x and Emin (or Emax) would be a first-order estimate of the bits lost.
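
A rough C sketch of that test, assuming nothing beyond <float.h> and <math.h>; the exact cut-offs are a judgment call, so treat the thresholds as approximate:

#include <float.h>
#include <math.h>
#include <stdio.h>

/* Sketch of the test described above: flag values of x whose reciprocal
   falls outside the comfortable range. The thresholds are computed, not
   exact mathematical bounds. */
static int reciprocal_loses_accuracy(double x) {
    double ax = fabs(x);
    return ax > 1.0 / DBL_MIN    /* 1/x would drop below DBL_MIN (sub-normal or 0) */
        || ax < 1.0 / DBL_MAX;   /* 1/x would overflow toward INF */
}

int main(void) {
    printf("x = 3.0    -> %d\n", reciprocal_loses_accuracy(3.0));     /* 0 */
    printf("x = 1e308  -> %d\n", reciprocal_loses_accuracy(1e308));   /* 1: 1/x is sub-normal */
    printf("x = 1e-310 -> %d\n", reciprocal_loses_accuracy(1e-310));  /* 1: 1/x overflows */
    return 0;
}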

3) Significant numerical instabilities occur in the ranges discussed above.

chux - Reinstate Monica
    Consider the range of finite `double` numbers and let us assume `x > 0.0`. _Typically_ the maximum finite value `DBL_MAX` and `DBL_MIN` (the minimum normalized positive floating-point number) are approximately inverses of each other. Since they are usually not exact mathematical inverses of each other, one of them will have a representable inverse while the other will not. That is where the problem begins. – chux - Reinstate Monica Feb 02 '15 at 16:06
  • Most FP also uses sub-normals, so the range of numbers runs from `DBL_TRUE_MIN` (the minimum positive floating-point number) to `DBL_MAX`, and the inverses of those 2 reveal another location where the problem continues - typically at the high end of the range. – chux - Reinstate Monica Feb 02 '15 at 16:07
  • @spraff BTW: love your picture - did we order at the same store? – chux - Reinstate Monica Feb 02 '15 at 16:20
  • Comments on _representable inverse_ et. al. very interesting. – ryyker Feb 02 '15 at 20:10
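
A quick way to see the asymmetry described in the two comments above is to print the range limits and their computed reciprocals (a minimal sketch; `DBL_TRUE_MIN` requires C11, hence the guard):

#include <float.h>
#include <stdio.h>

int main(void) {
    /* Print the limits and their computed reciprocals to see where an
       inverse stops being representable. */
    printf("DBL_MAX      = %.17g   1/DBL_MAX      = %.17g\n", DBL_MAX, 1.0 / DBL_MAX);
    printf("DBL_MIN      = %.17g   1/DBL_MIN      = %.17g\n", DBL_MIN, 1.0 / DBL_MIN);
#ifdef DBL_TRUE_MIN /* C11 and later */
    printf("DBL_TRUE_MIN = %.17g   1/DBL_TRUE_MIN = %.17g\n", DBL_TRUE_MIN, 1.0 / DBL_TRUE_MIN);
#endif
    return 0;
}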

As you are working with IEEE-754 numbers, I would say your approach is perfectly acceptable. In IEEE-754, x is roughly

mantissa * 2^exponent

where mantissa is a number between 1/2 and 1.

So as long as you are only doing multiplications and divisions, you will of course have a loss in precision, but that loss is independent of the magnitude of x (*) and is only related to the precision of the floating type used (single, double or quad precision, meaning float, double or long double depending on compiler and architecture).

(*) That is only true as long as you do not have overflow or underflow, which happens around 10^38 for single precision or 10^300 for double precision.
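
A small sketch of that claim, using arbitrary values chosen to stay inside the non-overflowing range: the relative error of a divide-then-multiply round trip stays near machine epsilon whether x is tiny or huge.

#include <math.h>
#include <stdio.h>

int main(void) {
    /* Round-trip a value through division and multiplication by divisors of
       very different magnitudes; the relative error is similar in each case. */
    double a = 1.2345678901234567;
    double xs[] = { 3.0, 3.0e-100, 3.0e100, 7.0e200 };
    for (int i = 0; i < 4; i++) {
        double roundtrip = (a / xs[i]) * xs[i];
        printf("x = %-9.2g relative error = %.3g\n", xs[i], fabs(roundtrip - a) / a);
    }
    return 0;
}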

References: the Floating point and IEEE floating point pages on Wikipedia

Serge Ballesta

Here are a few thoughts with supporting links:

1) Yield the same results? No guarantees. There are too many contributions to variability, everything from microprocessor design (remember the design error in the 486 DX math co-processor?) to compiler implementation, to the way a float is stored in hardware memory. (Good discussion on that here.)

2) Formula? I am not aware of one. And what do you mean by significant error? In any case, you can establish expectations for the precision you will see by considering the following (a small sketch follows this list):

  • Understanding various implementations of floating point numbers (link compares 2)
  • What variable type is used (float, double, long double). (differences)
  • What architecture are you building on: 32-bit, 64-bit, or other?
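
One concrete way to establish those expectations on a given toolchain is simply to print the <float.h> limits (a minimal sketch, independent of the linked discussions):

#include <float.h>
#include <stdio.h>

int main(void) {
    /* Precision per type on this particular compiler/architecture. */
    printf("float       : %2d decimal digits, epsilon = %g\n",  FLT_DIG,  FLT_EPSILON);
    printf("double      : %2d decimal digits, epsilon = %g\n",  DBL_DIG,  DBL_EPSILON);
    printf("long double : %2d decimal digits, epsilon = %Lg\n", LDBL_DIG, LDBL_EPSILON);
    return 0;
}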

There are many discussions on floating point error. Here is one

3) There are no real rules of thumb (if by rule of thumb you mean easy to remember, easy to apply, simple to understand); however, here is a good attempt at answering that question with regard to floating point error.

ryyker

My 50 cents on this:

1) No, explained further in 3.

2) I don't know of any formula, so I'll just skip this one.

3) The rule of thumb that I know is to try to only perform operations between operands with close order of magnitudes.

Practical example:

You want to divide the number 63,000,000 by 1,000,000.

Using the first approach, you will end up dividing 63*10^6 by 1*10^6, which have very close magnitudes.

However, if you use the 2nd approach, then

temp = 1.0f / x;

will produce 10^(-6).

Now, multiplying 63*10^6 by 10^(-6) will lead to significant precision losses, because the difference of magnitude between the two is huge. The CPU will try to use the exponent + fraction representation of the 10^6 number to represent the 10^(-6) number...

A viable alternative would be for temp to be

temp = 1.0 / 1000;

And then

a = a * temp * temp;

Because the magnitudes will be closer, there's less chance of precision loss.
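
For completeness, a minimal check that compares the single reciprocal against the split-reciprocal alternative and the direct division; it only performs the arithmetic, and the comments below this answer discuss whether any loss actually occurs in practice.

#include <stdio.h>

int main(void) {
    double a = 63000000.0;

    /* Direct reciprocal of 1,000,000 versus the split reciprocal suggested above. */
    double temp_full  = 1.0 / 1000000.0;
    double temp_split = 1.0 / 1000.0;

    printf("one-step  = %.17g\n", a * temp_full);
    printf("two-step  = %.17g\n", a * temp_split * temp_split);
    printf("direct    = %.17g\n", a / 1000000.0);
    return 0;
}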

MichaelCMS
    No, that will not lead to precision losses. The significands are always very close, if the operands are both not denormal. The exponents can easily be adjusted. Only addition and subtraction with greatly differing magnitudes is usually problematic. – Rudy Velthuis Feb 02 '15 at 14:34
    @Rudy Velthuis You are right. I will modify my answer and make the operands denormal. I guess using powers of 10 to underline precision losses led me to show a case where things actually are quite easily handled. – MichaelCMS Feb 02 '15 at 14:43