1

I have a calculation using double and run into problems due to catastrophic cancellation (nearly equal values are subtracted). I am looking for a rearrangement to keep the loss of significance at a minimum.

My equation is that of a plane from which I compute z = z(x, y) (with a set of constants z0, x21, ...):

// (z-z0)*(x21*y31 - y21*x31) = (x-x0)*(y21*z31 - z21*y31) + (y-y0)*(z21*x31-x21*z31)

double z = z0 + (x-x0)*(y21*z31 - z21*y31)/(x21*y31 - y21*x31)
              + (y-y0)*(z21*x31-x21*z31)/(x21*y31 - y21*x31);

One set of values for which I run into problems is given below. In this case it is due to y-terms all being close to zero (y21 ~= -0.0032, y31 ~= -0.0097, y-y0 ~= -0.0032):

(z-z0) * 9.3241e-18 ~= 0.1112*0.0830 + (-0.0032)*2.8428
                    ~= 0.009234 - 0.009234
                    ~= 5.2042e-18

The result is barely usable since I suspect both 9.3241e-18 and 5.2042e-18 to be affected by calculation errors.

I am looking for a way to deal with calculation errors due to subtraction in general and a smart rearrangement for the special case of y21, y31, y-y0 being close to zero in particular. The rearrangement I tried have not had success so far. How can I deal with those calculation problems (in C++).

President James K. Polk
  • 40,516
  • 21
  • 95
  • 125
Micha Wiedenmann
  • 19,979
  • 21
  • 92
  • 137
  • Reopened, on closer inspection the dupe might not be what your actually asking for. Leaving the link here for reference in case it is: https://stackoverflow.com/questions/17333/what-is-the-most-effective-way-for-float-and-double-comparison – NathanOliver May 28 '20 at 15:04
  • 2
    Note that generally, with floats, worst case situations happen when you substract or add very different values. For example, you might have (1 + eps) - 1 = 0, when eps is less than 2^(-nb), where nb is the number of bits of the mantissa. Having a number not equal to 0 set to zero because of rounding errors can be worse than getting a very small number instead of zero. In other words, addition is no longer associative – Damien May 28 '20 at 15:12
  • 2
    There are libraries that can do this but to implement it yourself is a lot of work. These libraries do not use floating point but instead hold all values as fractions using multi-precision integers (and integer arithmetic) for both numerator and denominator. Using this approach all operations are performed exactly. An example library is https://www.cgal.org/ internal link https://doc.cgal.org/latest/Number_types/classCGAL_1_1MP__Float.html – Richard Critten May 28 '20 at 15:16
  • I doubt there is a general solution apart from using more precision. However if you were able to describe in higher level terms where these formulae came from, someone might be able to describe a better algorithm to achieve your ends.That is, there may be particular solutions. – dmuir May 30 '20 at 16:35
  • OP said "My equation is that of a plane from which I compute `z = z(x, y)` (with a set of constants `z0, x21, ...`)". This may hint at what OP is really trying to achieve, something like calculating the equation of plane which passes through some points. Not sure exactly what, and this will probably never get clarified. But the formulas look vaguely familiar to me; maybe someone can guess from the formulas what the task really is! – anatolyg May 31 '20 at 12:38

0 Answers0