
I would like to remove the last 2 or 3 significant digits of a floating-point number in C++ in an efficient way. To formulate the question more accurately: I would like to discard the last few mantissa bits of the floating-point representation.

Some background: I can arrive at the same floating-point number in different ways. For example, if I use bilinear interpolation over a rectangle with equal values in its corners, the results will vary in the last couple of digits at different points of the rectangle due to machine-accuracy limits. The absolute magnitude of these deviations depends on the magnitude of the interpolated values. For example, if p[i] ~ 1e10 (i = 1..4, the values at the corners of the rectangle), then the interpolation error caused by machine accuracy is ~1e-4 (for 8-byte floats). If p[i] ~ 1e-10, then the error would be ~1e-24. Since the interpolated values are used to calculate 1st- or 2nd-order derivatives, I need to 'smooth out' these differences. One idea is to remove the last couple of digits from the final result.
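To illustrate the effect, here is a minimal sketch (not my actual solver; the corner value 1e10 and the sample points are arbitrary). Each product and partial sum in the bilinear formula is rounded, so interpolating four equal corner values generally does not reproduce the corner value exactly:

#include <iostream>

int main() {
  const double p = 1e10;  // all four corner values are equal
  for (double u = 0.1; u < 1.0; u += 0.2) {
    for (double v = 0.1; v < 1.0; v += 0.2) {
      // standard bilinear formula; in exact arithmetic the result would be p
      const double val = (1 - u) * (1 - v) * p + u * (1 - v) * p
                       + (1 - u) * v * p       + u * v * p;
      std::cout << "u=" << u << " v=" << v
                << " deviation=" << (val - p) << std::endl;
    }
  }
}

Below is my take on it: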

#include <iostream>     // std::cout
#include <limits>       // std::numeric_limits
#include <cmath>        // std::fabs, std::log10, std::round, std::pow, std::floor, std::ceil

template<typename real>
real remove_digits(const real& value, const int num_digits)
{
  if (value == 0.) return 0;
  // Power of 10 that moves the last digit to keep into the ones place.
  const real corrector_power =
       (std::numeric_limits<real>::digits10 - num_digits)
                               - std::round(std::log10(std::fabs(value)));

  // The value is so close to zero that the corrector itself would overflow --
  // return 0 instead.
  if (corrector_power > std::numeric_limits<real>::max_exponent10 -
                        num_digits - 1)
  {
    return 0.;
  }
  const real corrector = std::pow(10, corrector_power);
  // Round (rather than truncate) at the requested position.
  const real result =
     (value > 0) ? std::floor(value * corrector + 0.5) / corrector :
                    std::ceil(value * corrector - 0.5) / corrector;
  return result;
}//remove_digits

int main() {
  // g++ (Debian 4.7.2-5) 4.7.2 --- Debian GNU/Linux 7.8 (wheezy) 64-bit
  std::cout << remove_digits<float>(12345.1234, 1) << std::endl;  // 12345.1
  std::cout << remove_digits<float>(12345.1234, 2) << std::endl;  // 12345
  std::cout << remove_digits<float>(12345.1234, 3) << std::endl;  // 12350
  std::cout << std::numeric_limits<float>::digits10 << std::endl; // 6
}

It works, but it uses two expensive operations, log10 and pow. Is there a smarter way of doing this? As follows from the above, to achieve my goal I do not need to remove actual decimal digits; it would be enough to set the last 3-4 bits of the mantissa of the floating-point representation to 0.
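Something along these lines is what I have in mind -- a rough sketch only, assuming an IEEE-754 double and using memcpy to avoid strict-aliasing problems (the number of cleared bits, 4 here, is an arbitrary choice):

#include <cstdint>
#include <cstring>
#include <iostream>

// Zero out the 'bits' lowest mantissa bits of an IEEE-754 double.
double clear_mantissa_bits(double value, unsigned bits)
{
  std::uint64_t u;
  std::memcpy(&u, &value, sizeof u);        // reinterpret the bits safely
  u &= ~((std::uint64_t{1} << bits) - 1);   // clear the lowest mantissa bits
  std::memcpy(&value, &u, sizeof u);
  return value;
}

int main() {
  std::cout.precision(17);
  std::cout << clear_mantissa_bits(12345.1234, 4) << std::endl;
}

Unlike remove_digits above, this truncates the mantissa towards zero rather than rounding.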

one_two_three
  • For most floating point numbers you actually *can't*. – Ignacio Vazquez-Abrams Apr 02 '15 at 11:09
  • I would be happy to set the last 4 bits of the mantissa representation to 0. – one_two_three Apr 02 '15 at 11:53
  • Figure out whether you're dealing with a double, single, or half (if relevant), and mask out the bits. – Ignacio Vazquez-Abrams Apr 02 '15 at 12:04
  • Rough idea: Repeatedly calculate `std::nextafter(x, -x)`, XOR the consecutive results and see how the bits change. When you go from `....10000` to `....01111` (5 consecutive bits set), you know you went one too far. – MSalters Apr 02 '15 at 14:26
  • possible duplicate of [How to perform a bitwise operation on floating point numbers](http://stackoverflow.com/questions/1723575/how-to-perform-a-bitwise-operation-on-floating-point-numbers) – PeterSW Apr 02 '15 at 14:27

1 Answer


"Some background: I can arrive to the same floating point number using different ways."

That's not a problem. The "local" resolution near x is approximately x - std::nextafter(x, 0.0), which over a wide range is linear in x. Your p[i] varying by 20 orders of magnitude doesn't matter: the absolute error varies linearly with p[i], just like the local resolution, so the relative error stays roughly the same.
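For instance (just an illustration, not tied to your interpolation code), printing the spacing at a few magnitudes shows how it scales with x:

#include <cmath>
#include <iostream>

int main() {
  for (double x : {1e-10, 1.0, 1e10}) {
    // distance from x to the next representable double towards zero
    std::cout << "x = " << x
              << "   resolution = " << x - std::nextafter(x, 0.0) << std::endl;
  }
}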

More generally, if you want to compare whether a and b are similar enough, just test whether a/b is approximately 1.00000000. (You have bigger problems when b is exactly zero; there's no meaningful way to say whether 1E-10 is "almost equal" to 0.)
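A minimal sketch of such a comparison (the tolerance is an arbitrary value you would tune to your problem, and b is assumed non-zero):

#include <cmath>
#include <iostream>

// True if a and b agree to within the given relative tolerance (b != 0).
bool approximately_equal(double a, double b, double rel_tol = 1e-12)
{
  return std::fabs(a / b - 1.0) <= rel_tol;
}

int main() {
  std::cout << std::boolalpha
            << approximately_equal((0.1 + 0.2) * 3.0, 0.3 * 3.0) << std::endl;  // true
}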

MSalters
  • Thank you, it is an interesting suggestion. However, the problem is a bit more complicated than just comparing 2 numbers side by side and deciding whether they are (almost) equal. E.g. I need to find the location of the zero-level isosurface based on values at the nodes of a numerical grid. If (in 2D) this isosurface is a line parallel to an axis, then small variations of the values at grid nodes will turn it into a non-straight line, with some bad consequences. So I really need to have equal values in different grid nodes, if possible. – one_two_three Apr 02 '15 at 15:34
  • That's setting off all kinds of alarm bells. More technically, what you're describing is strong numerical instability. It also means `d(output)/d(input)` can become very large, much larger than in reality. There's no simple, universal fix. For instance, you may represent a local derivative of a 2D surface as `df/dx, df/dy` which may have numerical issues near zero, but an equally valid way to model the local derivative is the normal vector `{nx, ny, nz}` which by definition is always non-zero. – MSalters Apr 02 '15 at 21:22
  • You are right, of course, although this discussion goes beyond the scope of the question. I simulate the movement of an implicit surface whose local speed depends strongly on the local surface geometry (e.g. the normal). Therefore there might be positive feedback on the error. Using stable diffusive solvers (Lax-Friedrichs, for example) usually helps. The issue I described leads to a 2D surface (e.g. constant in the Y direction) becoming 3D as the simulation progresses. Smoothing the values in the way I described (by cutting some trailing digits) helped. – one_two_three Apr 04 '15 at 22:56
  • @one_two_three: That makes me wonder even more. I'm not sure if I understand you correctly, but it appears you are describing some fluid flow model with streams (but there are lots of similar physics). Regardless, the reason it makes me wonder is that your "removal" operation is physically meaningless, because it depends on the physical units. Removing the last bits of a measurement in millimeters is rather nonsensical if we view the same operation in inches, and yet both are equally valid physically. – MSalters Apr 05 '15 at 00:58
  • I simulate the change of the surface of a 3D structure using the Level Set method -- the surface is represented implicitly as a (signed) distance from it to the points of a numerical grid. I see removal of digits as discarding information which is below the sensitivity of the 'measuring tool'. If, say, the calculated values of `(x+x)/2` and `(x+x+x)/3` differ in the last couple of digits, I can assume that the correct value of these digits is beyond the achievable floating-point resolution (which is a virtual equivalent of the measuring tool) and can be discarded. – one_two_three Apr 07 '15 at 13:31
  • @one_two_three: The problem with that method is that fundamentally you're treating `0.999` and `1.001` as quite different (4 digits) but `0.999` and `0.990` as quite alike (one digit). – MSalters Apr 07 '15 at 15:14
  • Yes, you are correct -- I did not describe the requirements properly. In fact, in the code I round the value to the given digit rather than just remove the trailing digits -- the expression `floor(value * corrector + 0.5) / corrector` does just that. Moreover, I now realize that just removing the last few bits of the mantissa will not do the trick -- something more elaborate is required. – one_two_three Apr 07 '15 at 16:00