I have a numerical code that solves an equation f(x) = 0, in which I have to raise x to a power p. The solver involves several steps, but ultimately it comes down to Newton's method. The solution happens to be exactly x = 1, and that is the cause of my problems: when the iterate gets close to 1, say x = 1 + 1e-13, the time needed to compute std::pow(x, p) grows tremendously, easily by a factor of 100, making my code unusable.

The machine running this thing is an AMD64 (Opteron 6172) box on CentOS, and the call is simply y = std::pow(x, p);. The same behavior shows up on all my machines, all x64. As documented here, this is not only my problem (i.e., someone else is pissed too); it appears only on x64 and only for x close to 1.0. A similar thing happens with exp.

Solving this problem is vital to me. Does anyone know of any way around this slowness?

EDIT: John pointed out that this is due to denormals. The question, then, is how to fix it. The code is C++, compiled with g++ for use within GNU Octave. It appears that, although I have set CXXFLAGS to include -mtune=native and -ffast-math, that does not help and the code runs just as slowly.

PSEUDO-SOLUTION FOR NOW: For anyone who cares about this problem: the solutions suggested below did not work for me. I really need the usual speed of std::pow(), but without the sluggishness around x = 1. My workaround is the following hack:

#include <cmath>

// Third-order Taylor expansion of x^p about x = 1; falls back to
// std::pow() once x is far enough from 1 to avoid the slow path.
inline double mpow(double x, double p) __attribute__ ((const));

inline double mpow(double x, double p)
{
    double y(x - 1.0);
    if (std::abs(y) > 1e-4)
        return std::pow(x, p);
    // 1 + p*y + p*(p-1)/2 * y^2 + p*(p-1)*(p-2)/6 * y^3, in Horner form
    return 1.0 + p * y * (1.0 + (p - 1.0) * y * (0.5 + (1.0 / 6.0) * (p - 2.0) * y));
}

The bound could be changed, but for -40 < p < 40 the error is smaller than about 1e-11, which is good enough for me. The overhead is minimal as far as I can tell, so this solves the issue for me.

  • This could be related to general performance problems with [subnormal numbers](http://en.wikipedia.org/wiki/Denormal_number). Computations with floating point values very close to 0 can be 100x slower than normal. See http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x. – John Kugelman Feb 04 '13 at 13:29
  • Good point. Any suggestion as to how to solve this? Fix the numbers to exactly 1 if they are close enough? – fledgling Cxx user Feb 04 '13 at 13:32
  • @JohnKugelman: If you read the link, this is because glibc uses a much slower function (named `__slowpow`) when given certain input values. – interjay Feb 04 '13 at 13:33
  • See http://stackoverflow.com/questions/9272155/replacing-extrordinarily-slow-pow-function – ecatmur Feb 04 '13 at 13:43
  • @fledglingCxxuser `-ffast-math` breaks IEEE754 compliance. Whether this is really bad or okay depends on your use case, but if I were you I would do some further research to understand what it is doing before enabling that flag. – us2012 Feb 04 '13 at 13:48
  • I guess all I want is flush-to-zero. The code is compiled using GCC (well, g++) for use inside Octave; any idea how to force it to do FTZ? Adding -ffast-math did not actually help I am afraid. Using -O3 -march=native -mtune=native didn't do much either. – fledgling Cxx user Feb 04 '13 at 13:52

3 Answers


The obvious workaround is to note that, over the positive reals, a ** b == exp(log(a) * b), and to use that form instead. You'll need to check that it doesn't adversely affect the accuracy of your results. Edit: as discussed, this also suffers from the slowdown to almost as great a degree.
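Spelled out as a helper (hypothetical name, my own sketch; the identity only holds for a > 0):

```cpp
#include <cmath>

// a^b rewritten as exp(b * log(a)); valid only for a > 0.
inline double pow_via_exp(double a, double b)
{
    return std::exp(b * std::log(a));
}
```

Note that for a = 1 + 1e-13 the argument passed to exp is itself tiny, which runs into exactly the exp slow case described next, hence the caveat in the edit.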

The problem isn't denormals, at least not directly; trying to compute exp(-2.4980018054066093e-15) suffers the same slowdown, and -2.4980018054066093e-15 is certainly not denormal.

If you don't care about the accuracy of your results, then scaling either the base or the exponent should get you outside the slow zone:

sqrt(pow(a, b * 2))
pow(a * 2, b) / pow(2, b)
...
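The same identities as functions (hypothetical names, a sketch of the rescaling idea above; both trade one rounding for two, so accuracy suffers as warned):

```cpp
#include <cmath>

// Rescale the base: a^b == (2a)^b / 2^b.  Doubling a moves it away from 1.0.
inline double pow_scaled_base(double a, double b)
{
    return std::pow(a * 2.0, b) / std::pow(2.0, b);
}

// Rescale the exponent: a^b == sqrt(a^(2b)).  Requires a >= 0.
inline double pow_scaled_exp(double a, double b)
{
    return std::sqrt(std::pow(a, b * 2.0));
}
```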

This bug is known to the glibc maintainers: http://sourceware.org/bugzilla/show_bug.cgi?id=13932 - if you are looking for a fix as opposed to a workaround, you'd want to commission a floating-point math expert with open-source experience.

ecatmur
  • Scaling `x` or `p` did not help in my tests. Fixing the issue in glibc won't help, because this thing has to run on Mac OS and MATLAB, which uses an ancient GCC to compile its MEX files. – fledgling Cxx user Feb 04 '13 at 14:15

64-bit Linux?

Use the pow() code from FreeBSD.

The Linux C library (glibc) has horrible worst-case performance for certain inputs.

See: http://entropymine.com/imageworsener/slowpow/

Kirn Gill II

It could be your algorithm, too. Perhaps switching to something like BFGS instead of Newton's method will help.

You don't say anything about your convergence criteria. Maybe those need adjusting as well.

duffymo