I have a numerical code that solves an equation f(x) = 0, in which I have to raise x to a power p. The solver involves several steps, but ultimately it comes down to Newton's method. The solution happens to be exactly x = 1, and that is the cause of my problems: when the iterate gets close to 1, say x = 1 + 1e-13, the time needed to compute std::pow(x, p) grows tremendously, easily by a factor of 100, making my code unusable.

The machine running this thing is an AMD64 (Opteron 6172) box on CentOS, and the call is simply y = std::pow(x, p);. The same behavior shows up on all my machines, all x64. As documented here, this is not only my problem (i.e., someone else is pissed too); it appears only on x64 and only for x close to 1.0. A similar thing happens with exp.

Solving this problem is vital to me. Does anyone know of any way around this slowness?

EDIT: John pointed out that this is due to denormals. The question, then, is how to fix it. The code is C++, compiled with g++ for use within GNU Octave. It appears that, although I have set CXXFLAGS to include -mtune=native and -ffast-math, that does not help and the code runs just as slowly.

PSEUDO-SOLUTION FOR NOW: For anyone who cares about this problem: the solutions suggested below did not work for me. I really need the usual speed of std::pow(), but without the sluggishness around x = 1. My workaround is the following hack:

#include <cmath>

// Third-order Taylor expansion of x^p about x = 1; falls back to
// std::pow() once x is far enough from 1 to avoid the slow path.
inline double mpow(double x, double p) __attribute__ ((const));

inline double mpow(double x, double p)
{
    double y(x - 1.0);
    if (std::abs(y) > 1e-4)
        return std::pow(x, p);
    // 1 + p*y + p*(p-1)/2 * y^2 + p*(p-1)*(p-2)/6 * y^3, in Horner form
    return 1.0 + p * y * (1.0 + (p - 1.0) * y * (0.5 + (1.0 / 6.0) * (p - 2.0) * y));
}

The bound could be changed, but for -40 < p < 40 the error is smaller than about 1e-11, which is good enough for me. The overhead is minimal as far as I can tell, so this solves the issue for me.

  • This could be related to general performance problems with [subnormal numbers](http://en.wikipedia.org/wiki/Denormal_number). Computations with floating point values very close to 0 can be 100x slower than normal. See http://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x. – John Kugelman Feb 04 '13 at 13:29
  • Good point. Any suggestion as to how to solve this? Fix the numbers to exactly 1 if they are close enough? – fledgling Cxx user Feb 04 '13 at 13:32
  • @JohnKugelman: If you read the link, this is because glibc uses a much slower function (named `__slowpow`) when given certain input values. – interjay Feb 04 '13 at 13:33
  • See http://stackoverflow.com/questions/9272155/replacing-extrordinarily-slow-pow-function – ecatmur Feb 04 '13 at 13:43
  • @fledglingCxxuser `-ffast-math` breaks IEEE754 compliance. Whether this is really bad or okay depends on your use case, but if I were you I would do some further research to understand what it is doing before enabling that flag. – us2012 Feb 04 '13 at 13:48
  • I guess all I want is flush-to-zero. The code is compiled using GCC (well, g++) for use inside Octave; any idea how to force it to do FTZ? Adding -ffast-math did not actually help I am afraid. Using -O3 -march=native -mtune=native didn't do much either. – fledgling Cxx user Feb 04 '13 at 13:52

3 Answers


The obvious workaround is to note that, over the positive reals, a ** b == exp(log(a) * b), and to use that form instead. You'll need to check that it doesn't adversely affect the accuracy of your results. Edit: as discussed, this also suffers from the slowdown to almost as great a degree.
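Spelled out as a helper (hypothetical name, my own sketch; the identity only holds for a > 0):

```cpp
#include <cmath>

// a^b rewritten as exp(b * log(a)); valid only for a > 0.
inline double pow_via_exp(double a, double b)
{
    return std::exp(b * std::log(a));
}
```

Note that for a = 1 + 1e-13 the argument passed to exp is itself tiny, which runs into exactly the exp slow case described next, hence the caveat in the edit.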

The problem isn't denormals, at least not directly; trying to compute exp(-2.4980018054066093e-15) suffers the same slowdown, and -2.4980018054066093e-15 is certainly not denormal.

If you don't care about the accuracy of your results, then scaling either the base or the exponent should get you outside the slow zone:

sqrt(pow(a, b * 2))
pow(a * 2, b) / pow(2, b)
...
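The same identities as functions (hypothetical names, a sketch of the rescaling idea above; both trade one rounding for two, so accuracy suffers as warned):

```cpp
#include <cmath>

// Rescale the base: a^b == (2a)^b / 2^b.  Doubling a moves it away from 1.0.
inline double pow_scaled_base(double a, double b)
{
    return std::pow(a * 2.0, b) / std::pow(2.0, b);
}

// Rescale the exponent: a^b == sqrt(a^(2b)).  Requires a >= 0.
inline double pow_scaled_exp(double a, double b)
{
    return std::sqrt(std::pow(a, b * 2.0));
}
```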

This bug is known to the glibc maintainers: http://sourceware.org/bugzilla/show_bug.cgi?id=13932 - if you are looking for a fix as opposed to a workaround, you'd want to commission a floating-point math expert with open-source experience.

ecatmur
  • Scaling `x` or `p` did not help in my tests. Fixing the issue in glibc won't help, because this thing has to run on Mac OS and MATLAB, which uses an ancient GCC to compile its MEX files. – fledgling Cxx user Feb 04 '13 at 14:15

64-bit Linux?

Use the pow() code from FreeBSD.

The Linux C library (glibc) has horrible worst-case performance for certain inputs.

See: http://entropymine.com/imageworsener/slowpow/

Kirn Gill II

It could be your algorithm, too. Perhaps switching to something like BFGS instead of Newton's method will help.

You don't say anything about your convergence criteria. Maybe those need adjusting as well.

duffymo