std::pow very different behavior for different exponents

Question

I am currently trying to optimize some code where 50% of the time is spent in std::pow(). I know that the exponent will always be a positive integer, and the base will always be a double in the interval (0, 1). For fun, I wrote a function:

inline double int_pow(double base, int exponent)
{
    double out = 1.0;
    for(int i = 0; i < exponent; i++)
    {
        out *= base;
    }

    return out;
}

I am compiling with:

> g++ fast-pow.cpp -O3 --std=c++11

I generated 100 million doubles between (0, 1) and compared the timings of (1) std::pow (2) my homemade int_pow function from above and (3) direct multiplication. Here's a sketch of my timing routine (this is a very quickly put-together test):

void time_me(int exp, size_t reps)
{
    volatile double foo = 0.0;
    double base = 0.0;

    size_t i;
    for (i = 0; i < reps; ++i)
    {
        base = ((double) rand() / (RAND_MAX)) + 1;
        foo = pow(base, exp);
        // foo = int_pow(base, exp);
        // foo = base * base * base;
    }

    // check that the loop made it to the end
    std::cout << foo << "  " << i <<  std::endl;
}

int main()
{
    std::clock_t start;

    start = std::clock();
    time_me(3, 1e8);
    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << std::endl;

    return 0;
}

Here are the timings I've observed for various exponents:

0: std::pow 0.71s, int_pow 0.77s
2: std::pow 1.31s, int_pow 0.80s, direct mult 0.86s
3: std::pow 6.9s (!!), int_pow 0.84s, direct mult 0.76s
5: Similar to 3:

My Questions

So with this, my questions are:

Why does the performance of std::pow appear to degrade so badly for powers greater than 2?
Is there an existing faster power function when the base or exponent types are known ahead of time?
Is there something completely obvious I'm overlooking? I'm about to go through gut std::pow for the cases with known integer exponents, and would hate to have missed something completely trivial.

Thanks!!

Did you examine the machine instructions your compiler emitted to ensure it's not optimizing away your calls, since you don't actually *use* the values returned from your calls to the functions? — Andrew Henle, Jun 27 '16 at 17:50
Can you give us the actual code you used to generate the timings? — NathanOliver, Jun 27 '16 at 17:50
The `pow` function may *cheat* for powers of 2 and use bit shifting. For other powers, it may use its exponential algorithm. — Thomas Matthews, Jun 27 '16 at 17:53
Declare `foo` volatile to prevent optimizing out the assignments. — Barmar, Jun 27 '16 at 18:02
@JesperJuhl The question states `-O3` is used. That answers your question, right? — James Adkison, Jun 27 '16 at 18:07
@MAB -- [Read this](http://stackoverflow.com/questions/25678481/why-pown-2-return-24-when-n-5) before making tests on integer power calculations using `pow`. — PaulMcKenzie, Jun 27 '16 at 18:11
A note: pow and std::pow are designed to do really nasty work like e to the power of pi. If it doesn't have a cheap hack like the one mentioned by @ThomasMatthews above available it'll fall back on doing the work the hard way. "Is there an existing faster power function when the base or exponent types are known ahead of time?" by exponent type do you mean the value? If the value is known at compile time, see here: http://stackoverflow.com/questions/16443682/c-power-of-integer-template-meta-programming — user4581301, Jun 27 '16 at 18:15
I've updated the post to add more information about the timing, I made `foo` volatile to prevent compiler optimizations, and I reran the timing tests and got slightly different results this time (I may have screwed up the first time.) — MAB, Jun 27 '16 at 18:20
@MAB -- The point of the link I gave you is that timing `pow` against your `int_pow` is going to a be a moot point if `pow` can potentially give wrong answers when using integer exponents. — PaulMcKenzie, Jun 27 '16 at 18:36
@PaulMcKenzie Thanks Paul, I just read through your post and it make sense. I've decided I'm going to implement the int_pow function. — MAB, Jun 27 '16 at 18:41
@MAB Personally, I reserve `pow` usage for non-integral / fractional exponents, or if for some reason, the result would overflow the largest int type. Otherwise, either use a template metaprogram, or a lookup table, or write a function that multiplies repeatedly. — PaulMcKenzie, Jun 27 '16 at 18:47
@MAB Since you have positive integers as exponents, you can initialize `out` with `base` and start the loop at `int i = 1`, might save you this tiny bit of time! — philkark, Jun 28 '16 at 08:29
Sorry if you are aware of this, but if you are calling pow in evaluating polynomials Horner's method [link] (https://en.wikipedia.org/wiki/Horner%27s_method) is a far better way to evaluate polynomials — dmuir, Jun 29 '16 at 08:58
`pow` is a general-purpose function. What exponents do you exactly need ? If possible, hard-code the powers where you can, using squarings. And if you need several powers of the same base, consider computing them incrementally. — , Jun 29 '16 at 13:07

score 8 · Accepted Answer · edited May 23 '17 at 11:53

std::pow() is a general purpose function designed to accept any pair of floating point values. It performs expensive computations and should be considered a slow function. However, apparently, a lot of peopled have abused it for squaring, so implementation of pow() in IBM Accurate Mathematical Library (which is used by glibc) was optimized for that particular case:

sysdeps/ieee754/dbl-64/e_pow.c:

double
__ieee754_pow (double x, double y)
{
  ...
  ...
  if (y == 1.0)
    return x;
  if (y == 2.0)
    return x * x;
  if (y == -1.0)
    return 1.0 / x;
  if (y == 0)
    return 1.0;

As you can see, exponent values 0, 1 and -1 are also handled specially, but those, at least, are mathematically significant special cases, whereas squaring is merely a statistically significant case, that shouldn't otherwise deserve special handling). EDIT: Exponent values 0, 1, 2, and -1 are the only ones that allow expressing std::pow(x,n) with (much faster) arithmetic operations without any loss of accuracy. See this answer for more details. Thus exponent value of 2 is not just a statistically significant case. END EDIT

If you want a fast alternative to std::pow() for non-negative integer values of the exponent and don't care about the slight accuracy loss, then

for sufficiently small values of the exponent use your implementation of int_pow();
otherwise, use exponentiation by squaring approach.

The boundary value of the exponent for selecting between the 1st and 2nd methods must be found via careful benchmarking.

I had no idea that GCC optimized that particular case, but that makes sense. I just swapped out a few `std::pow()` calls for `int_pow()` (w/Visual Studio) and noticed an even bigger difference between `std::pow()` and `int_pow()`. Anyways, such a seemingly tiny change has made an ENORMOUS improvement to the performance of my application. Thanks!! — MAB, Jun 27 '16 at 19:02

score 1 · Answer 2 · 2016-06-29T13:20:19.673

1

switch (n)
{
case 0:
  return 1;
case 1:
  return x;
case 8:
  x*= x;
case 4:
  x*= x;
case 2:
  return x * x;
case 6:
  x*= x;
case 3:
  return x * x * x;
case 5:
  y= x * x; return x * y * y;
case 7:
  y= x * x * x; return x * y * y;
...
};

edited Jun 29 '16 at 13:20

answered Jun 29 '16 at 13:14

note that `pow` should be more accurate than this for `n >= 3`. each multiply implies a rounding operation hence upto a bit of error, while a good math library will ensure the whole `pow` operation results in at most one bit of error. – Sam Mason Oct 10 '22 at 18:58
@SamMason: there is no better way to compute large powers than by squarings. This is the method used in math libraries and in coprocessors. – Oct 10 '22 at 20:03
I've not done any exhaustive searches, but `pow(M_PI, 8)` gives me the correctly rounded answer, while 8 multiplies is out at the LSB. presumably there are examples for smaller powers – Sam Mason Oct 10 '22 at 20:21
@SamMason: why eight multiplies ?? – Oct 10 '22 at 20:22
sorry, 8 straight multiplications was what I tried last. but different ways of bundling up the associations causes the same issue, e.g. yours effectively does `((pi*pi)*(pi*pi))*((pi*pi)*(pi*pi))` – Sam Mason Oct 10 '22 at 20:27
@SamMason: yep, it does that in 3 multiplies. The Intel processors perform floating-point operations on 80 bits, which gives them an advantage. But powers are still computed by squarings. [Or tell me how.] – Oct 10 '22 at 20:30
most recently compiled code will tend to use SSE registers instead (e.g. see use of XMM in https://godbolt.org/z/5o8rxPoWE). you can check out the gory details of what [glibc](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/e_pow.c;h=d29cd5bf5cacc98a4dd6081d2cb18e771c2fccad;hb=HEAD) does if you want – Sam Mason Oct 10 '22 at 20:34
@SamMason: oh, what a surprise, three squarings... – Oct 10 '22 at 20:41

std::pow very different behavior for different exponents

My Questions

2 Answers2