
In C code it is common to write

```c
a = b*b;
```

instead of

```c
a = pow(b, 2.0);
```

for `double` variables. I get that, since `pow` is a generic function capable of handling non-integer exponents, one would naïvely think that the first version is faster. I wonder, however, whether the compiler (gcc) transforms calls to `pow` with integer exponents into direct multiplication as part of any of its optional optimizations.

Assuming that this optimization does not take place, what is the largest integer exponent for which it is faster to write out the multiplication manually, as in `b*b* ... *b`?

I know that I could make performance tests on a given machine to figure out whether I should even care, but I would like to gain some deeper understanding on what is "the right thing" to do.

jmd_dk
  • Architecture-dependent. – Martin James Sep 04 '17 at 15:22
  • `pow()` (which I assume is the function you mean) is not merely a function *capable* of handling non-integer exponents; it is a function accepting arguments of type `double` and returning `double`. This is a somewhat subtle point, but the types of the parameters and return value are as significant as the values that they may take. – John Bollinger Sep 04 '17 at 15:23
  • The compiler does not make this transformation, because `pow` deals with floating-point values. It is almost always faster to write out `b*b ... *b` for computing integer powers. – Cody Gray - on strike Sep 04 '17 at 15:25
  • _I would like to gain some deeper understanding on what is "the right thing" to do._ More info about your specific context might be necessary to answer what is "the right thing" to do; without an overriding reason to write a line like `a = b*b* ... *b`, the right thing might be to not optimize early and instead write readable code using `pow()`. – Peter W Sep 04 '17 at 15:27
  • This is very possibly a duplicate of [this question](https://stackoverflow.com/questions/2940367/what-is-more-efficient-using-pow-to-square-or-just-multiply-it-with-itself), except that you are explicitly asking about integers, whereas most of the answers to that question assume floating-point. The answer is the same, though: it is faster to write out the multiplication manually, because the library function has to be sufficiently general to handle *all* possible cases. This is true for both integers and floating-point values, but *especially* true for integers, as FP-conversion is quite slow. – Cody Gray - on strike Sep 04 '17 at 15:29
  • Interestingly, `gcc` and `clang` on x86-64 seem to convert `pow(b, 2.0)` to `b * b`, but `b` is still treated as a double value ([godbolt reference](https://godbolt.org/g/aPsgoa)). Look at the `mulsd %xmm0, %xmm0` instruction. The same optimization does not seem to happen when replacing 2.0 with 3.0. – Ben Steffan Sep 04 '17 at 15:30
  • @PeterW I have a Python -> Cython -> C framework, where I am in charge of any code transformation. The readability is not important, but performance is. I am considering writing the integer power -> multiplication transformation, but if gcc already does this it would be a waste of work. – jmd_dk Sep 04 '17 at 15:32
  • [Why doesn't GCC optimize `a*a*a*a*a*a` to `(a*a*a)*(a*a*a)`?](https://stackoverflow.com/questions/6430448/why-doesnt-gcc-optimize-aaaaaa-to-aaaaaa?rq=1) – Retr0id Sep 04 '17 at 15:41
  • The question is a bit off the mark. For powers greater than 3 you generally want to use the repeated squaring algorithm rather than simply multiplying `n-1` times: https://en.wikipedia.org/wiki/Exponentiation_by_squaring – Gene Sep 04 '17 at 16:21
  • @BenSteffan what is surprising about that? `memcpy` does the same thing with small chunks of memory. – 0___________ Sep 04 '17 at 17:18

2 Answers


What you want is `-ffinite-math-only -ffast-math`, and possibly `#include <tgmath.h>`. This is the same as `-Ofast` without mandating the `-O3` optimizations.

Not only do `-ffinite-math-only` and `-ffast-math` enable these kinds of optimizations; the type-generic math header also helps compensate for when you forget to append the proper suffix to a (non-`double`) math function.

For example:

```c
#include <tgmath.h>
float pow4(float f){return pow(f, 4.0f);}
```

which compiles to:

```asm
pow4:
    vmulss  xmm0, xmm0, xmm0
    vmulss  xmm0, xmm0, xmm0
    ret
```

For clang this works for powers up to 32, while gcc does it for powers up to at least 2,147,483,647 (that's as far as I checked) unless `-Os` is enabled (because a `jmp` to the `pow` function is technically smaller); with `-Os`, it will only expand a power of 2.

**WARNING:** `-ffast-math` is just a convenience alias for several other optimizations, many of which break all kinds of standards. If you'd rather use only the minimal flags needed to get this behavior, you can use `-fno-math-errno -funsafe-math-optimizations -ffinite-math-only`.

technosaurus

In terms of the right thing to do: consider your maintainers, not just performance. I have a hunch you are looking for a general rule. If you are doing a simple and consistent square or cube of a number, I would not use pow for these. pow will most likely involve some form of a subroutine call versus performing register operations (which is why Martin pointed out the architecture dependency).