
Take very common math functions, such as sin, cos, etc.: does the compiler realise that they have no side effects, and can it hoist them out of loops? For example:

// Unoptimized

double YSinX(double x, int y)
{
   double total = 0.0;
   for (int i = 0; i < y; i++)
      total += sin(x);
   return total;
}

// Manually optimized

double YSinX(double x, int y)
{
   double total = 0.0, sinx = sin(x);
   for (int i = 0; i < y; i++)
      total += sinx;
   return total;
}

If they can, is there a way of declaring a function as having no side effects, and hence being safe to optimize in this way? Initial profiling of a VS2010 app suggests that the optimization is beneficial.

See also this related question, which is close but doesn't quite answer my own.

Edit: Some great answers. The one I accepted was based as much on the comments it provoked as on the answer itself, notably the linked article, and the fact that hoisting may not occur in situations where errno is set (i.e. a side effect). As such, and in the context of what I'm doing, this type of manual optimization still appears to make sense.

– SmacL

  • It's compiler-specific, but I have seen a `pure` attribute somewhere in the gcc docs. – luser droog Apr 26 '13 at 09:50
  • In C++11 there's `constexpr`, but it has large restrictions on its use (http://www.cprogramming.com/c++11/c++11-compile-time-processing-with-constexpr.html). – Patashu Apr 26 '13 at 09:52
  • For C++, making a member function `const` indicates no side effects, doesn't it? – Roger Rowland Apr 26 '13 at 09:52
  • @RogerRowland My understanding is that making a member function `const` only informs the compiler that it doesn't modify the object it is a member of (http://msdn.microsoft.com/en-AU/library/6ke686zh(v=vs.80).aspx). – Patashu Apr 26 '13 at 09:53
  • @RogerRowland: It does not. Consider modifying singleton state or a `mutable` variable. – Andrew Apr 26 '13 at 09:53
  • This is a kind of optimization that I usually do ***not*** rely on the compiler to do. It's usually easy enough to do manually, so there's little need to put yourself at the mercy of the compiler, where it becomes subject to numerous other factors. – Mysticial Apr 26 '13 at 09:56
  • @Mysticial: How about `for (size_t i = 0; i < foo.size(); ++i)`? It's very convenient if the `foo.size()` call is optimized. – Andrew Apr 26 '13 at 09:57
  • @Andrew Unless `foo.size()` is a trivial function that's no more complicated than a getter, I would actually pull it out and store it as a separate variable first. – Mysticial Apr 26 '13 at 10:00
  • I believe this optimisation is generally called "hoisting". – ta.speot.is Apr 26 '13 at 10:00
  • @Mysticial I agree. You can safely do that without polluting the enclosing scope: `for (size_t i = 0, end = foo.size(); i < end; ++i)`. – juanchopanza Apr 26 '13 at 10:03
  • Did you profile a release or a debug build? – Johannes S. Apr 26 '13 at 10:03
  • Is this similar to code hoisting? I mean, if the function was inlined, then this code could be hoisted outside the loop, right? – Koushik Shetty Apr 26 '13 at 10:07
  • @JohannesS, nope, a release build, but the profiler I'm using is limited to function level, not line level, so I could be missing something (AQTime standard/free edition). – SmacL Apr 26 '13 at 10:54
  • @ShaneMacLaughlin Are you asserting that `sin(x)` doesn't modify an object? How does it obtain the return value? – autistic Apr 26 '13 at 11:13
  • @Koushik: the optimization we are talking about here is indeed hoisting; however, as pointed out, you can only hoist a function if executing it once and caching its result yields the same output as if it were executed at each iteration. – Matthieu M. Apr 26 '13 at 11:37
  • In my experience, I have seen gcc completely replace such loops with a simple multiplication. – Nathan Ernst Apr 26 '13 at 14:14
  • @Andrew That specific example _is_ one that many compilers try harder to optimize automatically. Of course, doing this manually is also rather easy, as juanchopanza mentioned. – Mark Hurd Apr 26 '13 at 15:19
  • Modern compilers are able to detect [referentially transparent](http://en.wikipedia.org/wiki/Referential_transparency_%28computer_science%29) and [pure](http://en.wikipedia.org/wiki/Pure_function) functions, and optimize accordingly. Though I agree with @Mysticial that it's a good habit to make these simple optimizations yourself anyway. – BlueRaja - Danny Pflughoeft Apr 26 '13 at 15:41

4 Answers

GCC has two attributes, `pure` and `const`, that may be used to mark such functions. If the function has no side effects and its result depends only on its arguments, it should be declared `const`; if the result may also depend on some global variable, the function should be declared `pure`. Recent versions also have a `-Wsuggest-attribute` warning option that can point out functions which ought to be declared `const` or `pure`.
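
For illustration, here is a minimal sketch of how these attributes are attached to declarations; the function names and the lookup table are hypothetical:

/* Result depends only on the argument; no side effects, no reads of
   global state: GCC's 'const'. */
__attribute__((const)) double square(double x);

/* No side effects, but the result may also depend on global state
   (here, a lookup table): GCC's 'pure'. */
extern double table[256];
__attribute__((pure)) double lookup(unsigned i);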

– adl

  • ...but does GCC make much use of them? It would be odd to have them but not use them. But how hard does GCC work to make use of them? – Raedwald Apr 26 '13 at 11:45
  • See http://lwn.net/Articles/285332/ for some discussion of the implied optimizations. – adl Apr 26 '13 at 12:20
  • I notice glibc's sin and cos definitions don't include `pure` or `const`, even though those functions don't, e.g., set errno. – jthill Apr 26 '13 at 16:11
  • @jthill: I expect that gcc knows that common libm functions are const even without glibc decorating them. clang certainly does. – Stephen Canon Apr 26 '13 at 18:15
  • @jthill: on my system `sin(x)` is documented as setting `errno = EDOM` when `x` is an infinity. – adl Apr 26 '13 at 18:20
  • I'm pretty sure this answer gets `pure` and `const` backwards. – Ben Voigt Apr 26 '13 at 18:35
  • Oh, never mind, the answer is right about the GCC attributes, which don't follow CS terminology. The `pure` attribute is used to mark idempotent functions which are also side-effect free, and the `const` attribute is used to mark pure functions. Wonder who thought that was a good idea. – Ben Voigt Apr 26 '13 at 18:59
  • @adl erm, mine, too. You had my upvotes on the answer and your lwn link already anyway. So compilers can't hoist math calls as aggressively as the OP wants, because these functions _do_ have side effects. – jthill Apr 26 '13 at 21:56
  • Note that `-fno-math-errno` exists for this exact reason. – o11c Mar 30 '16 at 00:55

As a matter of fact, today's common compilers will perform the kind of loop-invariant code motion you're asking about. For a demonstration of this, see the second exercise within this article entitled "Will it Optimize?", or use gcc -S -O3 and/or clang -S -O3 to assemble the example below and inspect the main entry point in the generated assembly, as I did out of curiosity. If your VS2010 compiler doesn't perform this optimisation, it doesn't matter much; LLVM/Clang "integrates with MSVC 2010, 2012, 2013 and 14 CTP".
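
Assuming the example program further down is saved as ysinx.c (a hypothetical file name), the inspection step looks like this:

gcc   -S -O3 ysinx.c -o ysinx-gcc.s
clang -S -O3 ysinx.c -o ysinx-clang.s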

From a theoretical standpoint, these two quotes describe the headroom a compiler has when performing optimisations. They're from the C11 standard; IIRC, C++11 has something similar.

§5.1.2.3p4:

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

§5.1.2.3p6:

The least requirements on a conforming implementation are:

— Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.

— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.

— The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.

This is the observable behavior of the program.

Thus, a compiler might evaluate your entire program at compile time if it can do so. Consider the following program, for example:

#include <math.h>
#include <stdio.h>

double YSinX(double x,int y)
{
    double total = 0.0;
    for (int i = 0; i < y; i++)
        total += sin(x);
    return total;
}

int main(void) {
    printf("%f\n", YSinX(M_PI, 4));
}

Your compiler might realise that this program prints 0.000000 followed by a newline every single time, and optimise your program into:

int main(void) { puts("0.000000"); }

That is, provided your compiler can prove that neither sin nor YSinX causes any needed side effects. Note that they may (and probably do) still cause side effects, but those side effects aren't needed to produce the output of this program.

To demonstrate the theory applied in practice, I tested both LLVM/Clang (3.8.0, from clang --version) and GCC (6.4.0, from gcc --version) by assembling the code above (using gcc -S -O3 and clang -S -O3) on my Windows 10 system. Both compilers effectively applied the optimisation described above; in practice, you can expect main from the example above to be transformed into the machine-code equivalent of int main(void) { printf("%f\n", 0.0); }.

You've asked a question about "the compiler". If you're referring to all C or C++ implementations, there are no guaranteed optimisations, and a C implementation need not even be a compiler. You'd need to tell us which particular C or C++ implementation you mean; as I explained above, LLVM/Clang "integrates with MSVC 2010, 2012, 2013 and 14 CTP", so it's possible that you might be using that. If your C or C++ compiler doesn't produce optimal code, get a new compiler (e.g. LLVM/Clang) or perform the optimisation yourself, preferably by modifying your compiler so you can send a patch to the developers and have the optimisation automatically propagated to other projects.

– autistic

  • +1, I mentioned optimizations typically carried out, whereas I probably should have narrowed the scope to the specific compiler I'm using. I kept the scope of the question slightly broader, as I'm also interested in what other commonly used compilers do. – SmacL Apr 26 '13 at 12:09
  • That program had better not print `"0.0\n"`. Perhaps you intended to use `PI` instead of `180.0`? – Ben Voigt Apr 26 '13 at 18:52

What is needed to permit hoisting this subexpression outside the loop is not purity, but idempotence.

Idempotence means that a function has the same side effects and result whether it is called once or many times with the same arguments. Therefore, the compiler can move the function call outside the loop, protected only by a conditional (will the loop iterate at least once?). The actual code after the hoisting optimization would then be:

double YSinX(double x,int y)
{
   double total = 0.0;
   int i = 0;
   if (i < y) {
       double sinx = sin(x);  // <- this goes between the loop initialization,
                              //    the first test of the condition expression,
                              //    and the loop body
       do {
          total += sinx;
          i++;
       } while (i < y);
   }
   return total;
}

The distinction between `__attribute__((pure))` and idempotent is important because, as adl notes in his comment, these functions do have a side effect of setting errno.
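
A sketch of that side effect, assuming an implementation where math_errhandling includes MATH_ERRNO (this is implementation-specific behaviour, not guaranteed everywhere):

#include <errno.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    errno = 0;
    double r = sin(INFINITY);   /* domain error on such implementations */
    if (errno == EDOM)
        printf("sin set errno; returned %f\n", r);  /* typically NaN */
    return 0;
}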

Be careful, though: idempotence only applies to repeated calls with no intervening instructions. The compiler has to perform dataflow analysis to prove that the function and the intervening code don't interact (for example, that the intervening code uses only locals whose addresses are never taken) before it can take advantage of idempotence. This isn't necessary when the function is known to be pure, but purity is a much stronger condition that doesn't apply to very many functions.
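
As a hypothetical illustration (these functions are made up for the example), an intervening call can defeat hoisting of an otherwise idempotent function:

extern int f(void);   /* idempotent: repeated back-to-back calls give the same
                         result and side effects, but it reads global state */
extern void g(void);  /* may modify the global state that f() reads */

int Sum(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += f();  /* f() cannot be hoisted out of this loop... */
        g();           /* ...because g() may change what f() returns */
    }
    return total;
}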

– Ben Voigt

I think so, yes. If you look at the compiler's disassembly output, you can see that sin is called under a different label from the loop label for the for statement (compiled with g++ at -O1, -O2 and -O3):

Leh_func_begin1:
        pushq   %rbp
Ltmp0:
        movq    %rsp, %rbp
Ltmp1:
        pushq   %rbx
        subq    $8, %rsp
Ltmp2:
        testl   %edi, %edi
        jg      LBB1_2
        pxor    %xmm1, %xmm1
        jmp     LBB1_4
LBB1_2:
        movl    %edi, %ebx
        callq   _sin            ; sin is computed once, before the loop
        pxor    %xmm1, %xmm1
        .align  4, 0x90
LBB1_3:
        addsd   %xmm0, %xmm1
        decl    %ebx
        jne     LBB1_3          ; loops here until i reaches y
LBB1_4:
        movapd  %xmm1, %xmm0

I hope I'm correct.

– aliep

  • `sin()` and most of the built-in functions are different because the compiler already knows they are pure functions. However, current compilers will usually choke on larger programmer-made functions. – Mysticial Apr 26 '13 at 10:09