Relying on compiler optimizations with C++

Question

Question: is it OK to rely on compiler optimizations while coding?

Let's say I need to calculate calculateF and calculateG, which both depend on another value returned by getValue. Sometimes I need both values; at other times I only need one of them.

// some function
double getValue(double value)
{
    double val(0.0);
    // do some math with value
    return val;
}

// calculateF depends on getValue
double calculateF(double value)
{
    double f(0.0);
    auto val = getValue(value);
    // calculate f which depends on val (and value)
    return f;
}

// calculateG depends on getValue
double calculateG(double value)
{
    double g(0.0);
    auto val = getValue(value);
    // calculate g which depends on val (and value)
    return g;
}

Now, I could write this more elegantly:

std::pair<double,double> calculateFG(double value)
{
    auto val = getValue(value);
    double f(0.0), g(0.0);
    // calculate f and g which depend on val (and value)
    return {f,g};
}

If I want both values:

double value(5.3);
auto [f,g] = calculateFG(value); // since C++17
// do things with f and g

If I want only 1 value, say f, I just don't use g and it will be optimized out. So, the performance of calculateFG is exactly the same as calculateF if I don't use g. Furthermore, if I need both f and g, I only need to call getValue once instead of twice.

The code is cleaner (only 1 function calculateFG instead of calculateF and calculateG), and faster if both f and g are required. But is relying on the compiler optimization a wise choice?

Do you have a situation like this where you've observed that the compiler actually does optimize something out? This sounds like a difficult optimization to apply in the first place. — aschepler, Feb 11 '19 at 23:30
@aschepler Yes. I ran some benchmarks on the above example (instead of returning doubles, I return vectors and matrices). The results are exactly as described in the question with clang 7.0 and gcc 8.2. — mfnx, Feb 11 '19 at 23:32
why not use code like this ? `void calculateFG(double value, double* pf, double* pg) { auto val = getValue(value); if (pf) *pf = x(val); if (pg) *pg = y(val); }` — RbMm, Feb 12 '19 at 01:52
In the example code you've provided, where all the functions above are essentially no-ops and probably get some free inlining as well, everything is going to get heavily optimized. Real code that is, as you put it, "returning vectors and matrices" is going to behave entirely differently. You should consider updating your question to show the real calculations, return values, and variable dependencies. — selbie, Feb 12 '19 at 05:24
Your premise is flawed. Cramming two independent computations into one function isn't "more elegant". — Passer By, Feb 12 '19 at 05:29
You could also call getValue first, and then pass it as argument to both calculateF and calculateG. — hyde, Feb 12 '19 at 06:41
@RbMm Because then you have to declare pf and pg before calling the function. Imagine f and g are structs which are built with a given size known at compile time. — mfnx, Feb 12 '19 at 07:32
@selbie In the real example (too long to post here), all the variables are local variables on the stack. I have of course tested my real situation, and the results are as stated in the question. Then I decided to write this minimal example with one goal: to know whether or not it is good practice to rely on compiler optimizations. — mfnx, Feb 12 '19 at 07:35
@PasserBy With elegant I referred to the single call to getValue. — mfnx, Feb 12 '19 at 07:36
@xskxzr Sure I could (calculateF, calculateG and calculateFG). I just wonder whether it is good practice to write the code as I did. — mfnx, Feb 12 '19 at 07:37
@hyde Well yes, but that's the beginning of the end, isn't it? And getValue must then be available in the same way as calculateF and calculateG to the caller. It puts restrictions on getValue. — mfnx, Feb 12 '19 at 07:39
@MFnx But it also gives freedom to use any `getValue`, which is useful unless there is absolutely just one `getValue` which makes sense for the context. — hyde, Feb 12 '19 at 07:56
@MFnx - and so what? If *f* and *g* are structures, then this form is exactly what you need: pass *pointers* to them to `calculateFG`. — RbMm, Feb 12 '19 at 09:23
@RbMm The caller does not necessarily know those sizes at compile time. — mfnx, Feb 12 '19 at 09:51
@MFnx - if the caller does not know the data format, how can it ask for a result of unknown type at all?! Only with something like `void calculateFG(double value, void** ppf, void** ppg)`, where `calculateFG` returns 1 or 2 allocated pointers. — RbMm, Feb 12 '19 at 10:07
@MFnx - if the caller does not know the size of the requested data, it cannot allocate storage, so the callee must do it. The caller passes a pointer to a pointer; the callee allocates the data and returns a pointer to it. Alternatively, the callee can return the required size so that the caller can allocate the storage itself. The signature in that case looks like `int fn(void* pv, uint cb, uint* rcb)`. — RbMm, Feb 12 '19 at 10:17
@RbMm With auto... Imagine that the caller has a pointer to the base class of several implementation classes which all contain the calculateFG method returning a matrix (on the stack, size known at compile time). The base class has no idea what the size of the matrices is going to be. The caller calls auto matrix = ... But this is getting a bit out of scope I think. — mfnx, Feb 12 '19 at 10:17
@MFnx - again, in this case you need `calculateFG(double value, Base** ppf, Base** ppg)`. Inside `calculateFG`, do `Derived* pf = new Derived(value)` and `*ppf = pf`. This is the correct and most efficient way to do it (given `class Derived : Base`). — RbMm, Feb 12 '19 at 10:20
@MFnx - and *auto* is pointless here; it does not help at all. The caller still cannot allocate storage for an object whose size and type it does not know. It can only declare and allocate a *pointer* to the object, so the callee allocates the object and returns a pointer to it, like `Base* pb = calculateFG()` where `calculateFG` actually returns `new Derived`. But to return multiple pointers, you need my form of the signature. You are trying to allocate an additional structure to store these pointers and return a pointer to that structure, which is less efficient. — RbMm, Feb 12 '19 at 10:25
@RbMm All those objects are local variables on the stack. auto just works fine. I think from your last example you didn't understand what I meant. Base* p1 = new Derived1(); Base* p2 = new Derived2(); Base got a virtual method VectorX getX() const; Derived1 and Derived2 have a getX which override the Base class' one, but return Vector1 and Vector2 respectively (both castable to VectorX). The caller does auto m = p1->getX() or auto m = p2->getX()... You see? — mfnx, Feb 12 '19 at 10:33
@MFnx - you cannot allocate an object as a local variable on the stack if you don't know its type at compile time. *auto* does not help here. But you can allocate a **pointer** to any unknown object. It looks like you do not understand what you are doing and how the code works internally. — RbMm, Feb 12 '19 at 10:39
@RbMm ok, I hear you. I know auto is no remedy. The compiler knows the sizes. I don't know how to make this clear, or if I really miss something here? Maybe we could take this conversation somewhere else? Here is a temporary link with some code to let you see what I mean (it's the commented code). http://quick-bench.com/9oUnYSbNGX6sw_BugmPL2FQnr8Q — mfnx, Feb 12 '19 at 12:13
@RbMm In that code I cast badly (it should be done dynamically). I do understand you now. Thanks! — mfnx, Feb 12 '19 at 13:40

Hanjoung Lee · Answer 1 · 2019-02-12T06:31:21.940

2

It is hard to say if it is wise or not. It depends on a compiler optimization: function inlining.

If calculateFG is inlined, the compiler can optimize out the unused value. Once inlined, g is unused, so all the code for generating g is dead code^[1]. (It may not be able to, for example, if the calculation code has side effects.)
If not, I don't think the optimization can be applied (f and g are always calculated).
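
As a rough sketch of the inlined case: the bodies of getValue and calculateFG below are placeholder formulas (the question does not show the real math). Because calculateFG is defined inline in the same translation unit, the optimizer can see that onlyF never reads g. It is then allowed, but not required, to drop the g computation; inspecting the generated assembly is the only way to confirm what a given compiler and flag set actually does.

```cpp
#include <cmath>
#include <utility>

// Hypothetical stand-in for the question's getValue.
inline double getValue(double value)
{
    return std::sqrt(value * value + 1.0);
}

// Defined inline in this translation unit so the optimizer can see the body.
inline std::pair<double, double> calculateFG(double value)
{
    const double val = getValue(value);
    const double f = val + value;   // placeholder formula for f
    const double g = val * value;   // placeholder formula for g
    return {f, g};
}

// Reads only f. After inlining, g has no uses here, so an optimizing
// build may remove the multiplication as dead code.
double onlyF(double value)
{
    return calculateFG(value).first;
}
```

Calling `onlyF(value)` returns the same f as `calculateFG(value).first`; any difference shows up only in the generated code, not in the result.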

Now you may wonder if it is possible to always inline specific functions.

Please note that adding the inline keyword does not force the compiler to inline that function. It is just a hint. With or without the keyword, it is the compiler's call. There seem to be non-standard ways, though; see "How do I force gcc to inline a function?"
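
A minimal sketch of the non-standard route, assuming GCC or Clang (`__attribute__((always_inline))`) or MSVC (`__forceinline`). The macro name FORCE_INLINE and the placeholder formulas are made up for illustration.

```cpp
#include <utility>

// Compiler-specific spelling of "always inline"; not part of standard C++.
// FORCE_INLINE is a hypothetical macro name chosen for this sketch.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline   // fallback: only a hint
#endif

FORCE_INLINE std::pair<double, double> calculateFG(double value)
{
    const double val = value * 2.0;   // placeholder for getValue(value)
    return {val + 1.0, val - 1.0};    // placeholder formulas for f and g
}

// Uses only f; with the body forced inline, the g computation is a
// candidate for dead-code elimination in an optimizing build.
double onlyF(double value)
{
    return calculateFG(value).first;
}
```

Forcing inlining only makes the body visible at the call site. Removing the unused computation is still up to the optimizer.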

[1] Relevant compiler options: -fdce -fdse -ftree-dce -ftree-dse

edited Feb 12 '19 at 06:31

answered Feb 12 '19 at 06:24

Hanjoung Lee


Thanks. So, if you knew the compiler could optimize it, would you use the calculateFG function? Would you rely on that optimization? I would... and I think we do it in many cases (take RVO for instance). – mfnx Feb 12 '19 at 07:50

I wouldn't, because this is not good for readability. I would rather declare all three functions, `calculateF`, `calculateG` and `calculateFG`, and call them as appropriate. And if they are all inlined, that does not even increase the compiled code size. But for RVO I would. – Hanjoung Lee Feb 12 '19 at 09:27

score 1 · Answer 2 · answered Feb 12 '19 at 11:39

Modern C++ compilers are pretty good at optimization choices, given the chance.

That said, declaring a function inline does not mean the optimizer will actually inline it 100% of the time. The effect is more subtle: inline relaxes the One Definition Rule, so the function definition can go into header files and be included in several translation units. That makes the body visible at each call site, which makes inlining a lot easier for the optimizer.

Now, with your example of the doubles f and g, optimizers are very good at tracking the use of simple scalar values, and will be able to eliminate write-only operations. Inlining allows the optimizer to eliminate unnecessary writes in called functions too. For you, that means the optimizer can eliminate writes to f in calculateFG when the calling code does not use f later on.

score 1 · Answer 3 · answered Feb 12 '19 at 13:22

Perhaps it is best to turn the logic inside-out. Instead of computing a value (getValue()), passing it to both calculateF() and calculateG(), and passing the results to another place, you can change the code to pass the functions instead of computed values.

This way, if the client code does not need calculateF's value, it won't call it. The same with calculateG. If getValue is also expensive, you can call it once and bind or capture the value.

These are concepts used extensively in the functional programming paradigm.

You could rewrite your calculateFG() function more or less like this:

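// Assumes calculateF and calculateG take the precomputed val here; needs <utility>.
auto getFG(double value)
{
    auto val = getValue(value);
    // A bare braced list cannot be deduced for an auto return type, so build a pair.
    return std::make_pair(
        [val]{ return calculateF(val); },
        [val]{ return calculateG(val); });
}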
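
To make the calling side concrete, here is a self-contained sketch of this idea. It assumes calculateF and calculateG take the precomputed val (unlike the versions in the question). The formulas and the call counters fCalls and gCalls are hypothetical, added only so the laziness can be observed.

```cpp
#include <utility>

// Call counters, added only so the laziness can be observed.
int fCalls = 0;
int gCalls = 0;

// Hypothetical stand-ins; the question does not show the real math.
double getValue(double value) { return value * 2.0; }
double calculateF(double val) { ++fCalls; return val + 1.0; }  // takes the precomputed val
double calculateG(double val) { ++gCalls; return val - 1.0; }  // takes the precomputed val

// getValue runs once; f and g are wrapped in lambdas and run only when invoked.
auto getFG(double value)
{
    const double val = getValue(value);
    return std::make_pair([val] { return calculateF(val); },
                          [val] { return calculateG(val); });
}

// A caller that needs only f: the g lambda is never invoked.
double onlyF(double value)
{
    auto fg = getFG(value);
    return fg.first();
}
```

Here the work for g is skipped because the caller never calls it, so nothing depends on the optimizer. A caller that needs both values invokes both lambdas and still pays for getValue only once.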

score 0 · Answer 4 · answered Feb 12 '19 at 06:05

It sounds like your goal is to only perform the (potentially expensive) calculations of getValue(), f, and g as few times as possible given the caller's needs -- i.e. you don't want to perform any computations that the caller isn't going to use the results of.

In that case, it might be simplest to just implement a little class that does the necessary on-demand computations and caching, something like this:

#include <stdio.h>
#include <math.h>

class MyCalc
{
public:
   MyCalc(double inputValue)
      : _inputValue(inputValue), _vCalculated(false), _fCalculated(false), _gCalculated(false)
   {
      /* empty */
   }

   double getF() const
   {
      if (_fCalculated == false)
      {
         _f = calculateF();
         _fCalculated = true;
      }
      return _f;
   }

   double getG() const
   {
      if (_gCalculated == false)
      {
         _g = calculateG();
         _gCalculated = true;
      }
      return _g;
   }

private:
   const double _inputValue;

   double getV() const
   {
      if (_vCalculated == false)
      {
         _v = calculateV();
         _vCalculated = true;
      }
      return _v;
   }

   mutable bool _vCalculated;
   mutable double _v;

   mutable bool _fCalculated;
   mutable double _f;

   mutable bool _gCalculated;
   mutable double _g;

   // Expensive math routines below; we only want to call these (at most) one time
   double calculateV() const {printf("calculateV called!\n"); return _inputValue*sin(2.14159);}
   double calculateF() const {printf("calculateF called!\n"); return getV()*cos(2.14159);}
   double calculateG() const {printf("calculateG called!\n"); return getV()*tan(2.14159);}
};

// unit test/demo
int main()
{
   {
      printf("\nTest 1:  Calling only getF()\n");
      MyCalc c(1.5555);
      printf("f=%f\n", c.getF());
   }

   {
      printf("\nTest 2:  Calling only getG()\n");
      MyCalc c(1.5555);
      printf("g=%f\n", c.getG());
   }

   {
      printf("\nTest 3:  Calling both getF and getG()\n");
      MyCalc c(1.5555);
      printf("f=%f g=%f\n", c.getF(), c.getG());
   }
   return 0;
}

I understand that, thanks for your answer. But then if you want to compute it with threads, you have to protect your attributes. Of course you don't have to know that I'm using threads. But my question is just about the decision of relying or not on compiler optimizations while coding. — mfnx, Feb 12 '19 at 07:49

score 0 · Answer 5 · answered Feb 12 '19 at 11:29


I think that it's best to write your code in a way that expresses what you are trying to accomplish.

If your goal is to make sure that certain calculations are only done once, use something like Jeremy's answer.

answered Feb 12 '19 at 11:29

Michael


balki · Answer 6 · 2019-02-12T18:37:47.080

A good function should do only one thing. I would design it as below.

#include <utility> // std::pair

class Calc {
    public:
    Calc(double value) : value{value}, val{getValue(value)} {
    }
    double calculateF() const;
    double calculateG() const;
    //If it is really a common usecase to call both together
    std::pair<double, double> calculateFG() const {
        return {calculateF(), calculateG()};
    }
    static double getValue(double value);
    private:
    double value;
    double val;
};

Whether the compiler will optimize this depends on the rest of the code. For example, a debug message like log_debug(...) could prevent dead code removal. The compiler can only get rid of dead code if it can prove at compile time that the code has no side effects (even if you force inlining).
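
A small sketch of the side-effect point, using a global counter (gLogCount, a made-up name) as a stand-in for something like log_debug(...). The formulas are placeholders.

```cpp
#include <utility>

// Stand-in for a side effect such as a debug log: a global counter.
int gLogCount = 0;

inline std::pair<double, double> calculateFG(double value)
{
    const double val = value * 2.0;   // placeholder for getValue(value)
    const double f = val + 1.0;       // placeholder formula for f
    ++gLogCount;                      // observable side effect on the g path
    const double g = val - 1.0;       // placeholder formula for g
    return {f, g};
}

// g is ignored, but the counter update is observable behavior,
// so the compiler must keep it even if the arithmetic for g is dropped.
double onlyF(double value)
{
    return calculateFG(value).first;
}
```

After one call to `onlyF`, gLogCount has been incremented even though g was never read. That is why code with side effects cannot simply be removed as dead code.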

Another option is to mark the getValue function with compiler-specific attributes like pure or const. This allows (but does not force) the compiler to optimize away the second call of getValue. https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bpure_007d-function-attribute-3348
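
A hedged sketch of the attribute approach, assuming GCC or Clang. ATTR_CONST is a made-up macro name, and the formulas are placeholders. The const attribute is a promise from the programmer that the result depends only on the arguments and that the function has no observable side effects; the compiler does not verify it.

```cpp
#include <cmath>

// GCC/Clang extension; on other compilers this sketch simply omits it.
// ATTR_CONST is a hypothetical macro name chosen for this example.
#if defined(__GNUC__) || defined(__clang__)
#  define ATTR_CONST __attribute__((const))
#else
#  define ATTR_CONST
#endif

// Declaration carrying the promise; the body could live in another file.
ATTR_CONST double getValue(double value);

double calculateF(double value) { return getValue(value) + value; }  // placeholder formula
double calculateG(double value) { return getValue(value) * value; }  // placeholder formula

// If both calls end up inlined here, the attribute permits the optimizer
// to merge the two getValue(value) calls into one. It is permission, not a guarantee.
double sumFG(double value)
{
    return calculateF(value) + calculateG(value);
}

double getValue(double value)
{
    return std::sqrt(value * value + 1.0);
}
```

The pure attribute is the weaker variant: it additionally allows the function to read global memory, which may be the better fit if getValue depends on state other than its arguments.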
