
I'm developing some high-performance statistical functions and trying to make them perform different operations based on some function arguments. The problem at hand is to write general code that flexibly performs differencing, partial/generalized differencing, growth rates, and log-differencing based on function arguments. To give a basic example:

// General code to first-difference a vector:
std::vector<double> diff(std::vector<double> x, int ret = 1, double rho = 1, ...) { // ... = more arguments
    int l = x.size();
    std::vector<double> res(l);
    for (int i = 1; i < l; ++i) res[i] = FUN(x[i], x[i - 1]);
    // rest of code ...
}

Now FUN(y, x) is the operation I want to vary efficiently based on the arguments ret and rho. For simple differences it is y-x, for generalized differences y-rho*x, for log differences log(y/x), and for growth rates (y-x)*(100/x); more options could be added. The code is applied to large datasets and needs to be fast, so ideally I would use something like a conditionally created macro for FUN, i.e. something like:

std::vector<double> diff(std::vector<double> x, int ret = 1, double rho = 1, ...) {
    int l = x.size();
    std::vector<double> res(l);
    #define FUN(y, x) ((ret==1 && rho==1) ? ((y)-(x)) :   \
                       (ret==1) ? ((y)-rho*(x)) :         \
                       (ret==2) ? (log((y)/(x))) :        \
                       /* ret==3 */ ((y)-(x))*(100/(x)))
    for (int i = 1; i < l; ++i) res[i] = FUN(x[i], x[i - 1]);
    return res;
}

This works, but judging from the drop in speed it seems that I am not conditionally creating a macro but defining just one macro, and each time FUN is called all the conditions are evaluated to perform the right operation. I have looked a bit into the preprocessor commands (#if, #elif, #else, #endif, #define and #undef), but it seems you cannot use them to conditionally create a macro based on runtime function arguments. A second approach could be using inline functions, i.e.:

inline double do_diff(double y, double x, double r) {
  return y-x; 
}
inline double do_gdiff(double y, double x, double r) {
  return y-r*x;
}
inline double do_logdiff(double y, double x, double r) {
  return log(y/x); 
}
inline double do_growth(double y, double x, double r) {
  return (y-x)*(100/x); 
}

std::vector<double> diff(std::vector<double> x, int ret = 1, double rho = 1, ...) {
    int l = x.size();
    std::vector<double> res(l);
    auto FUN = (ret==1 && rho==1) ? do_diff :
               (ret==1) ? do_gdiff :
               (ret==2) ? do_logdiff : do_growth;
    for (int i = 1; i < l; ++i) res[i] = FUN(x[i], x[i - 1], rho);
    return res;
}

The problem here is just that it slows the code down by a factor of about 1.5. Given that these are really simple operations and this code is supposed to be as fast as possible, I would rather avoid this. So my question is: is there any way to vary the operation performed by FUN that comes at a negligible performance cost?

Note: Code duplication here is undesirable, as the actual code I'm working on is far more complex (it can perform iterated differences on unordered panel data, etc.): about 700 lines in which FUN enters in multiple places.

Sebastian
  • Not what you're asking, but passing `x` as a `const &` should save you the copy on every `diff` call. Are you timing on a debug build or on the final -O3 build? – JohnFilleau Apr 18 '20 at 18:47
  • You are talking about a micro-optimization. At the processor level, something needs to set the status of the compare, usually a compare instruction. Next, the processor performs a *conditional* branch, which may cause the instruction pipeline to be reloaded. Some processors can conditionally execute instructions, thus avoiding the branch decision circuitry. You may be able to reduce comparisons by using algebra or Boolean Arithmetic. It's the branching that is slowing down your program's execution. – Thomas Matthews Apr 18 '20 at 19:01
  • You'll be able to get greater performance by (in order): 1) Requirement removal & clarification; 2) Optimizing the design; 3) Coding to help the compiler optimize. IMHO, making the program behave correctly and robustly outweighs the need for speed (except in timing critical applications). – Thomas Matthews Apr 18 '20 at 19:03
  • If your program is more data driven, you may want to consider using more `switch` statements or tables of function pointers. These are known as *jump tables*. The code looks up an address in the table (preferably by array index), then branches to that address. This operation can be simplified to a couple of instructions. – Thomas Matthews Apr 18 '20 at 19:06 (a sketch of this jump-table idea follows the comment thread below)
  • Thanks @JohnFilleau, yes I actually do use constant references in the final code, I just wanted to keep the example simple. – Sebastian Apr 18 '20 at 19:35
  • Thanks also @Thomas, I'm not sure I understand all that you are suggesting. There is limited space for working on the compilation etc. as this is code for R and is compiled through the Rcpp API, and regarding optimization I would need some more targeted suggestions as I am indeed a beginner in C++. In general I want to keep the code as simple as possible while still warranting decent performance. – Sebastian Apr 18 '20 at 19:35
  • I think this: https://stackoverflow.com/q/14428687/4408538 is what you are looking for. Also check out the Rcpp gallery: https://gallery.rcpp.org/articles/passing-cpp-function-pointers/ – Joseph Wood Apr 18 '20 at 23:38
  • Thanks @JosephWood!! This is certainly interesting! I presume though that in terms of performance it would be the same as my second example, which also utilizes a function pointer? I haven't checked it, but it seems to me the speed of Jarod's solution comes from the entire function being templated and compiled for different settings... – Sebastian Apr 19 '20 at 15:01
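
To make the jump-table suggestion from the comments concrete, here is a minimal, self-contained sketch. The helper and function names (op_*, diff_table) are illustrative only, and it assumes ret is always one of 1, 2, 3; the point is that the branch on ret/rho is resolved once per call rather than once per element:

#include <cmath>
#include <vector>

// The same four operations as the do_* helpers above, repeated so the sketch stands alone.
inline double op_diff(double y, double x, double)    { return y - x; }
inline double op_gdiff(double y, double x, double r) { return y - r * x; }
inline double op_logdiff(double y, double x, double) { return std::log(y / x); }
inline double op_growth(double y, double x, double)  { return (y - x) * (100 / x); }

using diff_op = double (*)(double, double, double);

std::vector<double> diff_table(const std::vector<double>& x, int ret = 1, double rho = 1) {
    // Jump table: pick the function pointer once, then run the tight loop.
    static const diff_op ops[] = { op_diff, op_gdiff, op_logdiff, op_growth };
    diff_op f = (ret == 1) ? (rho == 1 ? ops[0] : ops[1]) : ops[ret]; // ret == 2 -> log, 3 -> growth

    int l = x.size();
    std::vector<double> res(l);
    for (int i = 1; i < l; ++i) res[i] = f(x[i], x[i - 1], rho);
    return res;
}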

2 Answers


A template seems more appropriate:

template <typename F>
std::vector<double> diff(std::vector<double> x, F f, double rho = 1, ...) {
     int l = x.size();
     std::vector<double> res(l);
     for (int i = 1; i < l; ++i) res[i] = f(x[i], x[i - 1], rho); 
     // ...
}

std::vector<double> diff(const std::vector<double>& x, int ret = 1, double rho = 1, ...) {
    switch (ret)
    {
        case 1: return (rho == 1) ?
                       diff(x, [](double y, double x, double r) { return y-x; }, rho) :
                       diff(x, [](double y, double x, double r) { return y-r*x; }, rho);
        case 2: return diff(x, [](double y, double x, double r) { return log(y/x); }, rho);
        case 3: return diff(x, [](double y, double x, double r) { return (y-x)*(100/x); }, rho);
    }
    }
    throw std::runtime_error("Invalid argument");
}

It even seems that rho could just be captured by the lambdas, allowing you to get rid of one parameter.
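
For illustration, a minimal sketch of that capture variant, assuming a second templated overload that takes a binary callable (the names diff_with and gdiff are made up for the example):

#include <vector>

// Illustrative overload: the callable only sees the (current, lagged) pair.
template <typename F>
std::vector<double> diff_with(const std::vector<double>& x, F f) {
    int l = x.size();
    std::vector<double> res(l);
    for (int i = 1; i < l; ++i) res[i] = f(x[i], x[i - 1]);
    return res;
}

// rho is captured by the lambda, so no extra per-element parameter is needed.
std::vector<double> gdiff(const std::vector<double>& x, double rho) {
    return diff_with(x, [rho](double y, double xl) { return y - rho * xl; });
}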

Jarod42
  • Thanks a lot @Jarod42, I'll try this as well! – Sebastian Apr 18 '20 at 19:53
  • Thanks!! I tried it and it is by far the best solution (no real performance loss compared to the simple differencing code at all)! And yes it is also possible to reference the rho parameter, as also shown [here](https://stackoverflow.com/questions/30217956/error-variable-cannot-be-implicitly-captured-because-no-default-capture-mode-h). – Sebastian Apr 18 '20 at 22:01

Firstly, a macro is just a form of textual substitution performed on the source code. Anything you write as a macro, you can also write as regular code. Check any C++ FAQ: it will tell you that macros are evil and generally to be avoided. Your case isn't special; it doesn't benefit from the use of macros.

Then, there are some points in your code that stick out:

  • You're passing vectors by value to functions.
  • You're returning vectors by value from functions.
  • Using int for the size when modern auto is available.

This suggests you have some basics to learn still and/or that you're writing in the style of a different language. The latter then leads to non-idiomatic code which in turn might make it hard for the compiler to optimize.

Lastly, your second approach creates an alias to one of four functions, which probably boils down to a function pointer. This won't be inlined then. The redundant parameter to some of these functions is another obstacle for the optimizer.

Suggestion: Create four separate functions that take a vector and rewrite the current diff() to just dispatch to these according to your existing logic. Also, you do have measurements, but try to run this in a profiler. It will tell you more accurately where the time is spent, which may well be surprising.
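
A rough sketch of that suggestion, with illustrative names, each operation kept as its own whole-vector function so the dispatch happens only once per call:

#include <cmath>
#include <stdexcept>
#include <vector>

std::vector<double> diff_simple(const std::vector<double>& x) {
    std::vector<double> res(x.size());
    for (std::size_t i = 1; i < x.size(); ++i) res[i] = x[i] - x[i - 1];
    return res;
}

std::vector<double> diff_generalized(const std::vector<double>& x, double rho) {
    std::vector<double> res(x.size());
    for (std::size_t i = 1; i < x.size(); ++i) res[i] = x[i] - rho * x[i - 1];
    return res;
}

std::vector<double> diff_log(const std::vector<double>& x) {
    std::vector<double> res(x.size());
    for (std::size_t i = 1; i < x.size(); ++i) res[i] = std::log(x[i] / x[i - 1]);
    return res;
}

std::vector<double> diff_growth(const std::vector<double>& x) {
    std::vector<double> res(x.size());
    for (std::size_t i = 1; i < x.size(); ++i) res[i] = (x[i] - x[i - 1]) * (100 / x[i - 1]);
    return res;
}

// The existing interface only dispatches.
std::vector<double> diff(const std::vector<double>& x, int ret = 1, double rho = 1) {
    if (ret == 1) return (rho == 1) ? diff_simple(x) : diff_generalized(x, rho);
    if (ret == 2) return diff_log(x);
    if (ret == 3) return diff_growth(x);
    throw std::runtime_error("Invalid argument");
}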

Ulrich Eckhardt
  • Returning vector by value is fine now: between (N)RVO, and move at worst. – Jarod42 Apr 18 '20 at 19:10
  • Thanks a lot @Ulrich, you are right that I am a beginner and I am programming for R. I do know what a macro is though, and I use constant references in the final code. Your suggestion to create different functions is actually good, I could try to create some diff function which includes the loop and take it from there... – Sebastian Apr 18 '20 at 19:44
  • Just out of curiosity: I presume the function pointer approach would be `double do_diff(double y, double x, double r) { return y-x; }` and the pointer `double (*function_pointer)(double y, double x, double r);`, and then finally assigning the functions again to the pointer the way I do above. Is that more desirable or faster? – Sebastian Apr 18 '20 at 19:47
  • In your third piece of code, `FUN` is a function pointer. This is a clean approach, but maybe not the fastest. If the compiler is not smart enough to figure out how this works, it will not be able to inline the function calls, which will cause a lot of overhead. – Ulrich Eckhardt Apr 18 '20 at 20:41