I have a class that uses a reference to a function:

double u( const double& x, const double& y )
{
  return x * y;
}

class equation
{
  equation( double (&in_u)(const double&, const double&) );
//...
protected:
  double (&u)(const double&, const double&);
};

This function would be called something like 10^8 times during a typical run.

The class goes into a library and the function u is defined by the user of the library. So I cannot have the function definition inside the class.

I have read this:

(std::function) ... has the disadvantage of introducing some (very small) overhead when being called (so in a very performance-critical situation it might be a problem but in most it should not)

Are there any more efficient ways of passing the function u to the class equation? And would this count as "a very performance-critical situation"?

EDIT

There seems to be a bit of confusion. Just to make it clear: the function u is known at the executable's compile time, but not at the library's. Getting the function at run time is a feature I will consider in later versions of the library, but not now.

Eliad
    Passing the two doubles by value is probably faster than dereferencing them. – Beta Carotin Apr 24 '15 at 19:44
  • Is `u` known at compile time? – danielschemmel Apr 24 '15 at 19:48
  • @gha.st, Yes, it is right now. Though the goal is to have the library available and the final applications using dynamically links to it. – Eliad Apr 24 '15 at 20:06
  • I smell a case where the function overload could be computed at compile time... – Mgetz Apr 24 '15 at 20:49
  • @Hurkyl, I know. But `u` would be defined for example above `main()`, not in the header files. – Eliad Apr 24 '15 at 20:57
    @Furihr: The header files are part of the library too. To use an analogy, you seem to be focused on writing something with an interface like the C routine `qsort`, whereas writing something with an interface more like the C++ routine `std::sort` can perform much better. –  Apr 24 '15 at 20:58
  • Can the library expose its guts in a header file? Ie, can the body of `equation` and all consumers of the type be in header files? (other than parts where efficiency matters less, and type erasure is practical) Alternatively, what code calls `equation`? In short, it might be very important *how* equation is used (in particular, the e8 calls and their context). I'm thinking anything from stateless functor to type erasing the call operation (where it is easy to inline). – Yakk - Adam Nevraumont Apr 24 '15 at 21:06
  • @Yakk, I think not (If I understood you correctly). `equation` is defined in `main()` and the main call is `equation.solve(/*...*/)`. But there is a complicated interaction between several classes during this call. – Eliad Apr 24 '15 at 21:12
  • Can we expose the implementation of `solve` in a header file? If not, is there a chunk of code "around" where you call `u` very often that can be exposed, such that we the exposed piece would be called orders of magnitude less often than `u`? Yes to the first gives you @NetVipeC's solution. Yes to the second gives you an efficient type-erasure based implementation. – Yakk - Adam Nevraumont Apr 24 '15 at 21:13
  • @Yakk, I am not sure what you mean by "expose", but the project is free (as in freedom) and there is no attempt to hide the implementation. Does that answer your question? BTW, `solve` calls class `A` which calls class `B` which in turn calls `equation.u`. – Eliad Apr 24 '15 at 21:23
  • It depends on what `u` is doing. The overhead of a std::function object call compared to a function (pointer) call cannot be more than one or two indirections. If your average function does anything significant that is negligible. – Peter - Reinstate Monica Apr 24 '15 at 22:14
  • @Hurkyl I was under the impression that std::sort is actually pretty much the old `qsort` idea in new clothes (provide a comparing function which is unknown at library compile time as a callback). – Peter - Reinstate Monica Apr 24 '15 at 22:18
    @peter except for the performance increase, sure, it is conceptually similar. https://solarianprogrammer.com/2012/10/24/cpp-11-sort-benchmark/ – Yakk - Adam Nevraumont Apr 24 '15 at 23:51

5 Answers


Given that the function isn't known at compile time, you won't get any faster than a function pointer/reference.

The advantage of std::function is that it can hold, say, a functor, a member function pointer, or a lambda expression. But there is some overhead.

As one comment mentioned, I would replace the const double & arguments with plain double. The sizes are the same on most platforms these days, and passing by value removes a dereference.

Here is an example using std::function:

#include <cstdio>
#include <functional>
#include <cmath>

double multiply(double x, double y) { return x * y; }
double add(double x, double y) { return x + y; }

class equation
{
public:
    using ComputeFunction_t = std::function<double(double, double)>;

    template <typename FunctionPtr>
    equation(FunctionPtr pfn)
        : computeFunction_m(pfn)
    { }

    void compute(double d1, double d2)
    {
        printf("(%f, %f) => %f\n", d1, d2, computeFunction_m(d1, d2));
    }

protected:
    ComputeFunction_t computeFunction_m;
};

int main() {
    equation prod(multiply);
    prod.compute(10, 20); // print 200

    equation sum(add);
    sum.compute(10, 20);  // print 30

    equation hypotenuse([](double x, double y){ return sqrt(x*x + y*y); });
    hypotenuse.compute(3, 4); // print 5

    struct FooFunctor
    {
        FooFunctor(double d = 1.0) : scale_m(d) {}

        double operator()(double x, double y) { return scale_m * (x + y); }
      private:
        double scale_m;
    };

    equation fooadder(FooFunctor{});
    fooadder.compute(10, 20); // print 30

    equation fooadder10(FooFunctor{10.0});
    fooadder10.compute(10, 20);

    struct BarFunctor
    {
        BarFunctor(double d = 1.0) : scale_m(d) {}

        double scaledAdd(double x, double y) { return scale_m * (x + y); }
      private:
        double scale_m;
    };

    BarFunctor bar(100.0);
    std::function<double(double,double)> barf = std::bind(&BarFunctor::scaledAdd, &bar, std::placeholders::_1, std::placeholders::_2);
    equation barfadder(barf);
    barfadder.compute(10, 20); // print 3000

    return 0;
}

But, again, this gain in flexibility does have a small runtime cost. Whether it's worth the cost depends on the application. I'd probably lean toward generality and a flexible interface first, and then profile later to see if it is a real issue for the sorts of functions that will be used in practice.

If you can make your solver into a header-only library, then when the user provides inline-able functions in their code, you may be able to get better performance. For instance:

template <typename ComputeFunction>
class Equation
{
  public:

    Equation(ComputeFunction fn)
      : computeFunction_m(fn)
    { }

    void compute(double d1, double d2)
    {
        printf("(%f, %f) => %f\n", d1, d2, computeFunction_m(d1, d2));
    }

  protected:
    ComputeFunction computeFunction_m;
};

template <typename ComputeFunction>
auto make_equation(ComputeFunction &&fn)
{
    // decay so that deduction from an lvalue doesn't produce Equation<T&>
    return Equation<std::decay_t<ComputeFunction>>(std::forward<ComputeFunction>(fn));
}

An instantiation of the Equation class can now completely inline the call to the function. Calling is very similar, given the make_equation helper (the above implementation assumes C++14, but the C++11 version isn't much different):

auto fooadder2 = make_equation(FooFunctor{});
fooadder2.compute(10, 20);

auto hypot2 = make_equation([](double x, double y){ return sqrt(x*x + y*y); });
hypot2.compute(3, 4);

With full optimization you'll likely only find the call to printf with the results of the calculation in the compiled code.

sfjac

A function pointer (or reference, which is almost identical at the implementation level) will work just fine.

Modern CPUs are very good at branch prediction; after the first couple of calls the CPU will recognize that this "indirect" call always goes to the same place, and use speculative execution to keep the pipeline full.

However, there still will be no optimization across the function boundary. No inlining, no auto-vectorization.

If this function is being called 10^8 times, it is likely that a large number of those calls happen in a very tight loop with varying parameters. In that case, I suggest changing the function prototype to accept an array of parameter values and output an array of results. Then put a loop inside the function, where the compiler can perform optimizations such as unrolling and auto-vectorization.

(This is a specific case of the general principle to deal with interop cost by reducing the number of calls across the boundary)

If that isn't possible, then do pass the parameters by value. As others have said, this is more efficient than a const reference for floating-point values. Probably a lot more efficient, since most calling conventions pass floating-point arguments in floating-point registers (typically SSE registers on modern Intel architectures; before that, the x87 stack), where they are ready for computation immediately. Spilling values to/from RAM in order to pass by reference is quite costly. When the function gets inlined, pass-by-reference gets optimized away, but that won't be happening here. This is still not as good as passing an entire array, though.

Ben Voigt

Using template arguments:

struct u {
    double operator()(const double& x, const double& y) { return x * y; }
};

template <typename Function>
class equation {
public:
    equation() = default;
    //...
    double using_the_function(double x, double y) {
        //...
        auto res = f(x, y);
        //...
        return res;
    }

private:
    Function f;
};

If you don't need to modify the function's parameters inside the function, it's better to pass them by value (for built-in types these are most probably values that would be loaded into CPU registers, or are already loaded).

struct u {
    double operator()(double x, double y) { return x * y; }
};

This would most probably let the compiler inline u into the using_the_function method. With a function pointer, the compiler could not do this, because the pointer could point to any function.

The possible downside of this approach is code bloat, if you need to support a lot of different functions and/or the class is big.

NetVipeC
  • The OP stated that the functions only known at runtime. – sfjac Apr 24 '15 at 19:52
    @sfjac where does the OP say the function is only known at run time? – NathanOliver Apr 24 '15 at 19:56
  • "The class goes into a library and the function u is defined by the user of the library. So I cannot have the function definition inside the class." – sfjac Apr 24 '15 at 19:57
  • I don't see the problem of the OP statement with template arguments. In the place of the class instantiation (in the client) the `function u` would be known. – NetVipeC Apr 24 '15 at 20:02
    @sfjac: That quote does not in any way imply that the functions are only known at runtime. – Benjamin Lindley Apr 24 '15 at 20:04
    @BenjaminLindley But the quote from a comment "the goal is to have the library available and the final applications using dynamically links to it" does. So templates do not seem to be a viable solution. – cmaster - reinstate monica Apr 24 '15 at 20:40
  • This is exactly what I was going to suggest. I [recently made this approach myself for a similar problem](http://stackoverflow.com/q/28820978/560648). However, reading these comments, I have to admit it doesn't seem like the OP's specific scenario allows for it. – Lightness Races in Orbit Apr 24 '15 at 20:46

With 10^8 calls, and no ability to provide the function definition at compile-time to the callers, I would suggest a design change if possible to something like this:

void u(int num, const double* x, const double* y, double* out_results);

The idea is to allow equation and other similar functions to get multiple results in a single call.

Now this won't automatically give you a speed boost. You can easily exchange one overhead for another if you're, say, constructing a variable-sized queue of work for u while your algorithm is very serial in nature. It depends a lot on the nature of the algorithms that use u.

However, if the algorithms can, say, quickly construct an array of N x and y values to compute on the hardware stack, even say 8 at a time, that can help.

It'll also make u suitable for multithreading with parallel fors and such, as you generally want each chunk of work given to u to be large enough to dwarf the overhead of scheduling tasks across threads. So designing u to compute a variable number of results at a time can really help you design a more stable interface, one that won't need to be broken later in response to optimizations.

  • If you do this, consider also providing a template to wrap the existing single-invoke function signature to run against `num` values. That will ease migration for existing users and make you less unpopular. – Toby Speight May 13 '15 at 10:31
  • Somewhat agree -- though if it's an API concerned with large-scale number crunching, I kinda prefer that lower-level C-ish API with the blocky and loopy logic, like OGL with its general block-striding mindset and somewhat lack of safety, if nothing more than the fact that it builds faster and has a minimalist mentality and the slightly added bonus that the client can choose how to tackle the iteration (with his own unrolling, SIMD, etc). I suppose it depends on the goals, but here I kinda prefer leaning on that blocky raw API over the more convenient scalar-based wrappers. –  May 13 '15 at 11:06

Since you can use C++11, you may use std::bind. It binds a function pointer together with its arguments into a callable object. The arguments can be placeholders, filled in dynamically at run time.

Like so:

double multiply( const double& x, const double& y )
{
    return x * y;
}

//Somewhere in your class
auto bound_fn = std::bind (multiply, 100, std::placeholders::_1);
bound_fn(5);  // calls multiply(100, 5), replacing _1 with the argument 5
gibertoni
    This definitely won't be any faster than the OP's code. – cmaster - reinstate monica Apr 24 '15 at 20:37
    Lambda is generally preferred to bind for clarity. Also you're storing the result of the bind in an auto which requires type deduction. That can't happen if you don't do the bind when you build the library. You'd have to store the result of the bind (or lambda) in an std::function – evan Apr 24 '15 at 21:32