0

I know that std::function is implemented with the type erasure idiom. Type erasure is a handy technique, but as a drawback it needs to store on the heap a register (some kind of array) of the underlying objects.

Hence when creating or copying a function object there are allocations to do, and as a consequence the process should be slower than simply manipulating functions as template types.

To check this assumption I have run a test function that accumulates n = cycles consecutive integers, and then divides the sum by the number of increments n. First coded as a template:

#include <iostream>
#include <functional>
#include <chrono>
using std::cout;
using std::function;
using std::chrono::system_clock;
using std::chrono::duration_cast;
using std::chrono::milliseconds;

double computeMean(const double start, const int cycles) {
    double tmp(start);
    for (int i = 0; i < cycles; ++i) {
        tmp += i;
    }
    return tmp / cycles;
}

template<class T>
double operate(const double a, const int b, T myFunc) {
    return myFunc(a, b);
}

and the main.cpp:

int main()
{
    double init(1), result;
    int increments(1E9);
    // start clock
    system_clock::time_point t1 = system_clock::now();

    result = operate(init, increments, computeMean);
    // stop clock
    system_clock::time_point t2 = system_clock::now();

    cout << "Input: " << init << ", " << increments << ", Output: " << result << '\n';
    cout << "Time elapsed: " << duration_cast<milliseconds>(t2 - t1).count() << " ms\n";
    return 0;
}

This was run a hundred times and get a mean result of 10024.9 ms.

Then I introduce the function object in the main, plus a template specialization for operate so the code above can be recycled:

// as above, just add the template specialization
template<>
double operate(const double a, const int b, function<double (const double, const int)> myFunc) {
    cout << "nontemplate called\n";
    return myFunc(a, b);
}

// and inside the main
int main()
{
    //...
    // start clock
    system_clock::time_point t1 = system_clock::now();

    // new lines
    function<double (const double, const int)> computeMean =
        [](const double init, const int increments) {
            double tmp(init);
            for (int i = 0; i < increments; ++i) {
                tmp += i;
            }
            return tmp / increments;
        };
    // rest as before
    // ...
}

I expected the function version to be faster, but the average is about the same, actually even slower, result = 9820.3 ms. Checked the standard deviations and they are about the same, 1233.77 against 1234.96.

What sense can be made of this? I would have expected the second version with the function object to be slower than the template version.

Here the whole test can be run on GDB.

Giogre
  • 1,444
  • 7
  • 19
  • 3
    How did you compile your program? Especially, what optimizations are enabled? A smart optimizer could transform your code to render the difference moot, and no optimizations tells us nothing on performance. – YSC May 17 '22 at 18:36
  • I used `-O2`. Of course there will compiler optimizations involved, I wanted to mention it in the main question but then forgot. – Giogre May 17 '22 at 18:40
  • 1
    Check out the assembly generated by your two programs. They could be the same. – YSC May 17 '22 at 18:44
  • *What sense can be made of this?* My first guess is: your assumption was incorrect. – Eljay May 17 '22 at 18:48

1 Answers1

5

I know that std::function is implemented with the type erasure idiom. Type erasure is a handy technique, but as a drawback it needs to store on the heap a register (some kind of array) of the underlying objects.

Type erasure does not necessarily require heap allocations. In this case, it is likely the implementation of std::function will not have to do any heap allocation, since the lambda doesn't capture any variables. Therefore, std::function only has to store the function pointer, which it will do in the object itself, not in heap-allocated memory.

Apart from that, even if std::function did do a heap allocation, some compilers might even elide those heap allocations.

Last but not least, while heap allocations are more expensive than stack allocations, if you only need to allocate something on the heap once for the entire duration of your program, you probably won't notice any difference in timing due to that allocation.

G. Sliepen
  • 7,637
  • 1
  • 15
  • 31
  • 2
    GCC's implementation of ```std::function``` does not require a heap allocation for callables up to 16 bytes in size (on x86-64), it uses an internal pre-allocated buffer instead. [Rough sketch on how it's implemented in GCC](https://stackoverflow.com/questions/72214886/does-stdfunction-have-virtual-functions-for-copying-and-moving/72215967#72215967) – Jonathan S. May 17 '22 at 18:51
  • So, bottom line: either due to compiler optimizations, or minimal impact of heap allocations (if any at all in case the callable is small), `std::function` is as quick as template calls? – Giogre May 17 '22 at 19:55
  • 2
    @Giogre it would be more accurate to say it *can* be as fast. There is no guarantee that that is always the case. – Kyle May 17 '22 at 20:02
  • 1
    Interesting. I've seen a lot of people say '`std::function` is expensive' on this site. Well, just goes to show, don't believe everything you read. – Paul Sanders May 17 '22 at 22:07