std::function performance as compared to templates

Question

I looked at another stack overflow question regarding std::function and why it is slow but I am still not convinced/do not understand. I ran the program from the question with a few modifications.

#include <iostream>
#include <functional>
#include <string>
#include <chrono>

template <typename F>
float calc1(F f) { return -1.0f * f(3.3f) + 666.0f; }

float calc2(const std::function<float (float)>& f) { return -1.0f * f(3.3f) + 666.0f; }
int main() {

    std::function<float (float)> f = [](float arg){ return arg * 0.5f; };
    for (int i = 0; i < 1e9; ++i) {
        // calc2(f);
        calc1([](float arg){ return arg * 0.5f; });
    }

    return 0;
}

With the templated version, the code runs in 4 seconds but with the std::function involved the runtime increases to 15 seconds. I understand why copying a std::function can be expensive, but here even with passing a reference, there seems to be no difference, could someone explain why this happens?

Just for reference this is the output when I type in g++ --version

Apple LLVM version 7.0.2 (clang-700.1.81)

Yep with optimizations turned on the regular code takes only 0.4 seconds, I assume nothing executes at all since there is no dependency elsewhere in the code, but with the std::function the runtime becomes 1.9seconds — Curious, Dec 21 '15 at 04:58
@NirFriedman: This is different. `std::function` is more than just a functor. It uses type erasure, which is a significant difference. — Cornstalks, Dec 21 '15 at 05:05
@Cornstalk the title is just misleading, the question asks about std::function. The accepted answer (mine, full disclosure) compares std:: function and lambda in depth, even looks at generated assembly. — Nir Friedman, Dec 21 '15 at 05:15
Could you guys please explain what is stopping the compiler from optimizing the code away at compile time? It seems like a function is usually resolved at compile time in most cases. I know I'm wrong I just wanted someone who knows more than me to explain this to me. Thanks! — Curious, Dec 21 '15 at 05:18
@NirFriedman: Ah, it seems you are correct there. I just read the title and the first code snippet of that other question and dismissed it as irrelevant. — Cornstalks, Dec 21 '15 at 05:21
@nir fix up the other post's title so it matches the content: with the current title, it is not a better question. Then I might vote in agreement. — Yakk - Adam Nevraumont, Dec 21 '15 at 05:50
@Yakk done, I tried to make the change as minimal as possible while clarifying; I think the phrase "bound lambda" is a bit confusing. — Nir Friedman, Dec 21 '15 at 06:29
@Cornstalks I didn't see you had already modified the title, thanks. My own edit had to be reviewed and naturally, was rejected; ah SO. — Nir Friedman, Dec 21 '15 at 07:05
@NirFriedman: Dang. Well at least now you've got >2k rep, so you shouldn't have that problem anymore :) — Cornstalks, Dec 22 '15 at 03:45

score 4 · Accepted Answer · edited May 23 '17 at 12:07

4

When I compile your program (with -O3 optimization) and use calc1, the execution time is 0.0 seconds. This is because the compiler can completely optimize away the code. It knows your code doesn't actually do anything, so there's no sense in running any of it.

When I compile your program (again with -O3 optimization) and use calc2 (which uses std::function), the program takes 2 seconds to run. The reason it takes longer is because the optimizer can't optimize everything away. std::function works at runtime (not compile time, because it has to do type erasure; see this question and this question), and in general the optimizer can't inline (or entirely optimize away) calls that go through std::function (in this situation, it's technically possible for the optimizer to do so, since this is a simple program, but it doesn't).

The reason std::function calls can't be inlined is because the compiler doesn't always know what std::function will do. In this code, it's simple enough that the compiler's static analyzer could, if it was "smart" enough, actually inline the whole thing and then optimize it away.

But that can be a tricky thing to implement in a compiler, and it doesn't make a very big difference in "real" programs that are more complex. In more complex programs, it can actually be impossible to know what std::function will do. For example, imagine you have a second .cpp file that calls calc2 with a different std::function. Or imagine if you set your std::function to one of two different lambdas, depending on user input. The compiler wouldn't know which lambda to actually call until the program ran, so it couldn't just optimize everything away. Because of issues like this, it's not really worth the effort of implementing deep static analysis for std::function that would completely optimize away your simple code.

edited May 23 '17 at 12:07

Community

1
1

answered Dec 21 '15 at 05:04

Cornstalks

37,137
18
79
144

Thank you! I looked at the second link and I still dont seem to understand. Why would the compiler not be able to optimize away the function call? – Curious Dec 21 '15 at 05:08
@Curious: I tried to add some expanded info. Let me know if you still have questions. – Cornstalks Dec 21 '15 at 05:20
@downvoter: I'm fine with you downvoting, but could you please leave a comment so we can all learn what's wrong/bad about my answer? – Cornstalks Dec 21 '15 at 05:22
I did not downvote at all! I actually really liked your answer :) – Curious Dec 21 '15 at 05:26
Also so from what you are saying, and from what I understand. It is in fact possible to optimize this at compile time but the static analysis just gets way too difficult to be considered as something to implement right? I do not know if this would be considered impossible in some situations – Curious Dec 21 '15 at 05:27
Something similar to how polymorphism is impossible to replicate at compile time. – Curious Dec 21 '15 at 05:28
Sorry, I down voted because it's a duplicate, and thus shouldn't be answered at all since it fragments knowledge. If you feel that you can add to the answers on the duplicate, you should answer there. For the record though there's nothing at all wrong with the answer itself. – Nir Friedman Dec 21 '15 at 05:28
@Curious: I know it wasn't you; someone else did and I wish they'd leave a comment so I can improve the answer :) Yes, you are right. It's totally within the realm of possibility for the compiler to better optimize your program. `std::function` actually uses polymorphism behind the scenes, so it's kind of the same problem. – Cornstalks Dec 21 '15 at 05:28
I removed the down vote fwiw. – Nir Friedman Dec 21 '15 at 05:30
@NirFriedman: Cool, thanks; I appreciate that explanation. I don't mind the downvote so much, it's a perfectly acceptable opinion. – Cornstalks Dec 21 '15 at 05:31
Also why would there be a need to use polymorphism in the implementation of std::function? I'm sorry I dont know why I am having such a hard time understanding this..... – Curious Dec 21 '15 at 05:49
@Curious: For the type erasure. Googling "c++ type erasure" will provide some good links, but I think [this blog post](https://davekilian.com/2014/10/24/cpp-type-erasure.html) is a pretty good introduction. The basic idea is that `std::function`'s constructor is a template function that takes some functor/lambda/function, and then internally stores it polymorphically (this process is called "type erasure"). Internally storing it polymorphically is needed because otherwise, the whole class would have to be templated (not just its constructor). – Cornstalks Dec 21 '15 at 06:09
@Curious: (continued) `std::function` is already templated, yes, but without type erasure there would be even more template types (and you would only be able to store one type of functor/lambda/function in `std::function`; but because of type erasure you can store any functor/lambda/function inside a `std::function` so long as its type signature matches the function call). – Cornstalks Dec 21 '15 at 06:11
@Curious It depends on compiler version and flags, but it can be optimized away.This optimization cannot always be done because of practical and theoretical reasons. Practically, the dynamic type may depend on information not available at compile time, e.g. user input or configuration. Theoretically, compilers would have to solve something equivalent to program equivalence or program line reachability, which is all undecidable. – Jens Dec 21 '15 at 08:01

std::function performance as compared to templates

1 Answers1