Avoid memory allocation with std::function and member function

Question

This code is just for illustrating the question.

#include <functional>
struct MyCallBack {
    void Fire() {
    }
};

int main()
{
    MyCallBack cb;
    std::function<void(void)> func = std::bind(&MyCallBack::Fire, &cb);
}

Experiments with valgrind show that the line assigning to func dynamically allocates about 24 bytes with gcc 7.1.1 on Linux.

In the real code, I have a few handfuls of different structs, each with a void(void) member function, and those member functions get stored in ~10 million std::function<void(void)> objects.

Is there any way I can avoid memory being dynamically allocated when doing std::function<void(void)> func = std::bind(&MyCallBack::Fire, &cb); ? (Or when otherwise assigning these member functions to a std::function.)

@BeyelerStudios, allocator support for functions has been dropped from 2017 C++. — SergeyA, Sep 11 '17 at 20:13
@BeyelerStudios Type erasure always has a runtime cost. If you want to avoid it, you can rewrite the function that consumes the `std::function` to be a template that takes an arbitrary callable type instead. This tradeoff between compile-time work and run-time work has always existed in C++. — Brian Bi, Sep 11 '17 at 20:20
Even beyond this specific case, where the given answer is correct, you should be avoiding bind almost entirely in C++14 and beyond. There's probably some corner case where it may still be OK, but in 99.9% of cases you should be using a lambda. — Nir Friedman, Sep 11 '17 at 20:22
The thing is, even though the proposed solution of using a lambda solves the immediate problem, the lambda/`std::function` is still storing a pointer to the struct, so the data will still be out-of-line. So if you have, say, a `vector<std::function<void(void)>>`, you will still trigger cache misses like crazy and have poor performance. You'll just have a single indirection instead of double indirection. If your various structs are all of similar sizes you can do much, much better. Are they? And how many do you have? — Nir Friedman, Sep 11 '17 at 22:20
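
The template alternative mentioned in the comments above could look roughly like the sketch below. The names `invoke_twice`, `demo_template_consumer` and the `fired` counter are made up for illustration and are not part of the question's code. The consumer is templated on the callable type, so no `std::function` (and therefore no type erasure or allocation) is involved. The price is that callables of different types can no longer be stored together in one container.

```cpp
#include <cassert>

struct MyCallBack {
    int fired = 0;               // illustrative counter, not in the original code
    void Fire() { ++fired; }
};

// The consumer accepts any callable type directly; the call is resolved
// at compile time instead of going through std::function.
template <class Callable>
void invoke_twice(Callable&& callable) {
    callable();
    callable();
}

// Small usage example: returns how often Fire() was called.
int demo_template_consumer() {
    MyCallBack cb;
    invoke_twice([&cb] { cb.Fire(); });
    assert(cb.fired == 2);
    return cb.fired;
}
```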

SergeyA · Accepted Answer · 2017-09-12T18:07:05.103

30

Unfortunately, allocator support for std::function was dropped in C++17.

Now the accepted solution to avoid dynamic allocations inside std::function is to use lambdas instead of std::bind. That does work, at least in GCC - it has enough static space to store the lambda in your case, but not enough space to store the binder object.

std::function<void()> func = [&cb]{ cb.Fire(); };
    // sizeof lambda is sizeof(MyCallBack*), which is small enough

As a general rule, with most implementations, a lambda that captures only a single pointer (or a reference) will avoid dynamic allocations inside std::function (it is also generally a better approach, as another answer suggests).

Keep in mind that for this to work you need to guarantee that the object captured by reference (here, cb) outlives the std::function. That is not always possible, and sometimes you have to capture state by (large) copy. If that happens, there is currently no way to eliminate dynamic allocations in std::function, other than tinkering with the standard library implementation yourself (not recommended in the general case, but it could be done in some specific cases).
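
One way to check the claim on your own toolchain, without valgrind, is to count calls to the global `operator new`. The following is only a sketch: the counter `g_allocations`, the `fired` member and the two helper functions are hypothetical names, and the result depends on the standard library in use. With the GCC behaviour described above, the lambda version would be expected to report 0 allocations and the `std::bind` version at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>

// Illustrative instrumentation: count every call to the global operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct MyCallBack {
    int fired = 0;               // illustrative counter, not in the original code
    void Fire() { ++fired; }
};

// Number of heap allocations made while wrapping and calling a lambda.
std::size_t allocations_for_lambda(MyCallBack& cb) {
    const std::size_t before = g_allocations;
    std::function<void()> func = [&cb] { cb.Fire(); };
    assert(static_cast<bool>(func));
    func();
    return g_allocations - before;
}

// Same measurement for the std::bind version from the question.
std::size_t allocations_for_bind(MyCallBack& cb) {
    const std::size_t before = g_allocations;
    std::function<void()> func = std::bind(&MyCallBack::Fire, &cb);
    assert(static_cast<bool>(func));
    func();
    return g_allocations - before;
}
```

Calling both helpers on the same `MyCallBack` and printing the two results gives a quick per-platform answer to "does this allocate?".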

edited Sep 12 '17 at 18:07

answered Sep 11 '17 at 20:15

SergeyA


This is because `std::function` has an optimization that allocates the memory inside the object on the stack, provided that the function object has a small enough size. Using a `lambda` here will result in an object that is the size of one pointer, which *should* trigger the Small Function Optimization and allocate it within the `std::function` – Justin Sep 11 '17 at 20:17
@Justin, I am well aware of this, and even included it in my answer now :) – SergeyA Sep 11 '17 at 20:18
So the answer to my question is to just do `std::function<void(void)> func = [&]{cb.Fire();};`? – binary01 Sep 11 '17 at 20:30
@binary01, yep. – SergeyA Sep 11 '17 at 20:34
I'm not sure about the relevance of this answer. Yes, allocator support has been dropped. But that does not mean that `std::function` is prohibited from dynamically allocating memory using the default mechanism. "Dropping allocator support" simply means that you won't be able to customize the allocator. So, how is this answer relevant to the question? – AnT stands with Russia Sep 11 '17 at 21:00
@AnT, hm... My answer explicitly prescribes how OP can achieve their goal (get an `std::function<>` which doesn't allocate on GCC) and why there is no other way. How is that not relevant? – SergeyA Sep 11 '17 at 21:12
`function`'s dropping of allocator "support" in C++17 wasn't "unfortunate"; it was extremely fortunate, because it'll save anyone from trying to use it and finding out that no library vendor ever implemented it (because it's impossible). Kind of like how C++11 dropped "support" for `export` templates. :) – Quuxplusone Sep 11 '17 at 22:58
The OP might also be interested in a third-party replacement for `std::function` whose memory usage can be guaranteed (as opposed to dynamically allocating at a threshold that differs between libc++ and libstdc++). The keyword to search for is `inplace_function`, as in `sg14::inplace_function`. Then if you discover you need to store an even bigger lambda and still don't want to heap-allocate, you can just bump that template parameter from `24` to `40` or whatever, and recompile. – Quuxplusone Sep 11 '17 at 23:02
@Quuxplusone What's really nice about such a callable, is that unlike stack-storage strings/vectors etc, it's easy to catch at compile time when you are storing something too big in the container. – Nir Friedman Sep 11 '17 at 23:26
I am confused. Wouldn't a lambda pack its entire closure together with itself? How would it just be 1 pointer? – user541686 Sep 12 '17 at 00:29
@NirFriedman: Upvoted for agreement. :) @Mehrdad: The lambda's "closure" (its set of captures) *is* just a single pointer, namely, `&cb`. It doesn't need to capture anything else. I have [a CppCon talk](https://www.youtube.com/watch?v=WXeu4fj3zOs) on the subject of "how can a lambda be just ___". :) – Quuxplusone Sep 12 '17 at 01:33
Why does the lambda need to outlive the std::function object? Doesn't the lambda get copied into it? Or are you talking about the variable referenced by the closure? – MikeMB Sep 12 '17 at 07:35
@Quuxplusone: What? It's emphatically *not* a single pointer. There are 24 bytes [in this example](https://ideone.com/qiJmcN) just for the closure of 3 variables. That's 3 pointers right there. Are you talking about a particular example? – user541686 Sep 12 '17 at 09:06
@Mehrdad: [Yes, of course we're talking about a particular example.](https://stackoverflow.com/questions/46163607/avoid-memory-allocation-with-stdfunction-and-member-function/46163732?noredirect=1#comment79290222_46163732) This is supposed to be a thread about the OP's question, not, like, lambdas in general. (But feel free to ask a separate SO question about how lambdas work, or watch the talk linked above as it'll probably answer your questions.) – Quuxplusone Sep 12 '17 at 17:09
@Quuxplusone: Wait, but the OP even says *"This code is just for illustrating the question."*... that's obviously not his actual code. And this answer just generically says *"Now the accepted solution to avoid dynamic allocations inside std::function is to use lambdas instead of std::bind. That does work, at least in GCC - it has enough static space to store lambda, but not enough space to store binder object."* without giving any hint that this only applies to lambdas without a closure. I don't have any questions myself, I'm trying to say this answer as-is is making a false, broad claim. – user541686 Sep 12 '17 at 17:16
Edited. But I didn't edit the last paragraph, because I'm no longer sure what it was trying to say. (*"However, it works if you can guarantee that this lambda will outlive the std::function. Obviously, it is not always possible, and when it is not, there is no way currently to eliminate dynamic allocations in functions"*) I think by "this lambda" Sergey meant `cb`, and was trying to warn about dangling-reference issues, but that's completely orthogonal to the issue of dynamic allocation in `std::function`. The OP's `std::bind` code was not immune to dangling references. – Quuxplusone Sep 12 '17 at 17:24
@Quuxplusone, well, in this case the lambda captures by reference (as does his binder, of course). But I was trying to explain that the technique has its limitations. I will update the answer. Let me know if it is any better. – SergeyA Sep 12 '17 at 18:04

score 7 · Answer 2 · answered Sep 11 '17 at 20:32

As an addendum to the already existent and correct answer, consider the following:

MyCallBack cb;
std::cerr << sizeof(std::bind(&MyCallBack::Fire, &cb)) << "\n";
auto a = [&] { cb.Fire(); };
std::cerr << sizeof(a);

This program prints 24 and 8 for me, with both gcc and clang. I don't exactly know what bind is doing here (my understanding is that it's a fantastically complicated beast), but as you can see, it's almost absurdly inefficient here compared to a lambda.

As it happens, std::function is guaranteed to not allocate if constructed from a function pointer, which is also one word in size. So constructing a std::function from this kind of lambda, which only needs to capture a pointer to an object and should also be one word, should in practice never allocate.

Nir Friedman


I would expect `bind` to store pointer to member - 16 bytes + pointer to object - 8 bytes. Here you have your 24. – SergeyA Sep 11 '17 at 20:37
Looks like the state of the lambda capture is comprised of a reference to `cb` only. Hence, 8 bytes. – Maxim Egorushkin Sep 11 '17 at 20:45
@SergeyA Ah, I forget that pointer to member are 16 bytes, that's why I had trouble accounting for 24. I hate just about everything about pointers to members. – Nir Friedman Sep 11 '17 at 20:57
I don't think the fact that the lambda object has the same size as a function pointer is relevant. If you construct your std::function from a pointer, then it only needs those 8 bytes of internal state. If you create it from a lambda, the object has to store a copy of the lambda AND a function pointer (due to type erasure). That being said, your statement that most implementations will not make an allocation for such small objects is true. – MikeMB Sep 12 '17 at 04:34
@SergeyA: Why would the pointer-to-member be 16 bytes? I would expect it could fit in 8 bytes (regular pointer-to-function size); is there something specific to account for possibly virtual functions without the help of a trampoline? – Matthieu M. Sep 12 '17 at 07:11
@MikeMB: I agree with you about the amount of storage required (captured pointer to object + pointer to function/lambda), and thus I can only guess that the implementation of `std::function` the OP has can store 16 bytes inline, not only 8. – Matthieu M. Sep 12 '17 at 07:13
@MatthieuM.: You can get an upper bound of the inline storage in different libraries by applying the sizeof operator: https://godbolt.org/g/yhYDkX. After subtracting 8 Bytes for the pointer it leaves 24bytes in libstdc++, 40 bytes in libc++ and 56 bytes in MSVC. I don't know however, if std::function has additional internal variables or if alignment is an issue here, so that the effective spare capacity might be even smaller. – MikeMB Sep 12 '17 at 07:40
@MikeMB: When I last checked the libstdc++ implementation of string, which also has a "short string optimization", `sizeof(string)` was 24 bytes, but only 15 (+trailing NUL) could be stored inline (even though folly manages to store 23+NUL inline). So `sizeof` is indeed an upper bound, and unlikely to be met. – Matthieu M. Sep 12 '17 at 10:30
@MikeMB I don't think that's true. Whether you use inheritance or a function pointer, the top level indirection in `std::function` is basically a function pointer taking Args... + the closure type. So you need to store that function pointer (or a pointer to it, in the case of inheritance), + the actual function pointer. Basically this is directly related to the fact that when you construct `std::function` from a function pointer, there is no compliant implementation that avoids double indirection that I'm aware of. Happy to hear how you would propose to do it. – Nir Friedman Sep 12 '17 at 13:36
@MatthieuM. See my response above. Also, member function pointer has to be 16 bytes; it's not related to virtual function, but rather potential adjustments to `this` pointer when you have multiple inheritance: http://lazarenko.me/wide-pointers/ – Nir Friedman Sep 12 '17 at 13:41
@NirFriedman: This is why I was talking about a trampoline. That is, it is possible instead of using a tuple of (pointer-to-function, adjustment) to instead compile a *separate* function whose first instruction is performing the adjustment then calling the regular function. It's a deliberate choice on the part of the Itanium ABI to choose one implementation strategy rather than another, but it is certainly *not mandatory*. I expect they considered a range of designs and thought this one best, notably because it avoids creating said trampoline for every single method. – Matthieu M. Sep 12 '17 at 13:58
@MatthieuM. Okay, I wasn't sure what you meant by trampoline in this context. I'm not sure how your solution would work in the face of also needing to support equality: http://coliru.stacked-crooked.com/a/e720bade92b834b2. The same member function has to compare equal, even if the offset is different. With trampolines you would just have two different function pointers; there'd be no easy way to see that they both trampoline into the same thing that I can see. – Nir Friedman Sep 12 '17 at 14:10
@NirFriedman: I think you are right and I was confused. As the type doesn't change between a std::function holding a function pointer and one holding a function object, you always need at least space for two pointers (one function pointer and one pointing to data, even if the latter is a nullptr). Sorry for the fuss. – MikeMB Sep 12 '17 at 15:07
@NirFriedman: I had not considered equality, multiple trampolines could be made to compare equal, but not as cheaply as comparing pointers obviously. Further nail in the coffin for the trampoline solution :D – Matthieu M. Sep 12 '17 at 15:13
Bind basically just gloms stuff together and creates a wrapper to call it. A "pointer to member" is twice the size of a normal pointer and you are binding "this" which is another pointer, so the result of the bind is 3 pointers in size. The lambda on the other hand refers to the member function statically so it only has to store the "this" pointer. – plugwash Nov 18 '19 at 17:07
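
The size breakdown discussed in these comments (pointer to member plus object pointer for `bind`, a single pointer for the lambda) can be inspected with a sketch like the one below. The `Sizes` struct, `measure_sizes` and the `fired` counter are illustrative names only, and the concrete numbers depend on the ABI: 24 and 8 bytes are what the answer reports for gcc and clang on a 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

struct MyCallBack {
    int fired = 0;               // illustrative counter, not in the original code
    void Fire() { ++fired; }
};

// Illustrative container for the three sizes of interest.
struct Sizes {
    std::size_t bind_object;     // result of std::bind(&MyCallBack::Fire, &cb)
    std::size_t lambda_object;   // closure capturing cb by reference
    std::size_t member_pointer;  // pointer to member function on its own
};

Sizes measure_sizes() {
    MyCallBack cb;
    auto bound = std::bind(&MyCallBack::Fire, &cb);
    auto lambda = [&cb] { cb.Fire(); };
    bound();
    lambda();
    assert(cb.fired == 2);       // both wrappers call the same object
    return Sizes{sizeof(bound), sizeof(lambda),
                 sizeof(void (MyCallBack::*)())};
}
```

Printing the three fields shows whether the binder on a given platform is indeed the member pointer plus one object pointer.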

score 4 · Answer 3 · answered Jul 16 '19 at 01:36

Run this little hack and it will probably print the number of bytes you can capture without allocating memory:

#include <iostream>
#include <functional>
#include <cstring>

void h(std::function<void(void*)>&& f, void* g)
{
  f(g);
}

template<size_t number_of_size_t>
void do_test()
{
  size_t a[number_of_size_t];
  std::memset(a, 0, sizeof(a));
  a[0] = sizeof(a);

  std::function<void(void*)> g = [a](void* ptr) {
    if (&a != ptr)
      std::cout << "malloc was called when capturing " << a[0] << " bytes." << std::endl;
    else
      std::cout << "No allocation took place when capturing " << a[0] << " bytes." << std::endl;
  };

  h(std::move(g), &g);
}

int main()
{
  do_test<1>();
  do_test<2>();
  do_test<3>();
  do_test<4>();
}

With gcc version 8.3.0 this prints

No allocation took place when capturing 8 bytes.
No allocation took place when capturing 16 bytes.
malloc was called when capturing 24 bytes.
malloc was called when capturing 32 bytes.

Neat. Btw, you can use `size_t a[number_of_size_t]{};` to do [zero initialization](https://en.cppreference.com/w/cpp/language/zero_initialization), rather than calling `std::memset`. Or just not initialize the array. You only read the first element, so that's the only one that needs to be initialized. — Daniel Stevens, Dec 06 '21 at 13:48

score 3 · Answer 4 · answered Sep 11 '17 at 20:53

Many std::function implementations will avoid allocations and use space inside the function class itself rather than allocating if the callback it wraps is "small enough" and has trivial copying. However, the standard does not require this, only suggests it.

On g++, a non-trivial copy constructor on a function object, or data exceeding 16 bytes, is enough to cause it to allocate. But if your function object has no data and uses the builtin copy constructor, then std::function won't allocate. Also, if you use a function pointer or a member function pointer, it won't allocate.
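
The g++ rule described above can be approximated by a small compile-time predicate. This is a sketch of the description in this answer, not the library's actual check: the name `likely_stored_inline`, the two example types, and the hard-coded limit of 16 bytes are assumptions taken from the text, and other standard libraries use different thresholds.

```cpp
#include <cstddef>
#include <type_traits>

// Assumed threshold, taken from the "16 bytes" figure in the answer above.
constexpr std::size_t assumed_inline_capacity = 16;

// Rough approximation: small enough and trivially copyable.
template <class F>
constexpr bool likely_stored_inline() {
    return sizeof(F) <= assumed_inline_capacity &&
           std::is_trivially_copyable<F>::value;
}

// Example callable that is too large for the assumed buffer.
struct TooBig {
    char data[24];
    void operator()() const {}
};

// Example callable that is small but has a user-provided copy constructor.
struct NonTrivialCopy {
    NonTrivialCopy() = default;
    NonTrivialCopy(const NonTrivialCopy&) {}
    void operator()() const {}
};
```

Under these assumptions, `likely_stored_inline<void (*)()>()` is true, while it is false for both `TooBig` and `NonTrivialCopy`.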

While not directly part of your question, it is part of your example. Do not use std::bind. In virtually every case, a lambda is better: smaller, better inlining, can avoid allocations, better error messages, faster compiles, the list goes on. If you want to avoid allocations, you must also avoid bind.

bolov · Answer 5 · 2017-09-12T07:15:54.593

I propose a custom class for your specific usage.

While it's true that you shouldn't try to re-implement existing library functionality, because the library versions will be much better tested and optimized, that advice applies to the general case. If you have a particular situation like the one in your example and the standard implementation doesn't suit your needs, you can explore implementing a version tailored to your specific use case, which you can measure and tweak as necessary.

So I have created a class akin to std::function<void (void)> that works only for methods and has all the storage in place (no dynamic allocations).

I have lovingly called it Trigger (inspired by your Fire method name). Please do give it a more suitable name if you want to.

#include <new>          // placement new
#include <type_traits>  // std::aligned_storage_t

// helper alias for method
// can be used in user code
template <class T>
using Trigger_method = auto (T::*)() -> void;

namespace detail
{

// Polymorphic classes needed for type erasure
struct Trigger_base
{
    virtual ~Trigger_base() noexcept = default;
    virtual auto placement_clone(void* buffer) const noexcept -> Trigger_base* = 0;

    virtual auto call() -> void = 0;
};

template <class T>
struct Trigger_actual : Trigger_base
{
    T& obj;
    Trigger_method<T> method;

    Trigger_actual(T& obj, Trigger_method<T> method) noexcept : obj{obj}, method{method}
    {
    }

    auto placement_clone(void* buffer) const noexcept -> Trigger_base* override
    {
        return new (buffer) Trigger_actual{obj, method};
    }

    auto call() -> void override
    {
        return (obj.*method)();
    }
};

// in Trigger (below) we need to allocate enough storage
// for any Trigger_actual template instantiation
// since all templates basically contain 2 pointers
// we assume (and test it with static_asserts)
// that all will have the same size
// we will use Trigger_actual<Trigger_test_size>
// to determine the size of all Trigger_actual templates
struct Trigger_test_size {};

}

struct Trigger
{
    std::aligned_storage_t<sizeof(detail::Trigger_actual<detail::Trigger_test_size>)>
        trigger_actual_storage_;

    // vital. We cannot just cast `&trigger_actual_storage_` to `Trigger_base*`
    // because there is no guarantee by the standard that
    // the base pointer will point to the start of the derived object
    // so we need to store separately  the base pointer
    detail::Trigger_base* base_ptr = nullptr;

    template <class X>
    Trigger(X& x, Trigger_method<X> method) noexcept
    {
        static_assert(sizeof(trigger_actual_storage_) >= 
                         sizeof(detail::Trigger_actual<X>));
        static_assert(alignof(decltype(trigger_actual_storage_)) %
                         alignof(detail::Trigger_actual<X>) == 0);

        base_ptr = new (&trigger_actual_storage_) detail::Trigger_actual<X>{x, method};
    }

    Trigger(const Trigger& other) noexcept
    {
        if (other.base_ptr)
        {
            base_ptr = other.base_ptr->placement_clone(&trigger_actual_storage_);
        }
    }

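    auto operator=(const Trigger& other) noexcept -> Trigger&
    {
        // guard against self-assignment: destroying first would leave
        // nothing to clone from
        if (this == &other)
        {
            return *this;
        }

        destroy_actual();

        if (other.base_ptr)
        {
            base_ptr = other.base_ptr->placement_clone(&trigger_actual_storage_);
        }

        return *this;
    }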

    ~Trigger() noexcept
    {
        destroy_actual();
    }

    auto destroy_actual() noexcept -> void
    {
        if (base_ptr)
        {
            base_ptr->~Trigger_base();
            base_ptr = nullptr;
        }
    }

    auto operator()() const -> void
    {
        if (!base_ptr)
        {
            // deal with this situation (error or just ignore and return);
            // returning here avoids dereferencing a null pointer below
            return;
        }

        base_ptr->call();
    }
};

Usage:

struct X
{    
    auto foo() -> void;
};


auto test()
{
    X x;

    Trigger f{x, &X::foo};

    f();
}

Warning: only tested for compilation errors.

You need to thoroughly test it for correctness.

You need to profile it and see whether it performs better than other solutions. The advantage is that, because it is written in-house, you can tweak the implementation to increase performance in your specific scenarios.
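
For comparison, here is a smaller variation on the same idea that avoids the virtual base class. It is not the Trigger class above but a different, commonly used technique: the member function is passed as a template argument, so only an object pointer and a plain function pointer are stored. The names `MemberCall`, `bind` and `demo_member_call` are hypothetical, and, like Trigger, the handle does not own the object, so the object must outlive it.

```cpp
#include <cassert>

// Hypothetical name: a non-owning handle to "object + void() member function".
class MemberCall {
public:
    MemberCall() = default;

    // The member function is a template argument, so it is baked into the
    // generated thunk instead of being stored as a pointer to member.
    template <class T, void (T::*Method)()>
    static MemberCall bind(T& object) {
        MemberCall call;
        call.object_ = &object;
        call.thunk_ = [](void* self) { (static_cast<T*>(self)->*Method)(); };
        return call;
    }

    explicit operator bool() const { return thunk_ != nullptr; }

    void operator()() const {
        assert(thunk_ != nullptr && "calling an empty MemberCall");
        thunk_(object_);
    }

private:
    void* object_ = nullptr;            // the bound object, type-erased
    void (*thunk_)(void*) = nullptr;    // casts back and calls the member
};

struct X {
    int calls = 0;
    void foo() { ++calls; }
};

// Small usage example: returns how often X::foo was called.
int demo_member_call() {
    X x;
    MemberCall f = MemberCall::bind<X, &X::foo>(x);
    f();
    f();
    return x.calls;
}
```

The handle is trivially copyable and two pointers wide on typical platforms, which matters when ~10 million of them are stored. Whether it is actually faster than the alternatives still has to be measured.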

This code is correct as far as it goes. But if you pulled the expression `sizeof(detail::Trigger_actual<detail::Trigger_test_size>)` out into a default value for a template non-type parameter `size_t Size = that-expression`, and also provided a template non-type parameter for `Alignment` and a type parameter for the desired signature (not hard-coding `void(void)`), then you'd have reinvented [`sg14::inplace_function`](https://github.com/WG21-SG14/SG14/blob/master/SG14/inplace_function.h) except with no tests and it only works for member functions. :) I suggest just using `inplace_function`. — Quuxplusone, Sep 12 '17 at 17:33
@Quuxplusone I didn't know about `sg14::inplace_function`. Thank you for the link — bolov, Sep 13 '17 at 10:43

score 0 · Answer 6 · answered Aug 14 '22 at 17:06

As @Quuxplusone mentioned in their answer-as-a-comment, you can use inplace_function here. Include the header in your project, and then use it like this:

#include "inplace_function.h"

struct big { char foo[20]; };

static stdext::inplace_function<void(), 8> inplacefunc;
static std::function<void()> stdfunc;

int main() {
  static_assert(sizeof(inplacefunc) == 16);
  static_assert(sizeof(stdfunc) == 32);

  inplacefunc = []() {};
  // fine

  struct big a;
  inplacefunc = [a]() {};
  // test.cpp:15:24:   required from here
  // inplace_function.h:237:33: error: static assertion failed: inplace_function cannot be constructed from object with this (large) size
  //  237 |         static_assert(sizeof(C) <= Capacity,
  //      |                       ~~~~~~~~~~^~~~~~~~~~~
  // inplace_function.h:237:33: note: the comparison reduces to ‘(20 <= 8)’
}

cc @quuxplusone if you have anything else you'd like to contribute here :) inplace_function.h has worked great for my application, thanks for pointing it out! — flaviut, Aug 14 '22 at 17:08
