Is there a way to return an abstraction from a function without using new (for performance reasons)

Question

For example I have some function pet_maker() that creates and returns a Cat or a Dog as a base Pet. I want to call this function many many times, and do something with the Pet returned.

Traditionally I would new the Cat or Dog in pet_maker() and return a pointer to it, however the new call is much slower than doing everything on the stack.

Is there a neat way anyone can think of to return as an abstraction without having to do the new every time the function is called, or is there some other way that I can quickly create and return abstractions?

@Arunmu If I knew how to write the code that showed what I intend to do, I wouldn't have to ask the question. — sji, Jul 11 '16 at 11:05
When using static or stack allocation, you can't create objects in a polymorphic way because the compiler needs to know the exact size. So no, it isn't possible. — adnan_e, Jul 11 '16 at 11:07
@sji :) that's why I asked for the `intended` code. The part which is not clear is whether you want to add a base class called `Animal` or something... providing a pseudo code of how you want it to work would help. — Arunmu, Jul 11 '16 at 11:07
You could have a factory class which has two members (Cat, Dog) and then re-uses these objects on every creation call to properly configure them and return a copy. This avoids heap allocation. Obviously, it assumes all Pet objects are CopyConstructible. — dgrine, Jul 11 '16 at 11:12
To add to comment of Armen, you could have a look at placement new, for example: http://stackoverflow.com/questions/222557/what-uses-are-there-for-placement-new — Joris, Jul 11 '16 at 11:12
Out of the top of my mind what I can think of is to create a static `stack allocator` instance (with some max limit of course) and pass that as an argument to your `pet_maker ` function. Then instead of regular `new` do a `placement new` on the address provided by the `stack allocator` — Arunmu, Jul 11 '16 at 11:12
I think remaking function to take empty object as reference, assigning to it new object created on stack (assign operator) would be faster, but it might depend on actual class; — MaciekGrynda, Jul 11 '16 at 11:12
@Galik I do not know how many might be needed unfortunately. — sji, Jul 11 '16 at 11:14
@sji If you knew you could pre-allocate memory for them as 1 chunk. However, that's not much of a limitation - you can allocate chunk for N objects and if you run out of that - allocate either chunks of the same size or chunks of progressively bigger sizes (possibly with moving your objects to a new, bigger chunk each time). Take look at how `vector` works for one example. — Daerdemandt, Jul 11 '16 at 11:47
How do you -- assuming this was possible -- return a stack-allocated object anyway? It is destroyed when leaving the scope (return), so you're referencing a dead object. — Damon, Jul 12 '16 at 08:13

score 20 · Accepted Answer · answered Jul 11 '16 at 11:11

20

Using new is pretty much inevitable if you want polymorphism. But the reason new works slowly is because it looks for free memory every time. What you could do is write your own operator new, which could, in theory, for example use pre-allocated memory chunks and be very fast.

This article covers many aspects of what you might need.

answered Jul 11 '16 at 11:11

Armen Tsirunyan

130,161
59
324
434

4

To me this seems like the only proper solution. You can also look at meyers effective c++ for the `Airplane` example, which is probably simpler than what is presented in the linked page. – Paul Rooney Jul 11 '16 at 11:22
1

This I think is exactly what I was hoping for. Thanks. – sji Jul 11 '16 at 11:25
8

Just a hint: you accepted an answer almost immediately after posting your question, according to the timestamps. You might want to wait a bit with accepting, better answers may pop up (i.e., the batch allocation method from @Galik in another answer). I don't know what you are _actually_ intending to achieve (surely not literally `Cat` and `Dog` ...); and in most cases it is better to try for some high level optimization first before going nitty gritty into special memory allocation stuff (you have to maintain and debug that code as well, maybe also in 3 years from now...). – AnoE Jul 11 '16 at 13:55
3

Introducing caching/pooling into `pet_maker()` likely makes more sense than writing your own `operator new`... Depends on the actual use-case, I suppose. – BlueRaja - Danny Pflughoeft Jul 11 '16 at 15:29
@sji How do you know? There is no rush to accepting answers. By accepting an answer early you are actively discouraging other answerers and viewers. You can just upvote the answer now and come back in one or two days to see if anything better has come up and accept the one you regard as best. – Bakuriu Jul 11 '16 at 17:02

Galik · Answer 2 · 2016-07-11T15:36:27.167

Each allocation is an overhead so you may get benefits by allocating whole arrays of objects rather than one object at a time.

You could use std::deque to achieve this:

class Pet { public: virtual ~Pet() {} virtual std::string talk() const = 0; };
class Cat: public Pet { std::string talk() const override { return "meow"; }};
class Dog: public Pet { std::string talk() const override { return "woof"; }};
class Pig: public Pet { std::string talk() const override { return "oink"; }};

class PetMaker
{
    // std::deque never re-allocates when adding
    // elements which is important when distributing
    // pointers to the elements
    std::deque<Cat> cats;
    std::deque<Dog> dogs;
    std::deque<Pig> pigs;

public:

    Pet* make()
    {
        switch(std::rand() % 3)
        {
            case 0:
                cats.emplace_back();
                return &cats.back();
            case 1:
                dogs.emplace_back();
                return &dogs.back();
        }
        pigs.emplace_back();
        return &pigs.back();
    }
};

int main()
{
    std::srand(std::time(0));

    PetMaker maker;

    std::vector<Pet*> pets;

    for(auto i = 0; i < 100; ++i)
        pets.push_back(maker.make());

    for(auto pet: pets)
        std::cout << pet->talk() << '\n';
}

The reason to use a std::deque is that it never reallocates its elements when you add new ones so the pointers that you distribute always remain valid until the PetMaker itself is deleted.

An added benefit to this over allocating objects individually is that they don't need to be deleted or placed in a smart pointer, the std::deque manages their lifetime.

The drawback over allocating objects individually is that they hang around until the PetMaker destructor call, so you're burning memory relative to tracking lifespan of the objects. Perhaps return smart pointers to objects that tell the PetMaker instance when they go out of scope, allowing reuse of their resources? — Jon Chesterfield, Jul 11 '16 at 16:13
@JonChesterfield In the *use case* for this your drawback is actually its advantage. One way to think about it is that the `PetMaker` operates as a kind of a *smart pointer* for a *whole collection* of objects. The need here is efficiency of creation but allocating every object and putting them all in individual *smart pointers* would loose all the speed benefits this solution is designed to provide. Also, if you were going to create every object individually and place them in a *smart pointer*, you wouldn't really need to use this class to begin with. — Galik, Jul 11 '16 at 16:28
@JonChesterfield This should not waste memory if the caller asks for the number of objects it needs from a `PetMaker` that is placed only in the scope for which they are needed. — Galik, Jul 11 '16 at 16:31
sure, careful control over the scope of PetMaker would eliminate the memory overhead from keeping objects alive for longer than necessary. I like your implementation, merely pointing out that arena memory management is not unconditionally superior to handing out smart pointers. — Jon Chesterfield, Jul 11 '16 at 16:38

Matthieu M. · Answer 3 · 2016-07-12T07:50:40.890

Is there a neat way anyone can think of to return as an abstraction without having to do the new every time the function is called, or is there some other way that I can quickly create and return abstractions?

TL;DR: The function need not allocate if there is already sufficient memory to work with.

A simple way would be to create a smart pointer that is slightly different from its siblings: it would contain a buffer in which it would store the object. We can even make it non-nullable!

Long version:

I'll present the rough draft in reverse order, from the motivation to the tricky details:

class Pet {
public:
    virtual ~Pet() {}

    virtual void say() = 0;
};

class Cat: public Pet {
public:
    virtual void say() override { std::cout << "Miaou\n"; }
};

class Dog: public Pet {
public:
    virtual void say() override { std::cout << "Woof\n"; }
};

template <>
struct polymorphic_value_memory<Pet> {
    static size_t const capacity = sizeof(Dog);
    static size_t const alignment = alignof(Dog);
};

typedef polymorphic_value<Pet> any_pet;

any_pet pet_factory(std::string const& name) {
    if (name == "Cat") { return any_pet::build<Cat>(); }
    if (name == "Dog") { return any_pet::build<Dog>(); }

    throw std::runtime_error("Unknown pet name");
}

int main() {
    any_pet pet = pet_factory("Cat");
    pet->say();
    pet = pet_factory("Dog");
    pet->say();
    pet = pet_factory("Cat");
    pet->say();
}

The expected output:

Miaou
Woof
Miaou

which you can find here.

Note that it is required to specify the maximum size and alignment of the derived values that can be supported. No way around that.

Of course, we statically check whether the caller would attempt to build a value with an inappropriate type to avoid any unpleasantness.

The main disadvantage, of course, is that it must be at least as big (and aligned) as its largest variant, and all this must be predicted ahead of time. This is thus not a silver bullet, but performance-wise the absence of memory-allocation can rock.

How does it work? Using this high-level class (and the helper):

//  To be specialized for each base class:
//  - provide capacity member (size_t)
//  - provide alignment member (size_t)
template <typename> struct polymorphic_value_memory;

template <typename T,
          typename CA = CopyAssignableTag,
          typename CC = CopyConstructibleTag,
          typename MA = MoveAssignableTag,
          typename MC = MoveConstructibleTag>
class polymorphic_value {
    static size_t const capacity = polymorphic_value_memory<T>::capacity;
    static size_t const alignment = polymorphic_value_memory<T>::alignment;

    static bool const move_constructible = std::is_same<MC, MoveConstructibleTag>::value;
    static bool const move_assignable = std::is_same<MA, MoveAssignableTag>::value;
    static bool const copy_constructible = std::is_same<CC, CopyConstructibleTag>::value;
    static bool const copy_assignable = std::is_same<CA, CopyAssignableTag>::value;

    typedef typename std::aligned_storage<capacity, alignment>::type storage_type;

public:
    template <typename U, typename... Args>
    static polymorphic_value build(Args&&... args) {
        static_assert(
            sizeof(U) <= capacity,
            "Cannot host such a large type."
        );

        static_assert(
            alignof(U) <= alignment,
            "Cannot host such a largely aligned type."
        );

        polymorphic_value result{NoneTag{}};
        result.m_vtable = &build_vtable<T, U, MC, CC, MA, CA>();
        new (result.get_ptr()) U(std::forward<Args>(args)...);
        return result;
    }

    polymorphic_value(polymorphic_value&& other): m_vtable(other.m_vtable), m_storage() {
        static_assert(
            move_constructible,
            "Cannot move construct this value."
        );

        (*m_vtable->move_construct)(&other.m_storage, &m_storage);

        m_vtable = other.m_vtable;
    }

    polymorphic_value& operator=(polymorphic_value&& other) {
        static_assert(
            move_assignable || move_constructible,
            "Cannot move assign this value."
        );

        if (move_assignable && m_vtable == other.m_vtable)
        {
            (*m_vtable->move_assign)(&other.m_storage, &m_storage);
        }
        else
        {
            (*m_vtable->destroy)(&m_storage);

            m_vtable = other.m_vtable;
            (*m_vtable->move_construct)(&other.m_storage, &m_storage);
        }

        return *this;
    }

    polymorphic_value(polymorphic_value const& other): m_vtable(other.m_vtable), m_storage() {
        static_assert(
            copy_constructible,
            "Cannot copy construct this value."
        );

        (*m_vtable->copy_construct)(&other.m_storage, &m_storage);
    }

    polymorphic_value& operator=(polymorphic_value const& other) {
        static_assert(
            copy_assignable || (copy_constructible && move_constructible),
            "Cannot copy assign this value."
        );

        if (copy_assignable && m_vtable == other.m_vtable)
        {
            (*m_vtable->copy_assign)(&other.m_storage, &m_storage);
            return *this;
        }

        //  Exception safety
        storage_type tmp;
        (*other.m_vtable->copy_construct)(&other.m_storage, &tmp);

        if (move_assignable && m_vtable == other.m_vtable)
        {
            (*m_vtable->move_assign)(&tmp, &m_storage);
        }
        else
        {
            (*m_vtable->destroy)(&m_storage);

            m_vtable = other.m_vtable;
            (*m_vtable->move_construct)(&tmp, &m_storage);
        }

        return *this;
    }

    ~polymorphic_value() { (*m_vtable->destroy)(&m_storage); }

    T& get() { return *this->get_ptr(); }
    T const& get() const { return *this->get_ptr(); }

    T* operator->() { return this->get_ptr(); }
    T const* operator->() const { return this->get_ptr(); }

    T& operator*() { return this->get(); }
    T const& operator*() const { return this->get(); }

private:
    polymorphic_value(NoneTag): m_vtable(0), m_storage() {}

    T* get_ptr() { return reinterpret_cast<T*>(&m_storage); }
    T const* get_ptr() const { return reinterpret_cast<T const*>(&m_storage); }

    polymorphic_value_vtable const* m_vtable;
    storage_type m_storage;
}; // class polymorphic_value

Essentially, this is just like any STL container. The bulk of the complexity is in redefining the construction, move, copy and destruction. It's otherwise quite simple.

There are two points of note:

I use a tag-based approach to handling capabilities:
- for example, a copy constructor is only available if the CopyConstructibleTag is passed
- if the CopyConstructibleTag is passed, all types passed to build must be copy constructible
Some operations are provided even if the objects do not have the capability, as long as some alternative way of providing them exist

Obviously, all methods preserve the invariant that the polymorphic_value is never empty.

There is also a tricky detail related to assignments: assignment is only well-defined if both objects are of the same dynamic type, which we check with the m_vtable == other.m_vtable checks.

For completeness, the missing pieces used to power up this class:

//
//  VTable, with nullable methods for run-time detection of capabilities
//
struct NoneTag {};
struct MoveConstructibleTag {};
struct CopyConstructibleTag {};
struct MoveAssignableTag {};
struct CopyAssignableTag {};

struct polymorphic_value_vtable {
    typedef void (*move_construct_type)(void* src, void* dst);
    typedef void (*copy_construct_type)(void const* src, void* dst);
    typedef void (*move_assign_type)(void* src, void* dst);
    typedef void (*copy_assign_type)(void const* src, void* dst);
    typedef void (*destroy_type)(void* dst);

    move_construct_type move_construct;
    copy_construct_type copy_construct;
    move_assign_type move_assign;
    copy_assign_type copy_assign;
    destroy_type destroy;
};


template <typename Base, typename Derived>
void core_move_construct_function(void* src, void* dst) {
    Derived* derived = reinterpret_cast<Derived*>(src);
    new (reinterpret_cast<Base*>(dst)) Derived(std::move(*derived));
} // core_move_construct_function

template <typename Base, typename Derived>
void core_copy_construct_function(void const* src, void* dst) {
    Derived const* derived = reinterpret_cast<Derived const*>(src);
    new (reinterpret_cast<Base*>(dst)) Derived(*derived);
} // core_copy_construct_function

template <typename Derived>
void core_move_assign_function(void* src, void* dst) {
    Derived* source = reinterpret_cast<Derived*>(src);
    Derived* destination = reinterpret_cast<Derived*>(dst);
    *destination = std::move(*source);
} // core_move_assign_function

template <typename Derived>
void core_copy_assign_function(void const* src, void* dst) {
    Derived const* source = reinterpret_cast<Derived const*>(src);
    Derived* destination = reinterpret_cast<Derived*>(dst);
    *destination = *source;
} // core_copy_assign_function

template <typename Derived>
void core_destroy_function(void* dst) {
    Derived* d = reinterpret_cast<Derived*>(dst);
    d->~Derived();
} // core_destroy_function


template <typename Tag, typename Base, typename Derived>
typename std::enable_if<
    std::is_same<Tag, MoveConstructibleTag>::value,
    polymorphic_value_vtable::move_construct_type
>::type 
build_move_construct_function()
{
    return &core_move_construct_function<Base, Derived>;
} // build_move_construct_function

template <typename Tag, typename Base, typename Derived>
typename std::enable_if<
    std::is_same<Tag, CopyConstructibleTag>::value,
    polymorphic_value_vtable::copy_construct_type
>::type 
build_copy_construct_function()
{
    return &core_copy_construct_function<Base, Derived>;
} // build_copy_construct_function

template <typename Tag, typename Derived>
typename std::enable_if<
    std::is_same<Tag, MoveAssignableTag>::value,
    polymorphic_value_vtable::move_assign_type
>::type 
build_move_assign_function()
{
    return &core_move_assign_function<Derived>;
} // build_move_assign_function

template <typename Tag, typename Derived>
typename std::enable_if<
    std::is_same<Tag, CopyAssignableTag>::value,
    polymorphic_value_vtable::copy_construct_type
>::type 
build_copy_assign_function()
{
    return &core_copy_assign_function<Derived>;
} // build_copy_assign_function


template <typename Base, typename Derived,
          typename MC, typename CC,
          typename MA, typename CA>
polymorphic_value_vtable const& build_vtable() {
    static polymorphic_value_vtable const V = {
        build_move_construct_function<MC, Base, Derived>(),
        build_copy_construct_function<CC, Base, Derived>(),
        build_move_assign_function<MA, Derived>(),
        build_copy_assign_function<CA, Derived>(),
        &core_destroy_function<Derived>
    };
    return V;
} // build_vtable

The one trick I use here is to let the user configure whether the types he will use in this container can be move constructed, move assigned, ... via capability tags. A number of operations are keyed on these tags and will either be disabled or less efficient if the requested capability

There's some sophisticated C++ hiding in the details here, thank you for sharing. — Jon Chesterfield, Jul 11 '16 at 16:28
@JonChesterfield: Yeah, I started it as "let's write this code on the fly" and then as the details kept piling on it became "better write all this on coliru, it'll never compile the first time". And it didn't. On the other hand, it worked the first time it compiled, making me glad I went the "compile-time-checked" road rather than the "run-time-checked" road I had initially envisaged :) — Matthieu M., Jul 11 '16 at 17:06

score 6 · Answer 4 · edited Jul 11 '16 at 15:05

6

You can create a stack allocator instance (with some max limit of course) and pass that as an argument to your pet_maker function. Then instead of regular new do a placement new on the address provided by the stack allocator.

You can probably also default to new on exceeding max_size of the stack allocator.

edited Jul 11 '16 at 15:05

Lightness Races in Orbit

378,754
76
643
1,055

answered Jul 11 '16 at 11:16

Arunmu

6,837
1
24
46

Sounds promising, it seems that the stack allocator is something that doesn't come for free coding wise (ie I have to create one/copy one, there isn't a std one I could use?). Just reading/googling about stack allocators now if you have any good links etc tough that would be useful. Thanks – sji Jul 11 '16 at 11:20
1

You have a stack allocator ready made by Mr Howard Hinnant himself: https://howardhinnant.github.io/short_alloc.h – Arunmu Jul 11 '16 at 11:22
You have to pass it by reference of course or create a singleton class of it. – Arunmu Jul 11 '16 at 11:22
Thanks, thats great. Would have accepted this if not for the placement new answer which we have decided to use. – sji Jul 11 '16 at 11:24

Peter · Answer 5 · 2016-07-28T10:47:00.230

One way is to work out, in advance through analysis, how many of each type of object is needed by your program.

Then you can allocate arrays of an appropriate size in advance, as long as you have book-keeping to track the allocation.

For example;

#include <array>

//   Ncats, Ndogs, etc are predefined constants specifying the number of cats and dogs

std::array<Cat, Ncats> cats;
std::array<Dog, Ndogs> dogs;

//  bookkeeping - track the returned number of cats and dogs

std::size_t Rcats = 0, Rdogs = 0;

Pet *pet_maker()
{
    // determine what needs to be returned

    if (return_cat)
    {
       assert(Rcats < Ncats);
       return &cats[Rcats++];
    }
    else if (return_dog)
    {
       assert(Rdogs < Ndogs);
       return &dogs[Rdogs++];
    }
    else
    {
        // handle other case somehow
    }
}

Of course, the big trade-off in is the requirement to explicitly determine the number of each type of animal in advance - and separately track each type.

However, if you wish to avoid dynamic memory allocation (operator new) then this way - as draconian as it might seem - provides an absolute guarantee. Using operator new explicitly allows the number of objects needed to be determined at run time. Conversely, to avoid using operator new but allow some function to safely access a number of objects it is necessary to predetermine the number of objects.

I think your code has an off by one error. The asserts check before the increment and cats[Ncats] would be out of bounds. — josefx, Jul 28 '16 at 10:20
Yes, you're right. I've corrected that. Also fixed another minor typo. Thanks for pointing it out. — Peter, Jul 28 '16 at 10:47

score 3 · Answer 6 · answered Jul 11 '16 at 11:13

It depends on the exact use case you have, and what restrictions you are willing to tolerate. For example, if you are OK with re-using the same objects rather than having new copies every time, you could return references to static objects inside the function:

Pet& pet_maker()
{
static Dog dog;
static Cat cat;

    //...

    if(shouldReturnDog) {
        //manipulate dog as necessary
        //...
        return dog;
    }
    else
    {
        //manipulate cat as necessary
        //...
        return cat;
    }
}

This works if the client code accepts that it doesn't own the object returned and that the same physical instances are reused.

There are other tricks possible if this particular set of assumptions is unsuitable.

This is what we are doing at the moment :-) unfortunately the problem becomes people not expecting the cat to change under their feet. I'll accept this if nothing better turns up though. — sji, Jul 11 '16 at 11:16
Wouldn't this return the same instance of dog or cat ? From one of the comment `I do not know how many might be needed unfortunately` by OP, this probably won't work if multiple pets need to exist at the same time, but that again is not clear. — Arunmu, Jul 11 '16 at 11:20

score 3 · Answer 7 · answered Jul 11 '16 at 11:25

At some point somebody is going to have to allocate the memory and initialize the objects. If doing them on demand, using the heap via new is taking too long, then why no pre-allocate a number of then in a pool. Then you can initialize each individual object on an as needed basis. The downside is that you might have a bunch of extra objects laying around for a while.

If actually initializing the object is the problem, and not memory allocation, then you can consider keeping a pre-built object around and using the Pototype pattern for quicker initialization.

For best results, memory allocation is problem and initialization time, you can combine both strategies.

score 3 · Answer 8 · answered Jul 11 '16 at 14:59

You may want to consider using a (Boost) variant. It will require an extra step by the caller, but it might suit your needs:

#include <boost/variant/variant.hpp>
#include <boost/variant/get.hpp>
#include <iostream>

using boost::variant;
using std::cout;


struct Pet {
    virtual void print_type() const = 0;
};

struct Cat : Pet {
    virtual void print_type() const { cout << "Cat\n"; }
};

struct Dog : Pet {
    virtual void print_type() const { cout << "Dog\n"; }
};


using PetVariant = variant<Cat,Dog>;
enum class PetType { cat, dog };


PetVariant make_pet(PetType type)
{
    switch (type) {
        case PetType::cat: return Cat();
        case PetType::dog: return Dog();
    }

    return {};
}

Pet& get_pet(PetVariant& pet_variant)
{
    return apply_visitor([](Pet& pet) -> Pet& { return pet; },pet_variant);
}




int main()
{
    PetVariant pet_variant_1 = make_pet(PetType::cat);
    PetVariant pet_variant_2 = make_pet(PetType::dog);
    Pet& pet1 = get_pet(pet_variant_1);
    Pet& pet2 = get_pet(pet_variant_2);
    pet1.print_type();
    pet2.print_type();
}

Output:

Cat
Dog

score 1 · Answer 9 · answered Aug 13 '16 at 16:46

For example I have some function pet_maker() that creates and returns a Cat or a Dog as a base Pet. I want to call this function many many times, and do something with the Pet returned.

If you are going to discard the pet immediately after you have done something with it, you can use the technique shown in the following example:

#include<iostream>
#include<utility>

struct Pet {
    virtual ~Pet() = default;
    virtual void foo() const = 0;
};

struct Cat: Pet {
    void foo() const override {
        std::cout << "cat" << std::endl;
    }
};

struct Dog: Pet {
    void foo() const override {
        std::cout << "dog" << std::endl;
    }
};

template<typename T, typename F>
void factory(F &&f) {
    std::forward<F>(f)(T{});
}

int main() {
    auto lambda = [](const Pet &pet) { pet.foo(); };
    factory<Cat>(lambda);
    factory<Dog>(lambda);
}

No allocation required at all. The basic idea is to revert the logic: the factory no longer returns an object. Instead it calls a function providing the right instance as a reference.
The problem with this approach arises if you want to copy and store the object somewhere.
For it is not clear from the question, it's worth to propose also this solution.

Is there a way to return an abstraction from a function without using new (for performance reasons)

9 Answers9