
I came across the shared_ptr trick for mimicking the behaviour of virtual destructors in a YouTube video (https://www.youtube.com/watch?v=ZiNGWHg5Z-o&list=PLE28375D4AC946CC3&index=6), and upon searching the internet came across this SO answer: shared_ptr magic :)

Normally, if B inherits from A and has its own destructor, we need a virtual destructor in the base class A to ensure that B's destructor is correctly called when deleting through a pointer to A. However, using shared_ptr you can circumvent the need for a virtual destructor.

Since there is runtime overhead from vtable lookups in virtual function calls, I became curious to know whether the shared_ptr trick can avoid this overhead.

Anton

2 Answers


Firstly, the "trick" is also possible with unique_ptr. You simply have to provide a type-erasing deleter, as below.

However, if you look at the implementation you will of course see that it involves a call through a function pointer - which is exactly what a virtual destructor does under the covers.

Virtual function calls are not at all expensive. They simply involve one more memory fetch. If you're doing this in a tight loop, which is the only time performance is ever an issue, that fetch will almost certainly be cached.

In addition, if the compiler can prove that it knows the correct destructor, it will elide the polymorphic lookup entirely (with optimisations turned on, of course).

To cut a long story short, if this is your only performance concern then you have no performance concerns. If you have performance concerns and you think it's because of virtual destructors, then with respect, you are certainly mistaken.

Example code:

#include <iostream>
#include <memory>

struct A {
    ~A() { std::cout << "~A\n"; }
};

struct B : A {
    ~B() { std::cout << "~B\n"; }
};


struct poly_deleter {
    template<class T>
    struct tag {
    };

    template<class T>
    static void delete_it(void *p) { delete static_cast<T *>(p); }

    template<class T>
    poly_deleter(tag<T>) : deleter_(&delete_it<T>) {}

    void operator()(void *p) const { deleter_(p); }

    void (*deleter_)(void *) = nullptr;
};

template<class T> using unique_poly_ptr = std::unique_ptr<T, poly_deleter>;

template<class T, class...Args> auto make_unique_poly(Args&&...args) -> unique_poly_ptr<T>
{
    return unique_poly_ptr<T> {
            new T (std::forward<Args>(args)...),
            poly_deleter(poly_deleter::tag<T>())
    };
}

int main()
{
    auto pb = make_unique_poly<B>();

    // Transfer ownership to a pointer-to-base; the stored deleter
    // still remembers that the complete type is B.
    unique_poly_ptr<A> pa = std::move(pb);

    pa.reset();
}

expected output:

~B
~A
Richard Hodges
  • I get it now, thank you. The performance overhead concern I had was branch prediction misses due to the vtable lookup. I wasn't concerned with the vtable lookup overhead itself (which will probably be cached anyway) as much as with how branch prediction might be affected by polymorphism. But as you say, compiler optimizations should, in a normal use-case, be able to elide the lookup entirely, which would solve the branch prediction problem too. – Anton Apr 25 '17 at 13:54
  • @Anton Before getting too concerned about potential performance problems you should measure if you actually have any. – Cubic Apr 25 '17 at 14:38
  •
    @Anton indirect lookups won't cause a branch prediction miss. There are no conditional branches involved. Have no fear :) – Richard Hodges Apr 25 '17 at 15:01

shared_ptr type-erases the destructor. Various kinds of type erasure all tend to have "similar" costs, give or take a factor of two; some save a cache miss or two over others. One kind of type erasure is the virtual function table.

How exactly shared_ptr erases destruction is left to the implementation. But type erasure is never completely free compared to an inlined function call.

It will be difficult to prove which is more efficient, as cache exhaustion is likely to matter more than any microbenchmark you try. It is unlikely to be a huge source of slowdown in either case.

Yakk - Adam Nevraumont