33

I'm currently designing an object structure for a game, and the most natural organization in my case turned out to be a tree. Being a great fan of smart pointers, I use shared_ptrs exclusively. However, in this case the children in the tree need access to their parent (for example, beings on a map need to be able to access map data, i.e. the data of their parent).

Ownership of course runs from the map to its beings, so the map holds shared pointers to them. To access the map data from within a being, however, we need a pointer to the parent -- the smart pointer way is to use a non-owning reference, ergo a weak_ptr.
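The layout described above could be sketched roughly as follows. The names `Map`, `Being`, `spawn`, `mapWidth` and the `width` field are hypothetical and only serve the illustration; the sketch assumes the map itself is always created through `std::make_shared`, because `shared_from_this()` requires that.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Map;  // forward declaration: Being only stores a weak reference to it

struct Being {
    std::string name;
    std::weak_ptr<Map> parent;  // non-owning back reference to the map

    // Returns the map's width, or -1 if the map has already been destroyed.
    int mapWidth() const;
};

struct Map : std::enable_shared_from_this<Map> {
    int width = 0;                               // hypothetical piece of "map data"
    std::vector<std::shared_ptr<Being>> beings;  // the map owns its beings

    // Creates a being, stores it in this map, and wires up the back reference.
    std::shared_ptr<Being> spawn(const std::string& name) {
        auto being = std::make_shared<Being>();
        being->name = name;
        being->parent = shared_from_this();  // requires *this to be owned by a shared_ptr
        beings.push_back(being);
        return being;
    }
};

inline int Being::mapWidth() const {
    if (auto map = parent.lock()) {  // temporary shared ownership while reading
        return map->width;
    }
    return -1;  // parent is gone
}
```

A being that needs several pieces of map data in one update can lock once, keep the resulting shared_ptr in a local variable, and read everything through it, so the cost of `lock()` is paid once per update rather than once per access.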

However, I once read that locking a weak_ptr is an expensive operation -- maybe that's not true anymore -- but considering that the weak_ptr will be locked very often, I'm concerned that this design is doomed to poor performance.

Hence the question:

What is the performance penalty of locking a weak_ptr? How significant is it?

Kornel Kisielewicz
  • 1
    I don't know for sure, but I would guess that it should be roughly equivalent to the cost of copy constructing a shared_ptr. – James McNellis Apr 30 '10 at 22:47
  • @James - so I assume the locking is just a read and copy of the allocated ref counter... – Kornel Kisielewicz Apr 30 '10 at 23:48
  • @Kornel: It's an atomic increment of the reference count; how that is implemented is very platform specific (a mutex lock would be the worst case scenario; on Windows it is implemented using InterlockedIncrement, I'm sure that Linux and other OSes have similar built-in atomic operations). – James McNellis May 01 '10 at 00:17
  • @James, so we *may* have a performance penalty compared to just dereferencing a shared pointer...? – Kornel Kisielewicz May 01 '10 at 00:22
  • 3
    @Kornel: There's guaranteed to be a performance penalty. Dereferencing a shared_ptr should be as fast as dereferencing a raw pointer, since that's all it has to do internally (each shared_ptr object has its own copy of the pointer). If the solution recommended in the deleted answer works for your specific use case, that would give you much better performance (I'm surprised the answer was deleted). – James McNellis May 01 '10 at 00:36
  • @James, really thanks for the clarifications -- yeah, I also wonder why it was deleted :/ – Kornel Kisielewicz May 01 '10 at 00:43
  • 1
    Just use a raw pointer to point to the parent, that’s safe and efficient. – Konrad Rudolph Jul 04 '13 at 18:40
  • @JamesMcNellis just found this, after writing this http://stackoverflow.com/questions/20290524/c-weak-ptr-creation-performance/20290701#20290701 – Alec Teal Nov 29 '13 at 17:45

3 Answers

20

From the Boost 1.42 source code (<boost/smart_ptr/weak_ptr.hpp> line 155):

shared_ptr<T> lock() const // never throws
{
    return shared_ptr<element_type>( *this, boost::detail::sp_nothrow_tag() );
}

Ergo, James McNellis's comment is correct; it's the cost of copy-constructing a shared_ptr.
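The effect on the reference count can be observed with the standard library equivalents. This is a minimal sketch using `std::weak_ptr` rather than Boost; the helper name `useCountWhileLocked` is made up for the illustration.

```cpp
#include <cassert>
#include <memory>

// Returns the use count observed while a locked shared_ptr is alive,
// or 0 if the object has already been destroyed.
inline long useCountWhileLocked(const std::weak_ptr<int>& weak) {
    std::shared_ptr<int> locked = weak.lock();  // adds one owner if the object is alive
    return locked ? locked.use_count() : 0;
}   // `locked` goes out of scope here, which decrements the count again
```

With a single owning shared_ptr, the helper reports a count of 2 while the lock is held, and the owner's count is back to 1 afterwards. Once the last owner is reset, `lock()` returns an empty shared_ptr instead of throwing.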

Billy ONeal
  • So we've got a reference incrementation operation only? Actually not surprising... yet there was something that was expensive in case of weak_ptr's -- any idea what that was? I assume that the opposite (construction of a weak_ptr from shared_ptr) should also be trivial... – Kornel Kisielewicz Apr 30 '10 at 23:33
  • @Kornel Kisielewicz: Previous comment deleted -- I thought it said "reference **implementation** only" at first LOL! My guess on the efficiency argument against `weak_ptr` is a comparison to builtin pointers rather than shared_ptrs (you can have a builtin pointer pointing to the same place as a shared_ptr, after all :) ) – Billy ONeal May 01 '10 at 00:00
  • yes, that might also be it. My second reaction after posting this question was that it might have something to do with threading, but it seems that weak and shared use the same reference counting structure, so there shouldn't be a difference. – Kornel Kisielewicz May 01 '10 at 00:14
  • @James -- so there is a performance drop -- because having a shared pointer, we would just dereference it, not copy. In case of weak_ptr we need that copy and ref increment to use it, and then decrement after it goes out of scope. So much for the 42 nanoseconds xP – Kornel Kisielewicz May 01 '10 at 00:23
  • 1
    It seems that this answer is wrong and misleading: all the code that could be perf related is actually in shared_ptr ctor that tries to lock weak_ptr. In other words, it's not a copy or copy constructing a shared_ptr. – Pavel P May 07 '20 at 08:55
12

For my own project, I was able to improve performance dramatically by adding #define BOOST_DISABLE_THREADS before any Boost includes. This avoids the spinlock/mutex overhead of weak_ptr::lock, which in my project was a major bottleneck. As the project is not multithreaded with respect to Boost, I could do this.

Jeff Linahan
10

Using or dereferencing a shared_ptr is almost like accessing a raw pointer. Locking a weak_ptr, by contrast, is a "heavy" operation compared to regular pointer access, because the code has to be thread-aware to work correctly if another thread triggers the release of the object referenced by the pointer. At minimum, it has to perform some sort of interlocked/atomic operation, which is much slower than a regular memory access.

As usual, one way to see what's going on is to inspect generated code:

#include <memory>

class Test
{
public:
    void test();
};

void callFuncShared(std::shared_ptr<Test>& ptr)
{
    if (ptr)
        ptr->test();
}

void callFuncWeak(std::weak_ptr<Test>& ptr)
{
    if (auto p = ptr.lock())
        p->test();
}

void callFuncRaw(Test* ptr)
{
    if (ptr)
        ptr->test();
}

Access through a shared_ptr and through a raw pointer compiles to nearly the same code. Because the shared_ptr was passed by reference, the referenced value has to be loaded first, so the only difference is one extra load in the shared_ptr version.

callFuncShared:

[image: generated assembly for callFuncShared, not preserved]

callFuncWeak:

[image: generated assembly for callFuncWeak, not preserved]

Calling through a weak_ptr produces about 10x more code, and at best it has to go through a locked compare-exchange, which by itself takes more than 10x the CPU time of dereferencing a raw pointer or a shared_ptr:

[image not preserved]

Only if the shared counter isn't zero can it load the pointer to the actual object and use it (by calling the object, or creating a shared_ptr).
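The "increment only if not zero" step can be sketched with `std::atomic`. This is an illustration of the general technique, not the code of any particular standard library; `RefCount`, `incrementIfNotZero` and `release` are hypothetical names standing in for the control block's strong counter.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the strong reference counter in a control block.
struct RefCount {
    std::atomic<long> strong{1};  // starts with one owner

    // Try to add an owner, but only while at least one owner still exists.
    // Returns false when the count has already dropped to zero.
    bool incrementIfNotZero() {
        long current = strong.load(std::memory_order_relaxed);
        while (current != 0) {
            // On failure, compare_exchange_weak reloads `current`,
            // so the loop condition re-checks for zero.
            if (strong.compare_exchange_weak(current, current + 1,
                                             std::memory_order_acq_rel,
                                             std::memory_order_relaxed)) {
                return true;
            }
        }
        return false;
    }

    // Drop one owner; returns true when this was the last one.
    bool release() {
        return strong.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```

A plain atomic increment is not enough here, because it could bring a counter that another thread has just released to zero back to one. That is why this path uses a compare-exchange loop, which is the locked instruction referred to above.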

Pavel P