
This is a bit of a two-part question, all about the atomicity of std::shared_ptr:

1. As far as I can tell, std::shared_ptr is the only smart pointer in <memory> that's atomic. I'm wondering if there is a non-atomic version of std::shared_ptr available (I can't see anything in <memory>, so I'm also open to suggestions outside of the standard, like those in Boost). I know boost::shared_ptr is also atomic (if BOOST_SP_DISABLE_THREADS isn't defined), but maybe there's another alternative? I'm looking for something that has the same semantics as std::shared_ptr, but without the atomicity.

2. I understand why std::shared_ptr is atomic; it's kinda nice. However, it's not nice for every situation, and C++ has historically had the mantra of "only pay for what you use." If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why) (if the answer is simply "a non-atomic version was simply never considered" or "no one ever asked for a non-atomic version" that's fine!).

With question #2, I'm wondering if someone ever proposed a non-atomic version of shared_ptr (either to Boost or the standards committee) (not to replace the atomic version of shared_ptr, but to coexist with it) and it was shot down for a specific reason.

Cornstalks
  • What "cost" exactly are you concerned about here? The cost of atomically incrementing an integer? Is that actually a cost that concerns you for any real application? Or are you just prematurely optimizing? – Nicol Bolas Feb 28 '13 at 06:52
  • @NicolBolas: It's more curiosity than anything else; I don't (currently) have any code/project where I'm seriously wanting to use a non-atomic shared pointer. However, I have had projects (in the past) where Boost's `shared_ptr` was a significant slowdown due to its atomicity, and defining `BOOST_DISABLE_THREADS` made a noticeable difference (I don't know if `std::shared_ptr` would have had the same cost that `boost::shared_ptr` had). – Cornstalks Feb 28 '13 at 06:54
  • @Close voters: what part of the question isn't constructive? If there isn't a specific *why* for the second question, that's fine (a simple "it simply wasn't considered" would be a valid enough answer). I'm curious *if there is* a specific reason/rationale that exists. And the first question certainly is a valid question, I'd say. If I need to clarify the question, or make slight adjustments to it, please let me know. But I don't see how it's not constructive. – Cornstalks Feb 28 '13 at 07:23
  • What exactly do you mean by _atomic_? There seems to be some inconsistency in your question. "As far as I can tell, `std::shared_ptr` is the only smart pointer in `<memory>` that's atomic." vs "why wasn't an atomic version of `std::shared_ptr` provided in C++11?". – CB Bailey Feb 28 '13 at 07:43
  • @CharlesBailey: Gah, it was a typo. I meant "why wasn't a non-atomic version..." Thanks, I fixed it! – Cornstalks Feb 28 '13 at 07:47
  • @Cornstalks Well, it's probably just that people don't react that well to questions they can easily dismiss as *"premature optimization"*, no matter how valid, well-posed or relevant the question is, I guess. I myself don't see any reason to close this as non-constructive. – Christian Rau Feb 28 '13 at 09:02
  • (can't write an answer now it's closed, so commenting) With GCC when your program doesn't use multiple threads `shared_ptr` doesn't use atomic ops for the refcount. See (2) at http://gcc.gnu.org/ml/libstdc++/2007-10/msg00180.html for a patch to GCC to allow the non-atomic implementation to be used even in multithreaded apps, for `shared_ptr` objects that aren't shared between threads. I've been sitting on that patch for years but I'm considering finally committing it for GCC 4.9 – Jonathan Wakely Feb 28 '13 at 13:55
  • @JonathanWakely: the question is open again, so if you want to add that comment as an answer (and perhaps add any further insights) I would certainly up-vote it :) – Cornstalks Feb 28 '13 at 15:42
  • Agree with OP. We use shared_ptrs all over the place (e.g. as function arguments, in for loops and as return values), and the atomic ref count makes them non-trivial to copy. Most of them live in the (single-threaded) GUI part; only a small portion cross threads. Passing them by (ugly) const reference partially circumvents the copying cost. – gast128 Nov 16 '21 at 14:13

5 Answers


1. I'm wondering if there is a non-atomic version of std::shared_ptr available

Not provided by the standard. There may well be one provided by a "3rd party" library. Indeed, prior to C++11, and prior to Boost, it seemed like everyone wrote their own reference counted smart pointer (including myself).

2. My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11?

This question was discussed at the Rapperswil meeting in 2010. The subject was introduced by a National Body Comment #20 by Switzerland. There were strong arguments on both sides of the debate, including those you provide in your question. However, at the end of the discussion, the vote was overwhelmingly (though not unanimously) against adding an unsynchronized (non-atomic) version of shared_ptr.

Arguments against included:

  • Code written with the unsynchronized shared_ptr may end up being used in threaded code down the road, causing difficult-to-debug problems with no warning.

  • Having one "universal" shared_ptr that is the "one way" to traffic in reference counting has benefits. From the original proposal:

    Has the same object type regardless of features used, greatly facilitating interoperability between libraries, including third-party libraries.

  • The cost of the atomics, while not zero, is not overwhelming. The cost is mitigated by the use of move construction and move assignment which do not need to use atomic operations. Such operations are commonly used in vector<shared_ptr<T>> erase and insert.

  • Nothing prohibits people from writing their own non-atomic reference-counted smart pointer if that's really what they want to do.
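To make that last point concrete, here is a minimal sketch of such a hand-rolled pointer. The name `rc_ptr` and its interface are hypothetical (not a standard or Boost type); it only handles single objects created with `new`, with no custom deleters and no weak references. The count is a plain `long`, so copying is an ordinary increment, and for the same reason an `rc_ptr` must never be shared between threads without external synchronization.

```cpp
#include <cassert>
#include <utility>

// Hypothetical non-atomic reference-counted pointer (not a standard or Boost type).
// The count is a plain long: copies are ordinary increments, so an rc_ptr must
// never be shared between threads without external synchronization.
template <typename T>
class rc_ptr {
public:
    rc_ptr() noexcept = default;

    // Takes ownership of an object allocated with `new`.
    explicit rc_ptr(T* p) : ptr_(p) {
        if (p) {
            try {
                count_ = new long(1);
            } catch (...) {
                delete p;  // don't leak the object if the counter allocation fails
                throw;
            }
        }
    }

    rc_ptr(const rc_ptr& other) noexcept : ptr_(other.ptr_), count_(other.count_) {
        if (count_) ++*count_;  // plain, non-atomic increment
    }

    // Moving transfers ownership without touching the count.
    rc_ptr(rc_ptr&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)),
          count_(std::exchange(other.count_, nullptr)) {}

    // Copy-and-swap: serves as both copy and move assignment.
    rc_ptr& operator=(rc_ptr other) noexcept {
        swap(other);
        return *this;
    }

    ~rc_ptr() { release(); }

    void swap(rc_ptr& other) noexcept {
        std::swap(ptr_, other.ptr_);
        std::swap(count_, other.count_);
    }

    T* get() const noexcept { return ptr_; }
    T& operator*() const {
        assert(ptr_ && "dereferencing an empty rc_ptr");
        return *ptr_;
    }
    T* operator->() const noexcept { return ptr_; }
    long use_count() const noexcept { return count_ ? *count_ : 0; }
    explicit operator bool() const noexcept { return ptr_ != nullptr; }

private:
    // Drops this owner's reference; the last owner deletes the object and the counter.
    void release() noexcept {
        if (count_ && --*count_ == 0) {
            delete ptr_;
            delete count_;
        }
    }

    T* ptr_ = nullptr;
    long* count_ = nullptr;
};
```

Copy-and-swap keeps the sketch short. Anything beyond this (single-allocation creation, custom deleters, weak references, aliasing) would have to be added by hand, and those are features `std::shared_ptr` already provides.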

The final word from the LWG in Rapperswil that day was:

Reject CH 20. No consensus to make a change at this time.

Cody Gray - on strike
Howard Hinnant
  • Wow, perfect, thanks for the information! That's exactly the kind of information I was hoping to find. – Cornstalks Feb 28 '13 at 16:46
  • > `Has the same object type regardless of features used, greatly facilitating interoperability between libraries, including third-party libraries.` that's an extremely weird reasoning. Third party libraries will provide their own types anyways, so why would it matter if they provided it in the form of std::shared_ptr, std::non_atomic_shared_ptr, etc.? You will always have to adapt your code to what the library returns anyways – Jean-Michaël Celerier Jun 21 '18 at 13:13
  • That's true as far as library-specific types go, but the idea is that there are also lots of places where standard types show up in third-party APIs. For example, my library might take a `std::shared_ptr` somewhere. If someone else's library takes that type too, callers can pass the same pointers to both of us without the inconvenience or overhead of converting between different representations, and that's a small win for everyone. – Jack O'Connor May 21 '19 at 16:32
  • @Jean-MichaëlCelerier Also, imagine a library using `std::unsynchronized_shared_ptr` in its interface. You want to use the library, but your application happens to be multithreaded. You now have a pointer that you must protect with a mutex every time you pass it around. – Lukas Barth Nov 16 '21 at 14:00
  • @LukasBarth the alternative to `std::unsynchronized_shared_ptr` in the interface if the latter is not available is *not* `std::shared_ptr`, it's *let's use our own custom shared pointer type which does exactly what we want even if it needs explicit synchronisation* instead. Making things strictly worse for the end-user. – Jean-Michaël Celerier Nov 18 '21 at 12:08
  • @LukasBarth Isn't that how it is supposed to be? For example, all the containers in the STL are unsynchronized, and you have to protect them with a mutex when you use them from multiple threads. – Sourav Kannantha B Jul 07 '22 at 06:40

Howard's answered the question well already, and Nicol made some good points about the benefits of having a single standard shared pointer type, rather than lots of incompatible ones.

While I completely agree with the committee's decision, I do think there is some benefit to using an unsynchronized shared_ptr-like type in special cases, so I've investigated the topic a few times.

If I'm not using multiple threads, or if I am using multiple threads but am not sharing pointer ownership across threads, an atomic smart pointer is overkill.

With GCC, when your program doesn't use multiple threads, shared_ptr doesn't use atomic ops for the refcount. This is done by updating the reference counts via wrapper functions that detect whether the program is multithreaded (on GNU/Linux this is done by checking a special variable in Glibc that says if the program is single-threaded[1]) and dispatch to atomic or non-atomic operations accordingly.
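The run-time dispatch can be pictured roughly as follows. This is a simplified sketch, not libstdc++'s code: `program_is_single_threaded` is a hypothetical flag standing in for the Glibc variable, and `add_ref_dispatch` is a hypothetical function that only loosely mirrors the idea behind libstdc++'s internal `__gnu_cxx::__exchange_and_add_dispatch` helper.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for Glibc's single-threaded flag. In this sketch the
// program sets it by hand; in a real Glibc/libstdc++ setup the C library maintains it.
inline bool program_is_single_threaded = true;

// Adds `delta` to a reference count and returns the previous value, choosing
// a cheap or an atomic update at run time (hypothetical, simplified).
inline long add_ref_dispatch(std::atomic<long>& count, long delta) {
    long old;
    if (program_is_single_threaded) {
        // Only one thread exists, so a relaxed load followed by a relaxed
        // store is enough: no locked read-modify-write instruction is needed.
        old = count.load(std::memory_order_relaxed);
        count.store(old + delta, std::memory_order_relaxed);
    } else {
        // Other threads may hold references: use an atomic read-modify-write.
        // acq_rel lets the thread that drops the last reference see every
        // write made by the previous owners before it destroys the object.
        old = count.fetch_add(delta, std::memory_order_acq_rel);
    }
    assert(old + delta >= 0 && "reference count went negative");
    return old;
}
```

The point of the design is that the branch is cheap and well predicted, so a program that never starts a thread avoids the locked instruction on every copy.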

I realised many years ago that because GCC's shared_ptr<T> is implemented in terms of a __shared_ptr<T, _LockPolicy> base class, it's possible to use the base class with the single-threaded locking policy even in multithreaded code, by explicitly using __shared_ptr<T, __gnu_cxx::_S_single>. You can use an alias template like this to define a shared pointer type that is not thread-safe, but is slightly faster[2]:

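```cpp
#include <memory>  // libstdc++'s <memory> brings in std::__shared_ptr and __gnu_cxx::_S_single

// GCC/libstdc++ only: a shared pointer that always uses the single-threaded lock policy.
template<typename T>
  using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
```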

This type would not be interoperable with std::shared_ptr<T> and would only be safe to use when it is guaranteed that the shared_ptr_unsynchronized objects would never be shared between threads without additional user-provided synchronization.

This is of course completely non-portable, but sometimes that's OK. With the right preprocessor hacks, your code would still work fine with other implementations if shared_ptr_unsynchronized<T> is defined there as an alias for shared_ptr<T>; it would just be a little faster with GCC.
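One possible shape for those preprocessor hacks, assuming `__GLIBCXX__` (defined by libstdc++'s headers) is an acceptable test for "using GCC's standard library". The helper `make_unsynchronized` and the `demo_unsynchronized` usage function are hypothetical; the helper uses `new` rather than a `make_shared`-style single allocation so that both branches compile with the same code.

```cpp
#include <cassert>
#include <memory>
#include <utility>

#if defined(__GLIBCXX__)
// libstdc++ only: non-portable base class with the single-threaded lock policy.
template <typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
#else
// Any other standard library: fall back to the ordinary, thread-safe type.
template <typename T>
using shared_ptr_unsynchronized = std::shared_ptr<T>;
#endif

// Hypothetical helper: both branches accept a raw pointer from `new`,
// so this compiles whichever alias was selected.
template <typename T, typename... Args>
shared_ptr_unsynchronized<T> make_unsynchronized(Args&&... args) {
    return shared_ptr_unsynchronized<T>(new T(std::forward<Args>(args)...));
}

// Usage sketch: copies share ownership and the count behaves as for std::shared_ptr;
// on libstdc++ it is meant to be updated without atomic instructions.
inline void demo_unsynchronized() {
    auto p = make_unsynchronized<int>(5);
    auto q = p;                              // use count 1 -> 2
    assert(p.use_count() == 2 && *q == 5);
    q.reset();                               // back to 1
    assert(p.use_count() == 1);
}
```

As noted above, objects of this type are only safe if they are never shared between threads without additional synchronization, and on libstdc++ they do not convert to or from std::shared_ptr<T>.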


[1] Before Glibc 2.32 added that variable, the wrapper functions would detect whether the program links to libpthread.so as an imperfect method of checking for single-threaded vs multi-threaded.

[2] Unfortunately because that wasn't an intended use case it didn't quite work optimally before GCC 4.9, and some operations still used the wrapper functions and so dispatched to atomic operations even though you've explicitly requested the `_S_single` policy. See point (2) at http://gcc.gnu.org/ml/libstdc++/2007-10/msg00180.html for more details and a patch to GCC to allow the non-atomic implementation to be used even in multithreaded apps. I sat on that patch for years but I finally committed it for GCC 4.9.
Jonathan Wakely
  • Just wondering, is there a typo in your example of the template alias? I.e. I think it should read shared_ptr_unsynchronized = std::__shared_ptr<. Incidentally, I tested this today, in conjunction with std::__enable_shared_from_this and std::__weak_ptr, and it seems to work nicely (gcc 4.9 and gcc 5.2). I will profile/disassemble it shortly to see if indeed the atomic operations are skipped. – Carl Cook Jan 11 '16 at 21:16
  • Awesome details! Recently I faced an issue, as described in [this question](https://stackoverflow.com/questions/47202468/segfault-on-declaring-a-variable-of-type-vectorshared-ptrint), that eventually made me look into the source code of `std::shared_ptr`, `std::__shared_ptr`, `__default_lock_policy` and such. This answer confirmed what I understood from the code. – Nawaz Nov 16 '17 at 09:06

My second question is why wasn't a non-atomic version of std::shared_ptr provided in C++11? (assuming there is a why).

One could just as easily ask why there isn't an intrusive pointer, or any number of other possible variations of shared pointers one could have.

The design of shared_ptr, handed down from Boost, has been to create a minimum standard lingua franca of smart pointers: something that, generally speaking, you can just pull down off the wall and use. It's something that would be used generally, across a wide variety of applications. You can put it in an interface, and odds are good people will be willing to use it.

Threading is only going to get more prevalent in the future. Indeed, as time passes, threading will generally be one of the primary means to achieve performance. Requiring the basic smart pointer to do the bare minimum needed to support threading facilitates this reality.

Dumping a half-dozen smart pointers with minor variations between them into the standard, or even worse a policy-based smart pointer, would have been terrible. Everyone would pick the pointer they like best and forswear all others. Nobody would be able to communicate with anyone else. It'd be like the current situation with C++ strings, where everyone has their own type, only far worse, because interoperation between strings is a lot easier than interoperation between smart pointer classes.

Boost, and by extension the committee, picked a specific smart pointer to use. It provided a good balance of features and was widely and commonly used in practice.

std::vector has some inefficiencies compared to naked arrays in some corner cases too. It has some limitations; some uses really want to have a hard limit on the size of a vector, without using a throwing allocator. However, the committee didn't design vector to be everything for everyone. It was designed to be a good default for most applications. Those for whom it can't work can just write an alternative that suits their needs.

Just as you can for a smart pointer if shared_ptr's atomicity is a burden. Then again, one might also consider not copying them around so much.
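A quick way to see what "copying them around" means in practice is to look at the use count a callee observes. In this sketch (the function names are hypothetical), a by-value parameter copies the pointer and bumps the count, a `const&` parameter leaves it alone, and passing with `std::move` hands over ownership without an increment. Note that `use_count()` only reports the count; it cannot show whether an atomic instruction actually ran.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// By value: the parameter is a copy, so the count is incremented on entry
// (an atomic operation in a thread-aware build) and decremented afterwards.
inline long observe_by_value(std::shared_ptr<int> p) { return p.use_count(); }

// By const reference: no copy is made and no count is touched.
inline long observe_by_ref(const std::shared_ptr<int>& p) { return p.use_count(); }

// Usage: the three ways of handing the pointer to a callee.
inline void demo_shared_ptr_passing() {
    auto sp = std::make_shared<int>(1);
    assert(observe_by_value(sp) == 2);             // copy made for the parameter
    assert(observe_by_ref(sp) == 1);               // no copy, no count update
    assert(observe_by_value(std::move(sp)) == 1);  // moved in: count unchanged
    assert(!sp);                                   // a moved-from shared_ptr is empty
}
```

Passing by `const&` (or passing the pointee as `T&` when ownership is not needed) is the usual way to avoid the count traffic in hot single-threaded code.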

Nicol Bolas
  • +1 for "one might also consider not copying them around so much." – Ali Feb 28 '13 at 09:47
  • If you ever hook up a profiler, you're special and you can just tune out arguments like the above. If you don't have an operational requirement that is hard to meet, you shouldn't use C++. Arguing like you do is a good way to make C++ universally reviled by anyone interested in high performance or low latency. This is why game programmers don't use the STL, Boost or even exceptions. – Hans Malherbe Mar 02 '17 at 05:18
  • For clarity, I think the quote at the top of your answer should read "why wasn't *a non-atomic* version of std::shared_ptr provided in C++11?" – Charles Savoie Jul 30 '20 at 16:00

Boost provides a non-atomic shared_ptr. It's called local_shared_ptr, and it can be found in Boost's Smart Pointers library.

The Quantum Physicist
  • +1 for a short, solid reply with a good citation, but this pointer type looks costly in terms of both memory and runtime, due to one extra level of indirection (local->shared->ptr vs shared->ptr). – Red.Wave Jan 02 '19 at 15:27
  • @Red.Wave Can you explain what you mean with indirection and how it affects performance? Do you mean that it's a `shared_ptr` with a counter anyway, even though it's local? Or do you mean there's another problem with it? The docs say that the *only* difference is that this is not atomic. – The Quantum Physicist Jan 02 '19 at 16:20
  • Every local ptr keeps a count and a reference to the original shared ptr. Thus any access to the final pointee needs a dereference from local to shared ptr, which is then dereferenced to the pointee. Thus there is one more indirection stacked on top of the indirections from shared ptr, and that increases overhead. – Red.Wave Jan 02 '19 at 16:44
  • @Red.Wave Where are you getting this information from? This: "Thus any access to to the final pointee needs a derefrence from local to shared ptr" needs some citation. I couldn't find that in boost docs. Again, what I saw in the docs is that it says that `local_shared_ptr` and `shared_ptr` are identical except for atomic. I'm genuinely interested in finding out whether what you're saying is true because I use `local_shared_ptr` in applications that require high performance. – The Quantum Physicist Jan 02 '19 at 16:50
  • That is what the doc says. In terms of reference counting there must be an advantage, but in terms of memory it is obviously less efficient, and access to the pointee seems to have a time penalty (double indirection/dereference), or else the memory overhead is even larger. – Red.Wave Jan 02 '19 at 17:52
  • @Red.Wave Can you please quote the part of the docs that says that? I can't find it. – The Quantum Physicist Jan 02 '19 at 19:41
  • "One can think of local_shared_ptr<T> as shared_ptr<shared_ptr<T>>, with the outer shared_ptr using non-atomic operations for its count. Converting from local_shared_ptr to shared_ptr gives you a copy of the inner shared_ptr; converting from shared_ptr wraps it into an outer shared_ptr with a non-atomic use count (conceptually speaking) and returns the result." – Red.Wave Jan 02 '19 at 19:45
  • Paragraph 2 after example code 11. That just means two levels of indirection/dereferencing. – Red.Wave Jan 02 '19 at 19:46
  • @Red.Wave Got it. Thanks. – The Quantum Physicist Jan 02 '19 at 19:52
  • You are welcome. – Red.Wave Jan 02 '19 at 19:55
  • @Red.Wave If you look at the actual source code https://github.com/boostorg/smart_ptr/blob/aa1341a6a27bd5ceeca1ace990b9d2c76eb49247/include/boost/smart_ptr/local_shared_ptr.hpp#L120 you'll see there is no double indirection. This paragraph in the documentation is just a mental model. – Ilya Popov Feb 09 '19 at 00:32

I am preparing a talk on shared_ptr at work. I have been using a modified boost shared_ptr that avoids the separate malloc (as make_shared can) and takes a template parameter for the lock policy, like shared_ptr_unsynchronized mentioned above. I am using the program from

http://flyingfrogblog.blogspot.hk/2011/01/boosts-sharedptr-up-to-10-slower-than.html

as a test, after cleaning up the unnecessary shared_ptr copies. The program uses only the main thread, and the test argument is shown in parentheses. The test environment is a notebook running Linux Mint 14. Here is the time taken in seconds:

test (argument)   boost 1.49   std + make_shared               modified boost
mt-unsafe (11)    11.9         9 / 11.5 (with -pthread)        8.4
atomic (11)       13.6         12.4                            13.0
mt-unsafe (12)    113.5        85.8 / 108.9 (with -pthread)    81.5
atomic (12)       126.0        109.1                           123.6

Only the 'std' version uses -std=c++11. In the std column, the two mt-unsafe figures are without and with -pthread; -pthread likely switches the lock policy of g++'s __shared_ptr class.

From these numbers, I see the impact of atomic instructions on code optimization. The test case does not use any C++ containers, but vector<shared_ptr<some_small_POD>> is likely to suffer if the objects don't need thread protection. Boost suffers less, probably because the additional malloc limits the amount of inlining and code optimization.

I have yet to find a machine with enough cores to stress test the scalability of atomic instructions, but using std::shared_ptr only when necessary is probably better.
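For anyone wanting to repeat this kind of comparison, a minimal timing harness might look like the following. It is a hypothetical sketch, not the program from the linked post: `bench_copies` times repeated copies of a vector of pointers, so each round performs one increment and one decrement per element. The per-round vector allocation is included in the timing, so results are only comparable between pointer types run through the same harness, and no figures are claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>
#include <vector>

// Result of one timing run; the checksum keeps the compiler from discarding the loop.
struct CopyBenchResult {
    std::int64_t checksum;
    std::chrono::nanoseconds elapsed;
};

// Hypothetical micro-benchmark: Ptr can be std::shared_ptr<int> or any
// non-atomic alternative with the same copy semantics and operator*.
template <typename Ptr>
CopyBenchResult bench_copies(const std::vector<Ptr>& v, int rounds) {
    assert(rounds >= 0);
    std::int64_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        std::vector<Ptr> copies = v;            // one count increment per element
        for (const auto& p : copies) sum += *p;
    }                                           // one count decrement per element here
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

Instantiating it with `std::shared_ptr<int>` and with a non-atomic type (for example the libstdc++-only `shared_ptr_unsynchronized` alias from an earlier answer), built with and without `-pthread`, would give a comparison along the lines of the table above.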

Jonathan Wakely
russ