Is shared pointer thread-safety zero-cost?

Question

I recently found out that the control block for shared pointers (the thing that manages the reference count) is thread-safe, so things like copying and passing shared pointers are safe for multithreaded uses. However, I also know that one of the ideals of C++ is that you shouldn't have to pay for features you don't use. To me, it seems that the thread-safety of the control block would require some mutex locks, which is some overhead.

Given that it is perfectly reasonable to use shared pointers in non-multithreaded applications, I don't see why this overhead is accepted. So my question is whether the C++ language designers decided to bite the bullet and accept this additional overhead in all cases, or whether this thread-safe control block can be implemented in a way that is zero-cost (unlike my naive mutex lock assumption).

Instead of mutex locks, most (all?) implementations use lock-free atomics. — Eljay, Aug 16 '22 at 20:17
@lorro not really no, I'm more curious on how/if thread-safety is achieved at zero cost, not the uses of shared_ptr — k huang, Aug 16 '22 at 20:22
@khuang Quote from the question's answers I linked: " 20.8.2.6 shared_ptr atomic access [util.smartptr.shared.atomic] Concurrent access to a shared_ptr object from multiple threads does not introduce a data race if the access is done exclusively via the functions in this section and the instance is passed as their first argument. " (C++14) — lorro, Aug 16 '22 at 20:24
To offer some clarity to this question - it would be _very rare_ for any implementation of `std::shared_ptr` to use mutexes. The control block is simple enough to be threadsafe with only the presence of memory fences. And many CPU architectures don't require any instructional overhead for certain memory fences. — Drew Dormann, Aug 16 '22 at 20:24
@DrewDormann Memory fences are not sufficient to modify the value of a control block. Atomic operations are sufficient, however, as they prevent race conditions. The thing is, atomic operations are far from free: they have a much higher latency. For example, a `lock add` takes 17 cycles on Zen 2 and 18 cycles on Skylake. Not to mention that such instructions are not pipelined with each other (i.e. this is the reciprocal throughput). By comparison, such architectures can execute multiple plain `add` instructions per cycle (i.e. the atomic version is >20 times more expensive). — Jérôme Richard, Aug 16 '22 at 20:42
You can implement your own single-threaded shared pointer if you want. I guess that the standard doesn't include one because it would have a high risk of being misused (resulting in undefined behavior) for a relatively small optimization. But yes, I can confirm that atomic operations can be costly, especially on ARM-like platforms if you use the default memory ordering of `memory_order_seq_cst`. A shared pointer should use `memory_order_relaxed` for retaining and `memory_order_release` for releasing. — prapin, Aug 16 '22 at 20:48
What @Eljay said. So not zero cost, but minimal cost, in most real world scenarios. — Paul Sanders, Aug 16 '22 at 20:50
An option is to make your own non-thread-safe shared_ptr and just change your own control block to use non-atomics. (I am a bit surprised the standard doesn't have a threading policy for shared_ptr, much like allocator for the containers.) — Eljay, Aug 16 '22 at 21:02
Probably also relevant - and this may seem glib, but that is not the intention - `std::shared_ptr` will solve some tricky lifetime issues in a multithreaded environment. But in a single-threaded speed-critical program, `std::shared_ptr` is almost always unnecessary. Code can be rearchitected to use `std::unique_ptr` instead, without the runtime lifetime management. — Drew Dormann, Aug 16 '22 at 21:51

score 3 · Accepted Answer · answered Aug 16 '22 at 21:05

No, std::shared_ptr's thread-safety is not zero-cost.

The alternative, however, would be a shared_ptr that is essentially unusable in a multithreaded environment. It would be impossible for multiple threads to safely hold multiple shared_ptrs to the same object without the possibility of creating data races on the control block.
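To make the scenario concrete, here is a minimal sketch of several threads each holding and copying their own `shared_ptr` to one object. The function name and parameters are hypothetical, chosen only for illustration. Every copy and destruction touches the same control block, and the standard's guarantee is what makes this well-defined.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical helper for illustration: spawns num_threads threads that each
// take their own shared_ptr to one object and copy it copies_per_thread times.
// Returns the owner's use_count() after all threads have been joined.
long use_count_after_concurrent_copies(int num_threads, int copies_per_thread) {
    assert(num_threads > 0 && copies_per_thread >= 0);
    const auto owner = std::make_shared<int>(42);

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&owner, copies_per_thread] {
            // Concurrently copying from the same (unmodified) shared_ptr is
            // allowed; each thread ends up with its own instance.
            std::shared_ptr<int> local = owner;
            for (int i = 0; i < copies_per_thread; ++i) {
                std::shared_ptr<int> copy = local;  // increments the shared count
                assert(*copy == 42);                // read-only use of the object
            }                                       // copy destroyed: decrement
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    // All thread-local instances are gone, so only owner remains.
    return owner.use_count();
}
```

Calling `use_count_after_concurrent_copies(4, 10000)` should leave exactly one owner once the threads have joined. With a control block that used plain, non-atomic counters, the same code would be a data race and the final count could not be relied on.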

To solve that problem, the standard library would need separate thread-safe and non-thread-safe shared_ptr types, which would significantly complicate the interfaces of anything that uses std::shared_ptr. I assume this complication was deemed to outweigh the fairly minimal overhead of simply always performing atomic updates on the control block's reference counts.
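As a rough sketch of what "atomic updates on the reference counts" means in practice, the two counter types below differ only in how they touch the count. `AtomicCount`, `PlainCount`, and `releases_until_zero` are hypothetical names, not standard library types, and real implementations differ in detail. The decrement here uses `memory_order_acq_rel` as a conservative choice; a release decrement paired with an acquire fence before destruction is a common alternative.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical thread-safe counter, roughly what a shared_ptr control block
// needs: a relaxed increment and a decrement that synchronizes on release.
struct AtomicCount {
    std::atomic<long> value{1};
    void retain() noexcept { value.fetch_add(1, std::memory_order_relaxed); }
    // Returns true when the caller dropped the last reference.
    bool release() noexcept {
        return value.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};

// Hypothetical single-threaded counter: cheaper, but any concurrent use
// from two threads is a data race (undefined behavior).
struct PlainCount {
    long value = 1;
    void retain() noexcept { ++value; }
    bool release() noexcept {
        assert(value > 0);
        return --value == 0;
    }
};

// Adds extra_owners references to a fresh counter, then releases until the
// counter reports that the last reference is gone. Returns the number of
// release() calls that were needed.
template <typename Count>
int releases_until_zero(int extra_owners) {
    Count count;  // starts at 1, like a freshly created owner
    for (int i = 0; i < extra_owners; ++i) {
        count.retain();
    }
    int calls = 0;
    do {
        ++calls;
    } while (!count.release());
    return calls;
}
```

Both counters behave identically in a single thread: `releases_until_zero<AtomicCount>(3)` and `releases_until_zero<PlainCount>(3)` each need four releases. The difference is purely the per-operation cost and the fact that only the atomic version remains correct when owners live on different threads.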
