
AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design).

On the other hand, Python objects seem to be essentially shared_ptrs (ref_count and garbage collection).

I am wondering what makes them work nicely in Python but potentially dangerous in C++. In other words, what are the differences between how Python and C++ deal with shared_ptr-like objects that make their usage discouraged in C++ but do not cause similar problems in Python?

I know e.g. Python automatically detects cycles between objects which prevents memory leaks that dangling cyclic shared_ptrs can cause in C++.
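To make the cycle problem concrete, here is a minimal C++ sketch (C++11 or later assumed). The `StrongNode`/`WeakNode` types and the `live_nodes` counter are hypothetical names invented for illustration. Two nodes that own each other through `shared_ptr` survive after every external reference is gone, whereas a `weak_ptr` link lets them be destroyed. The strong-link demo breaks the cycle by hand at the end so that the example itself does not leak.

```cpp
#include <cassert>
#include <memory>

// Hypothetical illustration types; nothing here comes from a real library.
static int live_nodes = 0;  // number of nodes constructed but not yet destroyed

struct StrongNode {
    std::shared_ptr<StrongNode> other;  // owning link: two of these can form a cycle
    StrongNode() { ++live_nodes; }
    ~StrongNode() { --live_nodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> other;  // non-owning link: does not keep the peer alive
    WeakNode() { ++live_nodes; }
    ~WeakNode() { --live_nodes; }
};

// Returns how many nodes are still alive after all external owners are gone.
int alive_after_scope_with_strong_links() {
    live_nodes = 0;
    std::weak_ptr<StrongNode> observer;  // lets us reach the cycle afterwards
    {
        auto a = std::make_shared<StrongNode>();
        auto b = std::make_shared<StrongNode>();
        a->other = b;
        b->other = a;  // a and b now keep each other alive
        observer = a;
    }
    const int alive = live_nodes;  // both nodes survived the scope
    // Break the cycle manually so this demo does not actually leak.
    if (auto a = observer.lock()) {
        a->other.reset();
    }
    return alive;
}

int alive_after_scope_with_weak_links() {
    live_nodes = 0;
    {
        auto a = std::make_shared<WeakNode>();
        auto b = std::make_shared<WeakNode>();
        a->other = b;
        b->other = a;  // no ownership cycle: the links are weak
    }
    return live_nodes;
}
```

Calling `alive_after_scope_with_strong_links()` reports the two stranded nodes, while the weak-link variant reports none. CPython's cycle collector would reclaim the equivalent Python structure without any manual step.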

Kaveh

2 Answers


"I know e.g. Python automatically detects cycles" -- that's what makes them work nicely, at least so far as the "potential bugs" relate to memory leaks.

Besides which, C++ programs are more commonly written under tight performance constraints than Python programs (which IMO is a combination of different genuine requirements with some fairly bogus differences in rules of thumb, but that's another story). A fairly high proportion of the Python objects I use don't strictly need reference counting: they have exactly one owner, and a unique_ptr would be fine (or, for that matter, a data member of class type). In C++ it's considered (by the people writing the advice you're reading) worth taking the performance advantage and the explicitly simplified design. In Python it's usually not considered a problem: you pay the performance cost and you keep the flexibility to decide later that the object is shared after all, without any code change required (other than taking additional references that outlive the original, I mean).
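As a rough sketch of the single-owner default in C++ (C++14 assumed for `std::make_unique`): a factory can hand out a `unique_ptr`, and the caller can still "decide later" that the object is shared by moving it into a `shared_ptr`. `Config` and `make_config` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type, for illustration only.
struct Config {
    std::string name;
};

// Single ownership by default: no reference count is allocated or maintained.
std::unique_ptr<Config> make_config(std::string name) {
    return std::make_unique<Config>(Config{std::move(name)});
}

// Promotes a uniquely owned object to shared ownership and returns the
// resulting number of owners.
long owners_after_promotion() {
    std::unique_ptr<Config> sole = make_config("demo");
    // Moving a unique_ptr into a shared_ptr transfers ownership and creates
    // the reference count only at this point.
    std::shared_ptr<Config> shared = std::move(sole);
    assert(sole == nullptr);  // the unique_ptr gave up ownership
    std::shared_ptr<Config> other = shared;  // a second owner
    return shared.use_count();
}
```

Unlike in Python, the switch to shared ownership is visible in the code: the type of the owning variable changes.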

Btw, in any language, shared mutable objects have "potential bugs" associated with them if you lose track of which objects will or won't change when you're not looking at them. I don't just mean race conditions: even in a single-threaded program you need to be aware that C++ predicates shouldn't change anything and that you (often) can't mutate a container while iterating over it. I don't see this as a difference between C++ and Python, though. Rather, to some extent you should be slightly wary of shared objects in Python too, and when you proliferate references to an object, at least understand why you're doing it.
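A small single-threaded sketch of that hazard, with hypothetical `Producer` and `Consumer` types: both hold the same vector through `shared_ptr`, so a change made through one is silently visible through the other. The same thing happens in Python when two objects hold a reference to one list.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical components that share one mutable vector.
struct Producer {
    std::shared_ptr<std::vector<int>> data;
    void append(int value) { data->push_back(value); }
};

struct Consumer {
    std::shared_ptr<std::vector<int>> data;
    std::size_t size() const { return data->size(); }
};

// The consumer never modifies anything itself, yet what it observes changes.
std::size_t size_seen_by_consumer_after_append() {
    auto shared = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});
    Producer producer{shared};
    Consumer consumer{shared};
    producer.append(4);      // mutation through one reference...
    return consumer.size();  // ...is observed through the other
}
```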

So, on to the list of issues in the question you link to:

Most of this is to do with the fact that in C++ you have to explicitly do something to get refcounting, and you don't get it if you don't ask for it. This provides several opportunities for error that Python doesn't make available to the programmer, because it just does it for you. If you use shared_ptr correctly then, apart from the existence of libraries that don't co-operate with it, none of these problems comes up in C++ either. Those who are cautious of using it for these reasons are basically saying they're afraid they'll use it incorrectly, or at any rate more afraid of that than of misusing some alternative. Much of C++ programming is trading different potential bugs off against each other until you come up with a design that you consider yourself competent to execute. Furthermore, it has "don't pay for what you don't need" as a design philosophy. Between these two factors, you don't do anything without a really good explanation, a significant benefit, and a carefully checked design. shared_ptr is no different ;-)
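One of those opportunities for error, sketched with a hypothetical `Widget` type: copying a `shared_ptr` adds an owner, but a raw pointer obtained from `get()` does not take part in the count, and wrapping it in a fresh `shared_ptr` would start a second, independent count (left as a comment below because it leads to a double delete). `std::enable_shared_from_this` is the standard way for an object to hand out further owning pointers to itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type, for illustration only.
struct Widget : std::enable_shared_from_this<Widget> {
    // Safe: yields a shared_ptr that shares the existing reference count.
    // Only valid while the object is already owned by a shared_ptr.
    std::shared_ptr<Widget> self() { return shared_from_this(); }
};

long owners_after_copy() {
    auto first = std::make_shared<Widget>();
    auto second = first;       // copy: same reference count
    return first.use_count();  // counts both owners
}

long owners_after_get() {
    auto first = std::make_shared<Widget>();
    Widget* raw = first.get();  // non-owning view, invisible to the count
    (void)raw;
    // WRONG (do not do this): std::shared_ptr<Widget> second(raw);
    // That would create a second, independent count and delete the object twice.
    return first.use_count();
}

long owners_after_shared_from_this() {
    auto first = std::make_shared<Widget>();
    auto second = first->self();  // joins the existing count
    return first.use_count();
}
```

In Python there is no equivalent of the raw pointer: every name bound to the object is counted, which is why this class of mistake cannot be written there.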

Steve Jessop
  • I understand it is probably the most common bug, but if I am not mistaken, cyclic references are not the only common bug caused by shared_ptr. So e.g. if we _hypothetically_ added periodic shared_ptr cycle detection, would that resolve the dangers of shared_ptr that do not happen with Python objects? Or is there more to why they work well in Python? My impression from the question about the dangers of shared_ptr and from Google's C++ style guide was that it is not the only one. – Kaveh Sep 02 '15 at 21:03
  • @Kaveh: it would certainly help, there are garbage-collecting smart pointer libraries out there and of course the standard now allows for the C++ implementation to do garbage collection too. But until someone specifies what kind of potential bugs they're talking about, it's not really possible to assess how the same issues are addressed in Python. Note that Google's C++ style guide is *astonishingly* cautious about using C++ features, and doesn't reflect typical use of C++. – Steve Jessop Sep 02 '15 at 21:07
  • E.g. the issues mentioned in http://stackoverflow.com/q/701456/ (Yes, I know the style guide is kind of tailored towards Google's existing code base. :) – Kaveh Sep 02 '15 at 21:25
  • Excellent answer. Essentially, my experience is that over time you make nearly all of the mistakes above and correct your design to avoid them. For example: making constructors private, and enforcing a friend `make` method that only gives out `shared_ptr`. No chance to get a non-refcounted object, and unless you explicitly create a shared pointer for an object you got via `get()` (which would be an obvious mistake), you can't have two `shared_ptr`s to the same object. Obviously, you'd only do that for large objects that need to be handed around. – Marcus Müller Sep 02 '15 at 22:23
  • @Marcus: personally, there are very few types that I'd force to be heap-allocated, and even so I'd have the factory return `unique_ptr` and let the user "promote" that to `shared_ptr` only if the object actually is shared. But that's details to support more use-cases, I agree that with experience you code defensively against the errors that you know about. If the type has a public constructor and some numpty abuses `new` instead of using `make_shared` or `make_unique` or some other "safe" construction idiom then arguably that's their fault anyway and should be a reviewer-magnet :-) – Steve Jessop Sep 02 '15 at 22:40
  • Yes, but with a community as diverse as that of GNU Radio, you don't get a review process, @SteveJessop, you just get a whole ecosystem of out-of-tree modules that (hopefully) adhere to the coding guidelines the core devs set (primarily by example). Hence, making it hard to make mistakes in non-performance-critical parts was crucial to the expansion of the community, IMHO, whereas sticking to raw memory buffers where performance matters made the software the right choice when shuffling data around that saturates 10 gigabit ethernet lines in hard real time. – Marcus Müller Sep 02 '15 at 23:26

"AFAIK, the use of shared_ptr is often discouraged because of potential bugs caused by careless usage of them (unless you have a really good explanation for significant benefit and carefully checked design)."

I wouldn't agree. The tendency is towards generally using these smart pointers unless you have a very good reason not to do so.

"... shared_ptr that makes their usage discouraged in C++ but not causing similar problems in Python?"

Well, I don't know about your favourite largish signal processing framework ecosystem, but GNU Radio uses shared_ptrs for all its blocks, which are the core elements of the GNU Radio architecture. In fact, blocks are classes with private constructors, which are only accessible by a friend make function, which returns a shared_ptr. We haven't had problems with this -- and GNU Radio had good reason to adopt such a model. Now, we don't have a single place where users try to use deallocated block objects, and not a single block is leaked. Nice!
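The pattern described above can be sketched roughly as follows. This is not GNU Radio's actual code: `Block`, `make_block` and the `id` member are hypothetical names chosen for illustration. The private constructor means the only way to obtain an instance is through the friend factory, so every instance is reference-counted from the start.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

class Block;
std::shared_ptr<Block> make_block(int id);  // hypothetical factory

class Block {
public:
    int id() const { return id_; }

private:
    // Private: user code cannot create a Block on the stack or with new.
    explicit Block(int id) : id_(id) {}
    friend std::shared_ptr<Block> make_block(int id);

    int id_;
};

// The only way to obtain a Block. std::make_shared cannot be used here
// because it would need access to the private constructor, so the object
// is created with new and handed to a shared_ptr immediately.
std::shared_ptr<Block> make_block(int id) {
    return std::shared_ptr<Block>(new Block(id));
}

// Outside the class and its friend, construction is rejected at compile time.
static_assert(!std::is_constructible<Block, int>::value,
              "Block must only be created through make_block");
```

A caller writes `auto b = make_block(7);` and then copies `b` wherever shared ownership is needed. The trade-off is that users can no longer choose stack allocation or `unique_ptr` for this type.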

Also, we use SWIG and a gateway class for a few C++ types that can't just be represented well as Python types. All this works very well on both sides, C++ and Python. In fact, it works so very well, that we can use Python classes as blocks in the C++ runtime, wrapped in shared_ptr.

Also, we never had performance problems. GNU Radio is a high rate, highly optimized, heavily multithreaded framework.

Marcus Müller
  • Sorry, but I am not interested in arguing about whether shared_ptrs are discouraged or not; it is a premise of my question. You can disagree with it, but your answer doesn't really tell me what I want to know. You are rather arguing that the question is wrong and that you have a different experience with them. That is not very helpful. – Kaveh Sep 02 '15 at 21:15
  • `shared_ptr` is pretty much the last thing you would use though. So it is kind of discouraged. Unless you need shared ownership, of course. But C++ doesn't force that upon you. – juanchopanza Sep 02 '15 at 21:15
  • @juanchopanza I don't understand: why that? My experience looks like this: `cd gnuradio; git grep shared_ptr|wc` yields 630 occurrences. – Marcus Müller Sep 02 '15 at 21:19
  • @juanchopanza: I know that anecdotal evidence isn't an argument; but neither is "this is the last thing you'd use". – Marcus Müller Sep 02 '15 at 21:20
  • @MarcusMüller But maybe they really are needed in gnuradio. My badly made point is that you have to think of ownership in C++, and shared ownership is usually not the best solution. For people used to Python, Java or other languages, that might be hard to grasp, so they might have a bias towards over-using shared_ptr. – juanchopanza Sep 02 '15 at 21:22
  • @juanchopanza, would you expand your point to an answer? That might be what I am looking for, i.e. we don't need to think about ownership in Python but we need to do so in C++. – Kaveh Sep 02 '15 at 21:31
  • @juanchopanza ah! ok, yes, I agree. The problem with ownership in GNU Radio is that it's a very directed-graph-oriented application, with lots of buffers that have blocks as inputs and outputs; creation of the block, however, happens from the outside, and it's often not clear whether a user application wants to hold on to a block or whether the instance can be destroyed after execution; shared_ptrs solve this by ref counting and scoping. – Marcus Müller Sep 02 '15 at 21:33
  • @Kaveh We don't need to think about ownership in Python *only because* it is already taken care of by reference counting. – Ilya Popov Sep 02 '15 at 21:34
  • @Kaveh As Steve Jessop alluded to in his answer, in Python most things are shared. So you don't have to think of it in the sense that you don't have a choice. But you *do* have to think of it in the sense that you have to be aware that things you do to an object can affect code a million miles away. – juanchopanza Sep 02 '15 at 21:34
  • @Marcus: out of interest, does gnuradio have some general trick to prevent uncollected reference loops? Is it `weak_ptr` all over, or is there some useful property of the design that ensures your graphs are trees/forests? – Steve Jessop Sep 02 '15 at 22:00
  • Ah well, our graphs must be cycle-free, not for refcounting purposes but for processing/causality reasons. The block objects themselves usually don't hold references to each other, so there's seldom any chance to build a cycle. In fact, these objects communicate only via circular buffers (or what we mmap to fake circular buffers) or via message passing using thread-safe queues (usually, each block "runs" in its own thread). It's the job of the scheduler to broker the data in and out. @SteveJessop – Marcus Müller Sep 02 '15 at 22:17