14

C++11 introduced reference-counted smart pointers, std::shared_ptr. Being reference counted, these pointers are unable to automatically reclaim cyclic data structures. However, automatic collection of reference cycles was shown to be possible, for example by Python and PHP. To distinguish this technique from garbage collection, the rest of the question will refer to it as cycle breaking.

Given that there seem to be no proposals to add equivalent functionality to C++, is there a fundamental reason why a cycle breaker similar to the ones already deployed in other languages wouldn't work for std::shared_ptr?

Note that this question doesn't boil down to "why isn't there a GC for C++", which has been asked before. A C++ GC normally refers to a system that automatically manages all dynamically allocated objects, typically implemented using some form of Boehm's conservative collector. It has been pointed out that such a collector is not a good match for RAII. Since a garbage collector primarily manages memory, and might not even be called until there is a memory shortage, and C++ destructors manage other resources, relying on the GC to run destructors would introduce non-determinism at best and resource starvation at worst. It has also bee pointed out that a full-blown GC is largely unnecessary in the presence of the more explicit and predictable smart pointers.

However, a library-based cycle breaker for smart pointers (analogous to the one used by reference-counted interpreters) would have important differences from a general-purpose GC:

  • It only cares about objects managed through shared_ptr. Such objects already participate in shared ownership, and thus have to handle delayed destructor invocation, whose exact timing depends on ownership structure.
  • Due to its limited scope, a cycle breaker is unconcerned with patterns that break or slow down Boehm GC, such as pointer masking or huge opaque heap blocks that contain an occasional pointer.
  • It can be opt-in, like std::enable_shared_from_this. Objects that don't use it don't have to pay for the additional space in the control block to hold the cycle breaker metadata.
  • A cycle breaker doesn't require a comprehensive list of "root" objects, which is hard to obtain in C++. Unlike a mark-sweep GC which finds all live objects and discards the rest, a cycle breaker only traverses objects that can form cycles. In existing implementations, the type needs to provide help in the form of a function that enumerates references (direct or indirect) to other objects that can participate in a cycle.
  • It relies on regular "destroy when reference count drops to zero" semantics to destroy cyclic garbage. Once a cycle is identified, the objects that participate in it are requested to clear their strongly-held references, for example by calling reset(). This is enough to break the cycle and would automatically destroy the objects. Asking the objects to provide and clear its strongly-held references (on request) makes sure that the cycle breaker does not break encapsulation.

Lack of proposals for automatic cycle breaking indicates that the idea was rejected for practical or philosophical reasons. I am curious as what the reasons are. For completeness, here are some possible objections:

  • "It would introduce non-deterministic destruction of cyclic shared_ptr objects." If the programmer were in control of the cycle breaker's invocation, it would not be non-deterministic. Also, once invoked, the cycle breaker's behavior would be predictable - it would destroy all currently known cycles. This is akin to how shared_ptr destructor destroys the underlying object once its reference count drops to zero, despite the possibility of this causing a "non-deterministic" cascade of further destructions.

  • "A cycle breaker, just like any other form of garbage collection, would introduce pauses in program execution." Experience with runtimes that implement this feature shows that the pauses are minimal because the GC only handles cyclic garbage, and all other objects are reclaimed by reference counting. If the cycle detector is never invoked automatically, the cycle breaker's "pause" could be a predictable consequence of running it, similar to how destroying a large std::vector might run a large number of destructors. (In Python, the cyclic gc is run automatically, but there is API to disable it temporarily in code sections where it is not needed. Re-enabling the GC later will pick up all cyclic garbage created in the meantime.)

  • "A cycle breaker is unnecessary because cycles are not that frequent and they can be easily avoided using std::weak_ptr." Cycles in fact turn up easily in many simple data structures - e.g. a tree where children have a back-pointer to the parent, or a doubly-linked list. In some cases, cycles between heterogenous objects in complex systems are formed only occasionally with certain patterns of data and are hard to predict and avoid. In some cases it is far from obvious which pointer to replace with the weak variant.

Community
  • 1
  • 1
user4815162342
  • 141,790
  • 18
  • 296
  • 355
  • How do you represent the *graph* of `shared_ptr` ? – Jarod42 Dec 19 '15 at 23:59
  • garbage collection is allowed for in the standard library but not specified. This is because it is the view of the committee (and every right-thinking developer) that garbage collection is un-necessary and evil. If you think you need GC, it's because you have a design problem. – Richard Hodges Dec 20 '15 at 10:21
  • @RichardHodges It is my understanding that the gc allowed by the standard is a general facility for managing heap-allocated objects, with well-understood drawbacks. This question is about a subset of gc functionality that only enables automatic detection of shared_ptr cycles. – user4815162342 Dec 20 '15 at 19:55
  • @user4815162342 understood. Nevertheless I have yet to see a convincing argument for the use of gc, other than deliberately introducing nondeterministic behaviour into otherwise sane programs. – Richard Hodges Dec 20 '15 at 20:20
  • @Jarod42 I'm not sure I understand the question. Can you elaborate what you mean by "represent the *graph*"? Do you refer to a visual or mathematical representation, or something else entirely? – user4815162342 Dec 21 '15 at 15:25
  • 1
    I mean how (programmatically) you expose that `node` owns other `shared_ptr` to check cycle. (I may miss how GC works for your suggestion BTW). – Jarod42 Dec 21 '15 at 18:03
  • @Jarod42 In the Python implementation, each natively implemented type has a member function that visits all strongly referenced objects. (A different function clears the references.) In C++ this would presumably be implemented by inheriting from a base class template whose template arguments specify the visit and clear functions. (A convenience CRTP-style base could be provided that simply calls appropriately-named arguments on the inherited class.) – user4815162342 Dec 21 '15 at 20:13
  • 1
    @user4815162342: It's unclear if you're asking about whether such a pointer type would be part of a GC or whether it would be used in place of a GC. You seem to go back and forth on it in your post. You claim that it wouldn't be non-deterministic, since the objects would be destroyed once the last non-cyclic reference is destroyed. Yet you always refer to such destruction as "collection". So, are you asking about a library extension/modification of `shared_ptr`... or a **compiler feature**? – Nicol Bolas Jan 27 '16 at 05:38
  • @NicolBolas I am asking strictly about a (hypothetic) library feature. The term "collection" is here synonymous to "cycle-breaking" - after all, a cycle breaker must do work in addition to normal destructor invocation. This is a limited sort of garbage collection, with the restriction that it requires no compiler support and is restricted to objects that are (a) targets of shared_ptr, and (b) are marked as ones that can participate in cycles. – user4815162342 Jan 27 '16 at 07:41
  • @NicolBolas The claim that it wouldn't be non-deterministic (as this term is often applied in the GC context) refers to the hypothetical cycle breaker's use of normal destructor mechanism to delete objects. The only reason the collector must exist is to detect cycles and ask the objects to break them manually when appropriate. The destruction work is done by objects themselves, as usual. – user4815162342 Jan 27 '16 at 07:42
  • 1
    You're title is asking for "a cycle breaker", but you specifically _not_ asking for a way to break cycles. But your question is asking for a fundamental reason why cycles cannot be collected automagically in C++ (although you assume that this has to be done by breaking the cycle first). Please align title and question. – Rumburak Jan 27 '16 at 08:14
  • @Rumburak The title is intentionally asking for a cycle breaker, not "a way to break cycles" (with a different design, etc.). The question itself then elaborates on what is being meant by a cycle breaker/collector, mostly through comparison with similar technology already deployed in scripting languages. – user4815162342 Jan 27 '16 at 08:33
  • Re-read the only real question (search for `?`). It is not about breaking cycles. It is about cleaning them up. You're just assuming that this has to be done by breaking cycles. I'd claim that this is a [XY-Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Rumburak Jan 27 '16 at 08:51
  • @Rumburak Cleaning cycles without breaking them up would no longer be a library feature, but a full-featured GC, which is not the topic of the question, as explained in the next paragraph. The question intentionally refers to existing technologies, mentioned in the first paragraph. Referring to well-established practice does not constitute an xy-problem. – user4815162342 Jan 27 '16 at 09:21
  • @NicolBolas I have rolled back your edit. Changing "collector" to "detector" is incorrect because a cycle breaker doesn't merely detect cycles, it takes steps to make sure the objects are destroyed afterwords. Your edit also added the sentence, *Cycle detection only happens when a `shared_ptr`'s reference count drops to zero* This is incorrect, as the whole point of cycle detection/breaking is that the reference count of `shared_ptr`'s that participate in cycles *doesn't* drop to zero. If you would like me to modify the question, feel free to propose the changes in a comment. – user4815162342 Jan 27 '16 at 15:38
  • 1
    With the bounty about to expire, I am awarding it to the highest-upvoted answer posted before the bounty. While some of the answers posted during the bounty period do add insight into the problem, they show a lack of understanding of the subject matter, and especially of existing implementations of cycle breakers. The question is of course still open; if a good answer comes along, I will gladly accept it. – user4815162342 Feb 02 '16 at 12:21

5 Answers5

13

There are a number of issues to be discussed here, so I've rewritten my post to better condense this information.

Automatic cycle detection

Your idea is to have a circle_ptr smart pointer (I know you want to add it to shared_ptr, but it's easier to talk about a new type to compare the two). The idea is that, if the type that the smart pointer is bound to derives from some cycle_detector_mixin, this activates automatic cycle detection.

This mixin also requires that the type implement an interface. It must provide the ability to enumerate all of the circle_ptr instances directly owned by that instance. And it must provide the means to invalidate one of them.

I submit that this is a highly impractical solution to this problem. It is excessively fragile and requires immense amounts of manual work from the user. And therefore, it is not appropriate for inclusion in the standard library. And here are some reasons why.

Determinism and cost

"It would introduce non-deterministic destruction of cyclic shared_ptr objects." Cycle detection only happens when a shared_ptr's reference count drops to zero, so the programmer is in control of when it happens. It would therefore not be non-deterministic. Its behavior would be predictable - it would destroy all currently known cycles from that pointer. This is akin to how shared_ptr destructor destroys the underlying object once its reference count drops to zero, despite the possibility of this causing a "non-deterministic" cascade of further destructions.

This is true, but not in a helpful way.

There is a substantial difference between the determinism of regular shared_ptr destruction and the determinism of what you suggest. Namely: shared_ptr is cheap.

shared_ptr's destructor does an atomic decrement, followed by a conditional test to see if the value was decremented to zero. If so, a destructor is called and memory is freed. That's it.

What you suggest makes this more complicated. Worst-case, every time a circle_ptr is destroyed, the code will have to walk through data structures to determine if there's a cycle. Most of the time, cycles won't exist. But it still has to look for them, just to make sure. And it must do so every single time you destroy a circle_ptr.

Python et. al. get around this problem because they are built into the language. They are able to see everything that's going on. And therefore, they can detect when a pointer is assigned at the time those assignments are made. In this way, such systems are constantly doing small amounts of work to build up cyclic chains. Once a reference goes away, it can look at its data structures and take action if that creates a cyclical chain.

But what you're suggesting is a library feature, not a language feature. And library types can't really do that. Or rather, they can, but only with help.

Remember: an instance of circle_ptr cannot know the subobject it is a member of. It cannot automatically transform a pointer to itself into a pointer to its owning class. And without that ability, it cannot update the data structures in the cycle_detector_mixin that owns it if it is reassigned.

Now, it could manually do this, but only with help from its owning instance. Which means that circle_ptr would need a set of constructors that are given a pointer to its owning instance, which derives from cycle_detector_mixin. And then, its operator= would be able to inform its owner that it has been updated. Obviously, the copy/move assignment would not copy/move the owning instance pointer.

Of course, this requires the owning instance to give a pointer to itself to every circle_ptr that it creates. In every constructor&function that creates circle_ptr instances. Within itself and any classes it owns which are not also managed by cycle_detection_mixin. Without fail. This creates a degree of fragility in the system; manual effort must be expended for each circle_ptr instance owned by a type.

This also requires that circle_ptr contain 3 pointer types: a pointer to the object you get from operator*, a pointer to the actual managed storage, and a pointer to that instance's owner. The reason that the instance must contain a pointer to its owner is that it is per-instance data, not information associated with the block itself. It is the instance of circle_ptr that needs to be able to tell its owner when it is rebound, so the instance needs that data.

And this must be static overhead. You can't know when a circle_ptr instance is within another type and when it isn't. So every circle_ptr, even those that don't use the cycle detection features, must bear this 3 pointer cost.

So not only does this require a large degree of fragility, it's also expensive, bloating the type's size by 50%. Replacing shared_ptr with this type (or more to the point, augmenting shared_ptr with this functionality) is just not viable.

On the plus side, you no longer need users who derive from cycle_detector_mixin to implement a way to fetch the list of circle_ptr instances. Instead, you have the class register itself with the circle_ptr instances. This allows circle_ptr instances that could be cyclic to talk directly to their owning cycle_detector_mixin.

So there's something.

Encapsulation and invariants

The need to be able to tell a class to invalidate one of its circle_ptr objects fundamentally changes the way the class can interact with any of its circle_ptr members.

An invariant is some state that a piece of code assumes is true because it should be logically impossible for it to be false. If you check that a const int variable is > 0, then you have established an invariant for later code that this value is positive.

Encapsulation exists to allow you to be able to build invariants within a class. Constructors alone can't do it, because external code could modify any values that the class stores. Encapsulation allows you to prevent external code from making such modifications. And therefore, you can develop invariants for various data stored by the class.

This is what encapsulation is for.

With a shared_ptr, it is possible to build an invariant around the existence of such a pointer. You can design your class so that the pointer is never null. And therefore, nobody has to check for it being null.

That's not the case with circle_ptr. If you implement the cycle_detector_mixin, then your code must be able to handle the case of any of those circle_ptr instances becoming null. Your destructor therefore cannot assume that they are valid, nor can any code that your destructor calls make that assumption.

Your class therefore cannot establish an invariant with the object pointed to by circle_ptr. At least, not if it's part of a cycle_detector_mixin with its associated registration and whatnot.

You can argue that your design does not technically break encapsulation, since the circle_ptr instances can still be private. But the class is willingly giving up encapsulation to the cycle detection system. And therefore, the class can no longer ensure certain kinds of invariants.

That sounds like breaking encapsulation to me.

Thread safety

In order to access a weak_ptr, the user must lock it. This returns a shared_ptr, which ensures that the object will remain alive (if it still was). Locking is an atomic operation, just like reference incrementing/decrementing. So this is all thread-safe.

circle_ptrs may not be very thread safe. It may be possible for a circle_ptr to become invalid from another thread, if the other thread released the last non-circular reference to it.

I'm not entirely sure about this. It may be that such circumstances only appear if you've already had a data race on the object's destruction, or are using a non-owning reference. But I'm not sure that your design can be thread safe.

Virulence factors

This idea is incredibly viral. Every other type where cyclic references can happen must implement this interface. It's not something you can put on one type. In order to get the benefits, every type that could participate in a cyclical reference must use it. Consistently and correctly.

If you try to make circle_ptr require that the object it manages implement cycle_detector_mixin, then you make it impossible to use such a pointer with any other type. It wouldn't be a replacement of (or augmentation for) shared_ptr. So there is no way for a compiler to help detect accidental misuse.

Sure, there are accidental misuses of make_shared_from_this that cannot be detected by compilers. However, that is not a viral construct. It is therefore only a problem for those who need this feature. By contrast, the only way to get a benefit from cycle_detector_mixin is to use it as comprehensively as possible.

Equally importantly, because this idea is so viral, you will be using it a lot. And therefore, you are far more likely to encounter the multiple-inheritance problem than users of make_shared_from_this. And that's not a minor issue. Especially since cycle_detector_mixin will likely use static_cast to access the derived class, so you won't be able to use virtual inheritance.

Summation

So here is what you must do, without fail, in order to detect cycles, none of which the compiler will verify:

  1. Every class participating in a cycle must be derived from cycle_detector_mixin.

  2. Anytime a cycle_detector_mixin-derived class constructs a circle_ptr instance within itself (either directly or indirectly, but not within a class that itself derives from cycle_detector_mixin), pass a pointer to yourself to that cycle_ptr.

  3. Don't assume that any cycle_ptr subobject of a class is valid. Possibly even to the extent of becoming invalid within a member function thanks to threading issues.

And here are the costs:

  1. Cycle-detecting data structures within cycle_detector_mixin.

  2. Every cycle_ptr must be 50% bigger, even the ones that aren't used for cycle detection.

Misconceptions about ownership

Ultimately, I think this whole idea comes down to a misconception about what shared_ptr is actually for.

"A cycle detector is unnecessary because cycles are not that frequent and they can be easily avoided using std::weak_ptr." Cycles in fact turn up easily in many simple data structures - e.g. a tree where children have a back-pointer to the parent, or a doubly-linked list. In some cases, cycles between heterogenous objects in complex systems are formed only occasionally with certain patterns of data and are hard to predict and avoid. In some cases it is far from obvious which pointer to replace with the weak variant.

This is a very common argument for general-purpose GC. The problem with this argument is that it usually makes an assumption about the use of smart pointers that just isn't valid.

To use a shared_ptr means something. If a class stores a shared_ptr, that represents that the class has ownership of that object.

So explain this: why does a node in a linked list need to own both the next and previous nodes? Why does a child node in a tree need to own its parent node? Oh, they need to be able to reference the other nodes. But they do not need to control the lifetime of them.

For example, I would implement a tree node as an array of unique_ptr to their children, with a single pointer to the parent. A regular pointer, not a smart pointer. After all, if the tree is constructed correctly, the parent will own its children. So if a child node exists, it's parent node must exist; the child cannot exist without having a valid parent.

With a double linked list, I might have the left pointer be a unique_ptr, with the right being a regular pointer. Or vice-versa; one way is no better than the other.

Your mentality seems to be that we should always be using shared_ptr for things, and just let the automatic system work out how to deal with the problems. Whether it's circular references or whatever, just let the system figure it out.

That's not what shared_ptr is for. The goal of smart pointers is not that you don't think about ownership anymore; it's that you can express ownership relationships directly in code.

Overall

How is any of this an improvement over using weak_ptr to break cycles? Instead of recognizing when cycles might happen and doing extra work, you now do a bunch of extra work everywhere. Work that is exceedingly fraglile; if you do it wrong, you're no better off than if you missed a place where you should have used weak_ptr. Only it's worse, because you probably think your code is safe.

The illusion of safety is worse than no safety at all. At least the latter makes you careful.

Could you implement something like this? Possibly. Is it an appropriate type for the standard library? No. It's just too fragile. You must implement it correctly, at all times, in all ways, everywhere that cycles might appear... or you get nothing.

Authoritative references

There can be no authoritative references for something that was never proposed, suggested, or even imagined for standardization. Boost has no such type, and such constructs were never even considered for boost::shared_ptr. Even the very first smart pointer paper (PDF) never considered the possibility. The subject of expanding shared_ptr to automatically be able to handle cycles through some manual effort has never been discussed even on the standard proposal forums where far stupider ideas have been deliberated.

The closest to a reference I can provide is this paper from 1994 about a reference-counted smart pointer. This paper basically talks about making the equivalent of shared_ptr and weak_ptr part of the language (this was in the early days; they didn't even think it was possible to write a shared_ptr that allowed casting a shared_ptr<T> to a shared_ptr<U> when U is a base of T). But even so, it specifically says that cycles would not be collected. It doesn't spend much time on why not, but it does state this:

However, cycles of collected objects with clean-up functions are problematic. If A and B are reachable from each other, then destroying either one first will violate the ordering guarantee, leaving a dangling pointer. If the collector breaks the cycle arbitrarily, programmers would have no real ordering guarantee, and subtle, time-dependent bugs could result. To date, no one has devised a safe, general solution to this problem [Hayes 92].

This is essentially the encapsulation/invariant issue I pointed out: making a pointer member of a type invalid breaks an invariant.

So basically, few people have even considered the possibility, and those few who did quickly discarded it as being impractical. If you truly believe that they're wrong, the single best way to prove it is by implementing it yourself. Then propose it for standardization.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Automatic cycle breaker would require help from the type itself, which is why it would need to be opt-in, a la `std::enable_shared_from_this`. Specifically, the type would need to provide methods to visit and clear the references without breaking encapsulation. The rest of your answer largely depends on the assumption that the collector will write into objects breaking the encapsulation. I will amend the question to make it explicit that this is not proposed. – user4815162342 Jan 27 '16 at 09:25
  • @user4815162342: I've modified my answer to make it more concise and focused, based on your various comments about what you want (comments that should be part of your question, BTW, since you clearly have a design in mind). – Nicol Bolas Jan 27 '16 at 17:02
  • 1
    @user4815162342: That's not how Stack Overflow works, and you've been a member of this site long enough to know that by now. Your updates to your question keep moving the goal-posts. You started by suggesting that destroying such a pointer could directly lead to the destruction of a cycle (when the last non-cyclic reference is destroyed). Now you say that it's some process that the user explicitly invokes. It seems to me...(continued) – Nicol Bolas Jan 27 '16 at 22:54
  • @user4815162342: ... that you keep trying to design *around* the problems that I come up with. While that might be OK on a forum, this isn't a forum; it's a Q&A site. If you want to brainstorm a way to incorporate cyclical reference detection into `shared_ptr`, fine, but this really is not the venue for that. Your question was why `shared_ptr` doesn't provide a particular implementation of this. I explained the cost of your particular implementation. I'm not interested in having a dialog over this on this site. – Nicol Bolas Jan 27 '16 at 22:56
  • The only relevant updates to the question clarified the terminology and rolled back your incorrect edit. It is not true that the edits have shifted the goalpost. I have now removed the addendum that specifically refers to your answer. – user4815162342 Jan 28 '16 at 11:23
  • The edited answer reflects an incorrect understanding of existing cycle detection technologies (details available [here](http://stackoverflow.com/revisions/34373460/11)). While parts of it, such as the section regarding thread safety and ownership, are relevant to the discussion, the rest is far from being authoritative, nor does it draw on credible and/or official sources. – user4815162342 Jan 28 '16 at 11:31
  • @user4815162342: "*It is not true that the edits have shifted the goalpost. I have now removed the addendum that specifically refers to your answer.*" It was that addendum that shifted the goalposts, since it added new details that specifically contradicted older ones. – Nicol Bolas Jan 28 '16 at 15:48
  • The addendum was only there to refute incorrect portions of your answer and was removed for the same reason. – user4815162342 Jan 28 '16 at 16:20
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/101916/discussion-between-nicol-bolas-and-user4815162342). – Nicol Bolas Jan 28 '16 at 16:29
  • @user4815162342: Authoritative references (to the extent that such references exist) added. – Nicol Bolas Jan 28 '16 at 17:02
  • The rest of answer does not draw its content from those references. The quoted paragraph assumes that the cycle breaker will directly invoke an object's destructor, which is not how the modern designs work - it is the object itself that breaks the cycle, when prompted by the collector. (Analogous misconceptions about the design and the "breaking of encapsulation" that follows from them are repeated throughout your answer.) – user4815162342 Jan 29 '16 at 12:15
  • @user4815162342: There's a chat discussion for this. – Nicol Bolas Jan 29 '16 at 14:10
  • "_circle_ptr instances directly owned_" What about: another there is another object of type class `U` managed by class `T` through a `unique_ptr` in `T`. `U` contains a `circle_ptr`. Is it "directly owned"? – curiousguy Jun 20 '18 at 22:00
7

std::weak_ptr is the solution to this problem. Your worry about

a tree where children have a back-pointer to the parent

can be solved by using raw pointers as the back-pointer. You have no worry of leakage if you think about it.

and your worry about

doubly-linked list

is solved by std::weak_ptr or a raw one.

Paul Evans
  • 27,315
  • 3
  • 37
  • 54
  • 1
    A weak or raw pointer solves the problem only in cases when it is clear who "owns" the object and who just "observes" it. In many real-world scenarios this is not the case. A simple example that comes to mind is a graph where each vertex contains a list of connected vertices, which may or may not point back to the original vertex. (Tree and doubly-linked list are meant as examples of cyclic structures cropping up easier than one originally imagines.) – user4815162342 Dec 19 '15 at 18:43
  • @user4815162342 Yes, it can get *very* complicated when one has to manage all these things by hand. That's the trade-off. There's no free lunch. If you want the control and hence the efficiency gains you pay a price. If you want all this sorted out for you (by say, a virtual machine with automated garbage collection - *eg* JVM) then you loose that control and hence some of the benefits. That's also why the STL (with over a decade of testing and hard big-`O` constraints) is so powerful and so often used over any half-baked home-spun alternatives. – Paul Evans Dec 19 '15 at 20:04
  • No argument here - there are always trade-offs. Still, that doesn't answer the question which is specifically about whether there is a fundamental reason why a cycle breaker wouldn't work for `std::shared_ptr`. In other words, it is not obvious that one has to choose between one extreme, a full-fledged "virtual machine with automated garbage collection", and the other, the current solution where cycles created by shared pointers simply leak. The Python approach demonstrates a middle ground which might well be relevant for C++ smart pointers. – user4815162342 Dec 19 '15 at 23:32
  • I'm currently implementing a graph structure and I've decided its nodes and edges to keep only weak pointers to each other. In order to avoid falling nodes' and edges' reference counters to zero, I maintain two std::vectors within a graph which hold strong pointers to each active node and edge, respectively. So, unless someone took a node or edge for further examination, its reference count will always be 1. – Minor Threat Dec 19 '15 at 23:36
  • @user4815162342 Then you might consider adapting a `shared_ptr` to intrusively report its host's `shared_ptr` target to anything that targets its host with this adapted `shared_ptr`. Then use common loop detection algorithms at some point after attachment to detect and correct loops. – Paul Evans Dec 20 '15 at 00:04
  • @PaulEvans What you describe would be a core of a cycle collector. The question was about reasons why this was not already implemented or at least proposed. Being present in python and PHP, the technique is well known and seems obviously useful, at least as an opt-in feature. – user4815162342 Dec 20 '15 at 10:25
  • @MinorThreat That is a straightforward strategy, and simple to implement to boot, with only a slight cost of maintaining the additional vector. As a general solution, it requires being able to distinguish "owning" from "observing" pointers, and never having a cycle among the former. – user4815162342 Dec 20 '15 at 19:52
  • @user4815162342 This might be a very good idea for a [boost](http://www.boost.org) library! (If you don't already know that's how new libraries are excepted into the C++ standard) Haven't checked if someone's already working on this. More than happy to continue this discussion about this idea with you – Paul Evans Dec 20 '15 at 20:41
  • How do you implement lists with `std::weak_ptr`? – curiousguy Jun 20 '18 at 22:01
1

I believe that the answer to your question is that, contrary to what you claim, there is no efficient way to automatically handle cyclic references. Checking for cycles must be carried out every time a "shared_ptr" is destroyed. On the other hand, introducing any deferring mechanism will inevitably result in a undetermined behavior.

S. K.
  • 433
  • 3
  • 9
  • That's a valid point, a completely automatic cycle breaker invoked on every `shared_ptr` destruction would be slow. But did you consider a cycle breaker that needs to be invoked manually? (This is mentioned in the question.) I don't see why that would result in non-deterministic behavior. – user4815162342 Jan 29 '16 at 12:14
  • 1
    @user4815162342 As you know, in C++ on rely on destructors to cleanup and free resources. If an object is destructed, I am sure that the resources it held are now free and can be accessed. This guarantee is lost with deferred destruction and this remains true even if the process is started manually. Secondly, manual control has its own issues. One has has to decide when to start the process, and I am not sure this decision is easier than that of choosing when to use weak_ptr for example. – S. K. Jan 29 '16 at 12:55
  • Deferred destruction is always a problem, but with current `shared_ptr`, you get *no* destruction for cycles. The option of explicit cleanup seems better than having no cleanup whatsoever. I agree that the choice of when to run cleanup is a problem in practice, especially given that it is hard to predict the time it takes to run it. – user4815162342 Jan 29 '16 at 14:21
  • Another possible problem is the scope of the cleanup. Since `shared_ptr` and the objects they manage exist outside of an explicit context, a cleanup would need to be *global*. That does not sound like C++, as very few things in C++ are global - the heap managed by `new` and `delete` is the only one that comes to mind. – user4815162342 Jan 29 '16 at 14:22
  • It is possible to make as efficient GC as Java has, and have it resolve the cycles. Without language extension, it would not be as simple as shared_ptr though. It would require the shared_ptr to know about a GC instance running behind the scene, and also there need to be method provided to add an explicit reference from one shared ptr to another (to let the GC know how shared pointers reference each other). The class would be uglier, and heavier. You could also limit object types to be stored in the shared_ptr to classes that inherit from same base. – 0kcats Jan 29 '16 at 20:42
  • 1
    @user4815162342 The statement "explicit cleanup seems better than having no cleanup whatsoever" is a bit misleading, since your are comparing a broken program that missuses shared_ptr to a program were reclaiming inaccessible cyclic pointers is started manually. A program that uses shared_ptr / weak_ptr properly does not require any (extra) clean up. What we should compare is the effort/inconvenience caused by using the two types of pointers (shared_ptr and weak_ptr) against the effort/inconvenience of using a single type of pointers with manually started collection. – S. K. Jan 29 '16 at 21:54
  • @0kcats The additional metadata, such as pointers to prev/next container object, is only in the control block, and then only for types that opt-in to the functionality. You wouldn't pay for what you don't use. (This is how existing implementations work as well, except they use intrusive reference counting which conflates the object and the control block.) – user4815162342 Jan 30 '16 at 19:43
  • @S.K. I see your point. Given previous StackOverflow questions about the topic, I would argue that using weak_ptr to break cycles is more error-prone than usually acknowledged. Also, the failure mode is silent leaks (of memory or, worse, other resources), which gives off a bad design smell, akin to the design smell of using raw pointers where one had to remember to `delete` it lest the code leaks. – user4815162342 Jan 30 '16 at 21:39
0

The shared_ptr was not made for automatic reclamation of circular references. It existed in the boost library for some time before being copied to STL. It is a class that attaches a reference counter to any c++ object - be it an array, a class, or an int. It is a relatively lightweight and self-sufficient class. It does not know what it contains, with exception that it knows a deleter function to call when needed.

Al this cycle resolution requires too much heavy code. If you like GC, you can use another language, that was designed for GC from the beginning. Bolting it on via STL would look ugly. A language extension as in C++/CLI would be much nicer.

0kcats
  • 672
  • 5
  • 16
  • You don't need a language extension like C++/CLI. C++14 _as defined_ supports GC. C++14 _as commonly implemented_ does not, though. – MSalters Jan 27 '16 at 14:30
  • 1
    @MSalters C++ does not have a GC. It has a mean to support it. And if the support is provided it breaks compatibility. I wish this never going to be implemented as designed. – 0kcats Jan 27 '16 at 20:31
  • Not just "heavy", basically impossible to integrate with existing pointers. Raw pointers are not going away. "Smart pointers" are just a fancy name for a wrapper around a raw pointers, and manipulating it is still the goal. I don't understand why C++ programmers/dreamers insist on replacing raw pointers with smart pointers when the smart pointer is just a way to access the raw pointer. – curiousguy Jun 22 '18 at 00:33
-1

By reference counting what you ask for is impossible. In order to identify a circle one would have to hold identification of the references to your object. That is easy in memory managed languages since the virtual machine knows who references whom.

In c++ you can only do that by holding a list of references in the circular pointer of e.g. UUID that identifies the object referencing your resources. This would imply that the uuid is somehow passed into the structure when the object is acquired, or that the pointer has access to that resources internals.

These now become implementation specific, since you require a different pointer interface e.g copy and assignment could not be implemented as raw pointers, and demand from every platform to have a uuid source, which cannot be the case for every system. You could of course provide the memory address as a uuid .

Still to overcome the copy , and proper assignment without having a specialized assign method would probably require a single source that allocates references. This cannot be embedded in the language, but may be implemented for a specific application as global registry.

Apart from that, copying such a larger shared pointer would incurr larger performance impact, since during those operations on would have to make lookups for adding , removing, or resolving cycles. Since , doing cycle detection in a graph, from a complexity point of view, would require to traverse the graph registered and apply DFS with backtracking, which is at least proportional to the size of references, I don't see how all these do not scream GC.

g24l
  • 3,055
  • 15
  • 28
  • Existing cycle breaker implementations are not managed in the Java sense. (True managed-VM systems tend not to use reference counting to begin with.) In CPython, both the core interpreter and especially the C extensions create objects that are not managed by a "VM" (except for allocation/deallocation through an agreed-upon reference counting scheme akin to an intrusive version of `shared_ptr`). The cycle breaker works by having the types that take part in it cooperate. This sounds implementable in C++ in principle, but the fact that it wasn't implemented is what prompted the question. – user4815162342 Jan 30 '16 at 19:34
  • @user4815162342 , I did not say that managed languages do reference counting to start with. Also did I say that you need to do reference counting? I only said that you have to have a registry of who is referring to your object. Somehting that can only be done in a managed language , since there is no other way to guarantee across platforms that referee id to your object. C++ is not and will not be a managed language. Just because you can write in C++ does not imply that it should be a c++ language feature. Let's not forget that c++ has been used to code other nice programming languages. – g24l Jan 30 '16 at 22:24
  • Existing implementations don't require a registry of who is referring to your object, they make do with objects enumerating who they refer to. (They ask the type to provide the information, so no help from the compiler or VM is needed.) – user4815162342 Jan 30 '16 at 22:42
  • @user4815162342 yes true, because they are managed and restrict a number of things, such as knowing who is where. That is not the c++ way. – g24l Jan 30 '16 at 23:06
  • Take a look at those implementations, and you will see that they are not managed. Nothing prevents Python objects created by C extensions from e.g. using pointer masking, or referring to non-Python objects, and still participating in gc. (The gui toolkit wrappers do this regularly.) – user4815162342 Jan 31 '16 at 07:43
  • 1
    @user4815162342 , but python is a managed language... and python objects created by C extensions are still PyObjects which are managed. Pointer masking can include passing object identification information. It is just that without that information, which can only be supported in a managed language, the task of detecting cycles becomes dubious if not impossible. – g24l Jan 31 '16 at 19:26
  • 1
    I am not sure what you mean by managed here. How are PyObjects any more "managed" than the objects owned by `shared_ptr`? – user4815162342 Jan 31 '16 at 19:32
  • 1
    @user4815162342 hm, I though you would know, but "what do you mean managed" is not the appropriate question, because it will soon end up in "why do you say that X is managed and Y is not managed" ... and the Mr Feynman will explain : https://www.youtube.com/watch?v=MO0r930Sn_8, so I am not answering the first question. The second though, I have already answered. – g24l Jan 31 '16 at 19:55
  • 1
    You answered neither question, for the simple reason that there *is* no "managing" magic that python does behind the scenes. It's raw pointers and reference counting all the way down. – user4815162342 Jan 31 '16 at 20:08
  • @user4815162342 you posed no question at all. Raw pointers indeed , but you have to know which raw pointer it is, that is why a managed language it is. AFAIK there is no graph algorithm that works without identification for clique/cycle detection . – g24l Jan 31 '16 at 21:15
  • 1
    That's exactly the point: in Python, the crucial information - a list of referents - is provided by a method on the type, not by an all-knowing "managing" VM. The same could be done for `shared_ptr`, and in fact the same is already done to invoke the type's destructor. – user4815162342 Jan 31 '16 at 21:44
  • 1
    @user4815162342 , that is exactly the point in the answer. Either you have it on the reference, but then you need a mechanism to provide identification, or you embed that in a mechanism as well. Either case has platform limitations, and imposes significant permormance considerations . E.g. You have to do DFS, or copy that information around. You end up writing a managed language. Which is not c++ purpose. – g24l Jan 31 '16 at 22:51
  • The answer is incorrect in that it claims that additional data would have to be on `shared_ptr`, or that its copying etc would become slower - none of that is the case, the additional data (consisting of three pointers in the Python case) is in the control block, and is opt-in. The DFS you refer to is just performed by a library function, which bears little resemblance to how managed languages work. – user4815162342 Feb 01 '16 at 08:53
  • @user4815162342 , you should not make the assumption that managed language implies VM. Having said that, certainly we can talk about optimizations and how to do this better, but you are skipping the point that you have to do extra accounting. I am not saying it is not doable with extra accounting, but that it requires an mechanism to identify objects. Every reference to the object has to be accounted, and every time that reference is removed has to be checked e.g. with DFS. This would make deleters run slowly. This certainly cannot be done by just reference counting. – g24l Feb 01 '16 at 10:02
  • @user4815162342 , this could be a nice library extension, but a `cycle_ptr` would not possibly be a lightweight self-contained object as is the `shared_ptr` or `weak_ptr` . Because, having written the mechanism for object identification is convoluted with managing the objects. That makes it a managed language and comes very close to a GC. I think that your effort does not show that such a mechanism would make it largelly different from a GC. At any rate, once I get the time I'll try to add references to this answer, but I have to say that I am quite a bit tight on time. – g24l Feb 01 '16 at 10:06
  • What you are saying is refuted by the existing implementations, which *are* using pure reference counting and occasionally invoke the cycle breaker independently of the refcounting primitives. As I said, all the metainformation is in the control block, and (their equivalent of) shared_ptr is no more expensive than without the cycle breaker. Since I have explained all this before, and we are now running in circles (if you excuse the pun), I will end my side of this part of the discussion here. If you are genuinely interested about this approach, feel free to look into Python's cycle collector. – user4815162342 Feb 01 '16 at 10:17
  • @user4815162342 , nothing refuted, they keep an internal graph of references has a cycle collector ( actually a GC : https://docs.python.org/2/library/gc.html) , which is exactly the mechanism that I am referring to. The mechanism cannot exist in c++ as a standard language feature, because it would require c++ to become a managed language; it is not. Python is a managed language and the analogy does not apply. – g24l Feb 01 '16 at 10:37
  • "_That is easy in memory managed languages since the virtual machine knows who references whom_" How many of these "managed" systems have anything like to C++ destructors? (No, Java finalizers that may run asynchronously at any time or not a lot all don't count. Fun fact: in the first version of Java, finalizers could run at any time, even in the middle of a method call on that object; the only guarantee was that a finalizer could not start before the initialization of the object.) – curiousguy Jun 20 '18 at 20:15