How does garbage collection collect self-referential objects?

Question

If an object is not referenced by any other, then it is subject to be collected by the .NET CLR garbage collector.

However, if objA references objB, objB references objC, and objC references back to objA, how does the garbage collector figure out that they (as a whole) can be collected?

.NET uses a [mark and sweep algorithm (http://stackoverflow.com/questions/2344240/what-is-relation-between-gc-finalize-and-dispose). — Pontus Gagge, Dec 13 '11 at 12:29
I don't know for sure but I would assume it uses some kind of tree type thing and anything that isn't connected to the main tree can never be accessed from the main tree (by which I am talking the code that is currently somewhere in the stack). I don't know the specifics but graph theory I'm sure will solve the problem. :) — Chris, Dec 13 '11 at 12:33

David Pfeffer · Accepted Answer · 2011-12-13T12:50:22.647

The CLR uses a technique known as mark-and-sweep.

As part of this technique, every object can be thought of as initially marked for collection. Then, the CLR goes through each accessible object, starting with your globals (static fields, etc.) as the roots, and clears the mark on each walkable object. It then sweeps the remaining marked objects.

Keep in mind that this "marking" is conceptual; in reality, the objects are most likely added to a collection-set.

In the case of looping self-referenced objects, no reference to the objects will be found from the application, and so the algorithm will never reach those objects to "unmark" them.

score 2 · Answer 2 · answered Dec 13 '11 at 12:32

2

GC has a list of all created objects. During garbarge process it starts from global roots (like static fields) and walk through each referenced object. Each object from the list of all which has not been hit can be destroyed.

If there is no way to hit objA, objB or objC, all these objects will be collected

answered Dec 13 '11 at 12:32

Novakov

3,055
1
16
32

@Guffa: How does it find the objects that need collecting then if it doesn't have a list of all the possibilities? The mark and sweep descriptions certainly imply that it does... – Chris Dec 13 '11 at 12:40
Chris: The GC doesn't actually "mark" objects; it is a conceptual construct. In reality, it is objects that the GC **can** access which get marked in the negative. In other words, think of an imaginary property that exists on all objects, called `mark`. `mark` is true by default, so the GC only needs to mark objects it **can** access false. – David Pfeffer Dec 13 '11 at 12:48
-1: So Guffa is correct; the GC does not have a list of all created objects. – David Pfeffer Dec 13 '11 at 12:49
Of course it has list of all created objects. Several lists in fact. They are called generations. One object (pointer) gets put and moved from one list to another during it's lifetime – Adrian Zanescu Dec 13 '11 at 12:50
1

@AZ: Those are the heap generations, and it's not a list of the objects, it's the actual objects. – Guffa Dec 13 '11 at 18:26
ok. if the GC does not have any lists of allocated pointers then something does not match. It runs by marking known objects and decides to keep them. Then on collection it has to dump all "unmarked" objects. If it has no knowledge of them it just has to consider that memory area as usable again as a blob. What about finalizers?. How can it decide what to run and where? So no i don't agree with your assumption. – Adrian Zanescu Dec 19 '11 at 13:29
@Guffa "No, the GC doesn't have a list of all created objects". Mark sweep requires a list of all allocated objects. The sweep phase reclaims all allocated objects that have not been marked. – J D May 21 '12 at 08:53
@Novakov I'm sorry to see your correct answer being plagued with completely wrong comments. – J D May 21 '12 at 08:54
@DavidPfeffer "The GC doesn't actually "mark" objects; it is a conceptual construct". On the basis of my own performance measurements, production mark-sweep and mark-region GCs are unlikely to do anything except maintain a mark bit for each allocated object. You can accumulate a set of marked objects and not have any overhead for unmarked objects but that is much slower and does not save memory. – J D May 21 '12 at 08:56
@DavidPfeffer "So Guffa is correct; the GC does not have a list of all created objects". No, Guffa is wrong in the context of mark-sweep. The GC needs pointers to unreachable objects in order to reclaim them. – J D May 21 '12 at 08:57
@AZ. "Of course it has list of all created objects. Several lists in fact. They are called generations". Mark sweep requires a list of all objects but earlier generations are not mark-sweep. They use copying and only require the ability to evacuate survivors, so they only need pointers to reachable objects and not all allocated objects. This lets them sweep all unreachable objects in constant time. – J D May 21 '12 at 08:59
@JonHarrop: "Mark sweep requires a list of all allocated objects." Yes, of course, but that list is not kept. It's created when needed. – Guffa May 21 '12 at 09:31
@JonHarrop: "The GC needs pointers to unreachable objects in order to reclaim them." No, it doesn't. The garbage collector doesn't need the objects, it only wants the memory area where the objects were, and that is simply all the used memory in the heap without any active references to it. If a certain memory area contained one or a thousand objects is irrelevant, the garbage collector doesn't care about the objects once they are unused. – Guffa May 21 '12 at 09:32
@Guffa "Yes, of course, but that list is not kept. It's created when needed". It is not possible to recreate the list of all allocated values from the set of reachable values. – J D May 21 '12 at 17:59
@Guffa "The garbage collector doesn't need the objects". I didn't say the garbage collector needs the objects. My point was that some GCs do keep lists of all currently-allocated values. My own GCs are examples of this. See, for example http://www.ffconsultancy.com/ocaml/hlvm/ – J D May 21 '12 at 18:59
@JonHarrop: You said "The GC needs pointers to unreachable objects in order to reclaim them.". It doesn't. – Guffa May 21 '12 at 19:18
@JonHarrop: Some kinds of GC, especially relocating ones, are simply indifferent to objects that were once allocated but don't have any live references. Instead, they operate on the simple principle that if everything of value that had been in a region of memory has been moved elsewhere, that region of memory can be regarded as containing nothing of value. If usage patterns fit well with a generational model, then for many objects that contain nothing but PODS and references to other GC objects, allocation will have been nothing more than a pointer add and compare,... – supercat Mar 16 '15 at 18:08
...and destruction will require no action whatsoever. Objects which survive will have to be copied once for each generation, and then once for every "full" GC cycle, but full GC cycles will be relatively rare. I've been toying with the idea of doing a C++ string-and-array class for small-RAM embedded systems. I suspect RAM overhead could be reduced far below that of the normal C++ string type. – supercat Mar 16 '15 at 18:12
@supercat: Other GC algorithms work differently but mark-sweep (as described by John McCarthy in 1958) has pointers to both reachable and unreachable objects. See http://www-formal.stanford.edu/jmc/recursive.pdf – J D Mar 17 '15 at 00:54
@JonHarrop: The term "mark-and-sweep" is applied (perhaps not totally correctly) to many kinds of collectors, including some whose sweeping operation totally ignores everything that unreferenced memory had contained. Such a GC design can reduce memory overhead compared with those that try to keep track of everything. The string pool in 1980s Microsoft BASIC interpreters used three byte string references (pointer plus length), and zero overhead in the pool itself; it put the length in the reference rather than the target to allow string variables to point into program code. – supercat Mar 17 '15 at 15:27
@JonHarrop: The GC algorithm MS implemented was pretty terrible, but having each string in the pool preceded by a byte which was known not to be a zero, and have an allocated length of at least three bytes, would have allowed an efficient garbage-collection algorithm, with essentially no change in memory overhead. I wonder when the first GC algorithms appeared that simply ignored garbage, and if decent ones came before or after bad ones? – supercat Mar 17 '15 at 15:33
@supercat: "including some whose sweeping operation totally ignores everything that unreferenced memory had contained". I wrote a GC that does that and the interesting thing is, for a significant amount of work to implement that optimisation, you get essentially no improvement whatsoever. – J D Mar 18 '15 at 12:39
@JonHarrop: The amount of work required to "ignore" garbage depends I think on how many disjoint free areas will be ready for use after collection. In relocating collectors that don't support pinning, the number will be *one*. Some of the more memory-thrifty relocating GC algorithms that do support pinning are time O(N^2) or even O(N^3) on the number of *pinned* objects, but if in expected usage scenarios that N would seldom exceed four or five that's not a problem. Of course, if *all* objects are pinned (non-relocatable) then such algorithms don't work so well. – supercat Mar 18 '15 at 15:59
@supercat: "In relocating collectors that don't support pinning, the number will be one". That's only half of the equation. You did a lot of work (copying) to make that number one. That work, in an of itself in the context of mark-sweep, outweighs the benefit you get from being able to deallocate contiguous blocks of memory in one go. FWIW, I regard mark-sweep as inherently non-moving. – J D Mar 18 '15 at 20:38
@JonHarrop: The .NET and Java collectors are often described as mark/sweep, but for most objects in .NET which are less than 85K will get copied around by the GC. The cost is minimized by dividing memory into "generations". New allocations go into a "generation 0" heap, which is about a meg. When that gets full, items in there to which references still exist get copied to the gen1 heap; older objects stay where they are. When the gen1 heap gets full, objects get copied from there to the gen2 heap, while older (gen2) objects stay where they are. Only when the gen2 heap gets full... – supercat Mar 18 '15 at 20:51
...is it necessary for the system to worry about gen2 objects. The fraction of total objects in the gen0 heap is sufficiently small, and the occurrence of gen2 collections is sufficiently rare, that overall performance ends up being quite good. Even if one tracks gaps in gen2, and tries to have gen1 objects fill those gaps, only a small fraction of the objects which are created will make it to gen2; those that disappear before that won't have to be tracked. – supercat Mar 18 '15 at 20:53

How does garbage collection collect self-referential objects?

2 Answers2