10

As I understand it, the garbage collector in C# will put all objects of a class into the finalization queue as soon as I implement a destructor for the class. While reading the documentation for GC.SuppressFinalize, I saw that the object header already has a bit set for calling Finalize.

I am wondering why the implementers of the GC had to put all objects in a queue and delay the freeing of memory by 1-2 cycles. Couldn't they just look at the bit flag while releasing memory, call the object's Finalize, and then release the memory?

No doubt I am an idiot and not able to understand the workings of the GC. I am posing this question just to improve my understanding or fill the missing gap in my knowledge.

EDIT: If the bit flag is for SuppressFinalize, the GC implementers could have added another flag in the object header for this purpose, no?
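For context, here is a minimal sketch of the setup the question describes (the class name is invented for illustration): declaring a destructor is what causes every instance to be registered for finalization at allocation time, and `GC.SuppressFinalize` is what clears that registration.

```csharp
using System;

// Hypothetical resource wrapper; the mere presence of the destructor
// below is what registers each new instance for finalization.
class FileHandleWrapper : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
        // Tell the GC the finalizer no longer needs to run, so the
        // object can be reclaimed without passing through the
        // finalization machinery.
        GC.SuppressFinalize(this);
    }

    // Destructor syntax; compiled into a Finalize override.
    ~FileHandleWrapper() => Dispose();
}

class Program
{
    static void Main()
    {
        var w = new FileHandleWrapper();
        w.Dispose();
        Console.WriteLine(w.Disposed);
    }
}
```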

Pragmateek
paseena
  • I think @R0MANARMY means which GC implementation. There is more than one implementation of the .NET framework in the universe. In any case, it shouldn't matter which implementation, because the question is regarding why *any* GC implementation would implement a finalizer queue. – kqnr Apr 11 '11 at 23:49
  • I meant that there are [3 (it looks like)](http://blogs.msdn.com/b/maoni/archive/2004/09/25/234273.aspx) implementations. It may or may not be handled the same in all of them (probably is though). – Roman Apr 11 '11 at 23:50

5 Answers

12

So it can run in a different thread and thus keep from blocking the main GC thread.

You can learn a lot about the GC from this MSDN article.
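This answer's point can be observed directly. The sketch below (class and method names invented) records the managed thread ID that the finalizer runs on; on the standard CLR it differs from the thread that triggered the collection, because finalizers run on a dedicated finalizer thread.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Small probe whose finalizer records the thread it runs on.
class Probe
{
    public static int FinalizerThreadId = -1;

    ~Probe() => FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
}

class Program
{
    // Kept out of Main (and not inlined) so the instance is
    // unreachable by the time we force a collection.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate() => new Probe();

    static void Main()
    {
        Allocate();
        GC.Collect();
        GC.WaitForPendingFinalizers();

        // The finalizer thread is not the main thread.
        Console.WriteLine(
            Thread.CurrentThread.ManagedThreadId != Probe.FinalizerThreadId);
    }
}
```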

jason
4

There is a great explanation here

What are the Finalizer Queue and Control+ThreadMethodEntry?

Essentially the reasoning is that it may not always be ideal for the GC to have to wait on finalizer code to execute, so queuing finalizers allows finalization to be deferred until a time when it's more convenient.
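A small sketch of that deferral (the `Tracked` class is invented): a collection only *queues* the finalizer, which then runs later on the finalizer thread; `GC.WaitForPendingFinalizers` blocks until that deferred work has completed.

```csharp
using System;
using System.Runtime.CompilerServices;

// Invented tracker class: its finalizer just flips a flag.
class Tracked
{
    public static volatile bool Finalized;

    ~Tracked() => Finalized = true;
}

class Program
{
    // Not inlined, so the instance is dead when the collection runs.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Allocate() => new Tracked();

    static void Main()
    {
        Allocate();

        // GC.Collect queues the dead instance's finalizer rather than
        // running it inside the collection pause; WaitForPendingFinalizers
        // blocks until the finalizer thread has drained the queue.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine(Tracked.Finalized);
    }
}
```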

kqnr
1

@Jason: this is true for the f-reachable queue. But IMHO it does not explain why the finalization queue itself exists.

My guess is that the finalization queue is there to add another piece of information that helps the GC distinguish between all the possible states of an object's life-cycle.

The finalization flag in the object's header says "the object needs to be finalized" or "the object does not need to be finalized", but it does not say whether the finalization has already occurred.

But to be honest I don't grasp why it's needed in the current implementation of the finalization process.

Indeed, here is the naive workflow I imagine would be possible without the finalization queue:

  • when creating the object, if it has a finalizer, the GC sets the finalization flag;
  • if SuppressFinalize is later called, the flag is cleared;
  • now let's jump to when the GC collects the object, which is no longer referenced from anywhere: if the finalization flag is set, the GC puts a reference to the object into the f-reachable queue and lets the finalization thread operate;
  • later the finalization thread dequeues the reference, resets the finalization flag and runs the finalizer;
  • if the object wants to be refinalized later, it can call ReRegisterForFinalize to set the finalization flag again;
  • later the GC collects the object again: if the finalization flag is not set, it knows there is nothing to do and frees the object's memory;
  • if the finalization flag is set, the GC again enqueues a reference to the object into the f-reachable queue, and there we go for another round;
  • at some point in time the object is happy, completes its finalization and is collected; or the app-domain or process is shut down and the memory is freed anyway.

So it seems that in these scenarios there is no need for a finalization queue; only the finalization flag is useful.
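The workflow above can be sketched as a toy simulation (all type and member names are invented; this models the hypothetical flag-only design being discussed, not the real CLR):

```csharp
using System;
using System.Collections.Generic;

// Toy object with only the header flag from the naive workflow.
class SimObject
{
    public bool NeedsFinalization;
    public bool Freed;
}

class ToyCollector
{
    public List<SimObject> FReachable = new List<SimObject>();

    // One collection pass over a dead object, driven only by the flag.
    public void Collect(SimObject o)
    {
        if (o.NeedsFinalization)
            FReachable.Add(o);   // hand it to the finalization thread
        else
            o.Freed = true;      // reclaim the memory immediately
    }

    // The finalization thread: clear the flag, run the "finalizer".
    public void RunFinalizers()
    {
        foreach (var o in FReachable) o.NeedsFinalization = false;
        FReachable.Clear();
    }
}

class Program
{
    static void Main()
    {
        var gc = new ToyCollector();
        var obj = new SimObject { NeedsFinalization = true };

        gc.Collect(obj);      // first cycle: queued, not freed
        gc.RunFinalizers();   // finalizer runs, flag cleared
        gc.Collect(obj);      // second cycle: memory freed
        Console.WriteLine(obj.Freed);
    }
}
```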

One possible reason would be that, from a conceptual point of view, there might be a rule like: "an object is collected if and only if it is not referenced from any root". Not having a finalization queue, and basing the decision to collect an object on the object's own state by checking the finalization flag, would not be compatible with this rule.

But I really don't think the GC implementation is based on the dogmatic application of such theoretical rules, only on pragmatic choices; so it's obvious I'm missing some key scenarios where the GC needs the finalization queue to know what to do when collecting an object. But which ones?

Pragmateek
  • You are presupposing that the GC looks at every object it collects. That isn't necessary, and making it efficient would require adding an extra four bytes to every object's header. – supercat Oct 10 '13 at 20:57
  • @supercat: sorry, but I don't get your point; for me the GC has to "look at" the objects it collects, at least to know whether it must push them into the f-reachable queue or can release their memory. And why four bytes? For storing a hash? For a binary flag I imagine there are still some free bits in the object's header. So could you please elaborate on these points? Thanks. :) – Pragmateek Oct 10 '13 at 21:28
  • Every object that has a finalizer must be looked at before it is collected, but if the GC has a list of all the objects that have finalizers it *only* has to look at things on that list. Every live object with a finalizer will have to be looked at twice per GC cycle, but dead objects won't be examined at all. – supercat Oct 11 '13 at 00:22
  • @supercat: for me it does not change anything: when the GC has determined that an object is not referenced anymore (I guess that's what you call a "dead object"), it must then determine whether the object has to be finalized: it only has to look at the object's flag to know what to do, no need for the finalization queue. For me the only advantage of keeping a separate collection would be the possibility for the GC to execute special collection cycles where it only checks the objects in the finalization queue to determine if it can collect them; but that does not seem really useful either. – Pragmateek Oct 11 '13 at 13:16
  • @supercat: and just to be sure: is the finalization flag even a reality? Because **ECMA-335** does not seem to specify it, I guess this is an implementation detail of Microsoft's .NET implementation. Without this flag it is obvious why the finalization queue is needed. – Pragmateek Oct 11 '13 at 13:21
  • In x86, each object is stored in memory as a four-byte type identifier and a four-byte value which (depending upon the upper bit) either includes a small amount of supplemental information, including some flags and its `GetHashCode` value, or an index to a record which contains more supplemental information along with a copy of the data the word would have contained. It does not contain any information about the object's size. While it would be possible to enumerate all allocated objects on the heap by starting with a pointer to the first object, determining its size, advancing the pointer... – supercat Oct 11 '13 at 16:42
  • ...to the next object, determining its size, etc. that would be very slow; the GC doesn't do that. Instead, it moves all the objects it can find from the part of the heap it's collecting and then erases that block of memory without caring about what was there. From the GC's perspective, there's no such thing as an object to which no reference exists. If an object isn't finalizable and no `WeakReference` to it exists, the object *ceases to exist* as soon as the last reference gets overwritten. The only time the GC causes an object to cease to exist is when it's the target of... – supercat Oct 11 '13 at 16:46
  • ...something like a `WeakReference`. In that case, the GC is responsible for blanking out the `WeakReference`, and it is the blanking out of the `WeakReference` which causes the item to vanish. Otherwise, while the GC has to run to reclaim the memory that was used by an abandoned object, objects cease to exist the moment the last reference gets overwritten. Were it not for the finalizer queue, the GC couldn't look at the "needs finalization" flags of abandoned objects, because by the time the GC ran they effectively wouldn't exist. – supercat Oct 11 '13 at 16:51
  • @supercat: *"there's no such thing as an object to which no reference exists"* AFAIK the .NET GC is a mark-and-sweep GC, so it precisely traces the references and marks all the objects it can reach from the roots as "live", before cleaning up the "dead" objects. *"objects cease to exist the moment the last reference gets overwritten"* *"they effectively wouldn't exist"* Objects do not vanish into the luminiferous aether once they're not referenced anymore; they're just a bunch of memory that sits there waiting for the GC to sweep them. – Pragmateek Oct 11 '13 at 22:12
  • And when the GC is going to sweep an object it can check a flag in the object's header to know whether it can directly mark the memory the object occupies as free or whether it needs to push it into the f-reachable queue. – Pragmateek Oct 11 '13 at 22:13
  • Unlike some memory-management systems which keep a linked list of allocations and free memory areas, .NET generally does not. If the last reference to a 32-byte object at address 0x12345678 gets deleted, the bytes from 0x12345678-0x12345697 won't get overwritten until the next GC, but *nothing* will read those addresses before the GC blanks them out. A memory-scan utility which was run before that would be able, with some effort, to determine that 0x12345678 holds the first byte of an object (rather than being a byte within a larger one), but... – supercat Oct 12 '13 at 04:41
  • ...the amount of effort that would be required to locate the beginning of every abandoned object in the heap would greatly exceed the effort required to keep a list of things that need to be examined before they're abandoned, move everything on that list to Gen1, and sweep everything in Gen0. – supercat Oct 12 '13 at 04:43
  • @supercat: hmm, that looks like a good reason. Thanks for pointing this out. So that means that for each collection, instead of checking whether each "dead" object is in the finalization queue, the GC will rather check whether each object in the finalization queue is "dead". I guess this is an O(n) process vs an O(n^2) one. Moreover I guess the dead objects are effectively removed from memory only during the compaction cycles of the GC, where it has to make a full scan. Seems consistent indeed. Would be perfect if an (ex-)insider like Eric Lippert could confirm all this. :) – Pragmateek Oct 12 '13 at 13:35
  • Oh! But if the finalization list is indeed the right approach, why the additional flag in the object's header? Maybe to avoid adding/removing objects from the list when the objects call SuppressFinalize/ReRegisterForFinalize, and to instead set the bit, which is faster; so when the GC wants to know whether it must finalize the object it can check (IsInTheFinalizationList AND HasFinalizationFlagSet)? – Pragmateek Oct 12 '13 at 14:04
  • The total amount of work required to set a flag and then later have the GC scan through the finalization queue, see that the flag is set, and not process the object further, is probably less than the amount of work that would be required to remove the object from the finalization queue any other way, even if the system could magically find the entry in the queue; it's certainly low enough that there's no sense adding some complicated data structure to allow `GC.SuppressFinalize` to locate the object in the queue. – supercat Oct 13 '13 at 01:24
1

The garbage collector does not identify and examine garbage, except perhaps when processing the Large Object Heap. Instead, its behavior is like that of a bowling-alley pinsetter removing deadwood between throws: the pinsetter grabs all the pins that are still standing, lifts them off the surface of the lane, and then runs the sweeper bar across the lane without regard for how many pins are on that surface. Sweeping out memory wholesale is much faster than identifying individual objects to be deleted. If 1% of objects have finalizers (the real number is probably even less), then it would be necessary to examine 100 object headers to find each finalizable object. Having a separate list of the objects which have finalizers makes it unnecessary for the GC to even look at the garbage objects that don't.
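A back-of-the-envelope sketch of that argument (all names and numbers invented): with a registration list built at allocation time, a collection only examines the finalizable 1%, whereas a header-scanning design would have to touch every object's header.

```csharp
using System;
using System.Collections.Generic;

// Toy object carrying only the hypothetical header flag.
class ToyObject
{
    public bool HasFinalizer;
}

class Program
{
    static void Main()
    {
        var heap = new List<ToyObject>();
        var finalizationList = new List<ToyObject>();

        for (int i = 0; i < 10_000; i++)
        {
            // 1% of objects have finalizers, registered at allocation.
            var o = new ToyObject { HasFinalizer = i % 100 == 0 };
            heap.Add(o);
            if (o.HasFinalizer) finalizationList.Add(o);
        }

        // Without the list: every header must be examined for the flag.
        int headerChecks = 0;
        foreach (var o in heap) headerChecks++;

        // With the list: only the registered objects are examined.
        int listChecks = finalizationList.Count;

        Console.WriteLine($"{headerChecks} header checks vs {listChecks} list checks");
    }
}
```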

supercat
1

It's desirable for garbage-collection pauses to be as short as possible. To that end, running finalizers is usually deferred until the frantic work of garbage collection is done; they are instead run in the background on a separate thread.

rlibby
  • The issue isn't just one of keeping pauses short; to allow for the existence of arbitrarily-deep object graphs, many GCs re-purpose memory within objects to keep track of their progress. They do this in a way that can be undone by the time the GC is finished, but while the GC is running many objects will be in a state totally unsuitable for running normal code. – supercat Dec 16 '15 at 20:24