12

Today I saw a piece of code that seemed odd to me at first glance and made me reconsider. Here is a shortened version of the code:

if(list != null){
    list.Clear();
    list = null;
}

My thought was, why not replace it simply by:

list = null;

I read a bit and I understand that clearing a list will remove its references to the objects, allowing the GC to do its thing, but will not "resize" the list. The memory allocated for the list itself stays the same.
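A minimal sketch of the "will not resize" point, assuming a plain `List<object>`. The names `ClearCapacityDemo` and `Run` are made up for illustration and are not part of the original code:

```csharp
using System;
using System.Collections.Generic;

public static class ClearCapacityDemo
{
    // Fills a list, clears it, then trims it, and reports the capacity at each step.
    public static (int Before, int CountAfter, int CapacityAfter, int Trimmed) Run()
    {
        var list = new List<object>();
        for (int i = 0; i < 1000; i++)
        {
            list.Add(new object());
        }

        int before = list.Capacity;        // size of the backing array after the adds

        list.Clear();                      // drops the element references, Count becomes 0
        int countAfter = list.Count;
        int capacityAfter = list.Capacity; // backing array is kept, so this matches 'before'

        list.TrimExcess();                 // explicitly asks the list to shrink its backing array
        int trimmed = list.Capacity;

        return (before, countAfter, capacityAfter, trimmed);
    }
}
```

Calling `ClearCapacityDemo.Run()` and printing the four values should show the capacity unchanged by `Clear()` and only reduced after `TrimExcess()`.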

On the other hand, setting the variable to null would also remove the reference to the list (and thus to its items), also allowing the GC to do its thing.

So I have been trying to figure out a reason to do it like the first block. One scenario I thought of is if you have two references to the list. The first block would clear the items in the list, so even if the second reference remains, the GC can still reclaim the memory allocated for the items.

Nonetheless, I feel like there's something weird about this so I would like to know if the scenario I mentioned makes sense?

Also, are there any other scenarios where we would have to Clear() a list right before setting the reference to null?

Finally, if the scenario I mentioned made sense, wouldn't it be better off to just make sure we don't hold multiple references to this list at once and how would we do that (explicitly)?

Edit: I get the difference between Clearing and Nulling the list. I'm mostly curious to know if there is something inside the GC that would make it so that there would be a reason to Clear before Nulling.

DereckM
  • As a general rule, setting it to `null` is what you should do usually. Once the GC realises that the `List` is unreachable - it will realise its contents (i.e. the underlying array and what it points to) are also unreachable. There are some scenarios in which clearing it may be worthwhile - for example if there are multiple references to the underlying `List` and you don't just want to get rid of this reference to the `List` but you want to ensure all of the _other_ references point to an empty list (as you identified in your question). To be fair, that is rare. – mjwills May 24 '18 at 02:01
  • `wouldn't it be better off to just make sure we don't hold multiple references to this list at once and how would we do that (explicitly)` Other than by scoping (i.e. having the list as function scoped or `private`, and never exposing it externally) there is no easy way to do this. – mjwills May 24 '18 at 02:02
  • @mjwills In this case, it is private and only used inside the class. – DereckM May 24 '18 at 02:04
  • In that case, they are equivalent. – mjwills May 24 '18 at 02:47
  • Possible duplicate of [Clearing a private collection or setting it to null?](https://stackoverflow.com/questions/11717416/clearing-a-private-collection-or-setting-it-to-null) – mjwills May 24 '18 at 03:04
  • That call to `List.Clear()` immediately before nulling out the variable holding the list is useless _if that's the only reference to that list_. Assuming that's the case either the author didn't know that, or, possibly he both a) confused `List.Clear` with other `Clear` methods that are synonyms (in their classes) for `Dispose` (e.g., `Stream.Dispose`) and b) thought therefore that `List.Clear` was a kind of "deep `Dispose`" that would individually dispose each of the elements of the list. – davidbak May 24 '18 at 04:46
  • @usr - well a bug is a bug and holding on to objects too long because they were not properly erased from a collection you're _keeping_ is a classic leak. But in this case the collection is specifically _not_ being kept. – davidbak May 24 '18 at 12:23

5 Answers

8

The list.Clear() is not necessary in your scenario (where the List is private and only used within the class).

A great intro level link on reachability / live objects is http://levibotelho.com/development/how-does-the-garbage-collector-work :

How does the garbage collector identify garbage?

In Microsoft’s implementation of the .NET framework the garbage collector determines if an object is garbage by examining the reference type variables pointing to it. In the context of the garbage collector, reference type variables are known as “roots”. Examples of roots include:

  • A reference on the stack
  • A reference in a static variable
  • A reference in another object on the managed heap that is not eligible for garbage collection
  • A reference in the form of a local variable in a method

The key bit in this context is "A reference in another object on the managed heap that is not eligible for garbage collection". Thus, if the List is eligible to be collected (and the objects within the list aren't referenced elsewhere), then those objects in the List are also eligible to be collected.

In other words, the GC will realise that list and its contents are unreachable in the same pass.

So, is there an instance where list.Clear() would be useful? Yes. It might be useful if you have two references to a single List (e.g. as two fields in two different objects). The holder of one of those references may wish to clear the list in a way that the other holder is also impacted, in which case list.Clear() is perfect.
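To make the two-reference scenario concrete, here is a small sketch. `SharedListDemo` and `Holder` are hypothetical names; `Holder` simply stands in for "another object" that keeps a reference to the same list:

```csharp
using System.Collections.Generic;

public static class SharedListDemo
{
    // Hypothetical second owner that keeps its own reference to the same list.
    public sealed class Holder
    {
        public List<string> Items;

        public Holder(List<string> items)
        {
            Items = items;
        }
    }

    // Only nulls the local variable: the other holder still sees every item.
    public static int CountSeenByOtherAfterNull()
    {
        var list = new List<string> { "a", "b", "c" };
        var other = new Holder(list);

        list = null;

        return other.Items.Count; // 3
    }

    // Clears before nulling: the other holder now sees an empty list.
    public static int CountSeenByOtherAfterClear()
    {
        var list = new List<string> { "a", "b", "c" };
        var other = new Holder(list);

        list.Clear();
        list = null;

        return other.Items.Count; // 0
    }
}
```

In the first method the three strings stay reachable through `other.Items`; in the second they are no longer referenced by the list at all, even though the list object itself is still alive.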

mjwills
4

This answer started as a comment for Mick, who claims that:

It depends on which version of .NET you are working with. On mobile platforms like Xamarin or mono, you may find that the garbage collector needs this kind of help in order to do its work.

That statement is begging to be fact checked. So, let us see...


.NET

.NET uses a generational mark and sweep garbage collector. You can see an outline of the algorithm in What happens during a garbage collection. In summary, it walks the object graph, and if it cannot reach an object, that object can be erased.

Thus, the garbage collector will correctly identify the items of the list as collectible in the same iteration, regardless of whether or not you clear the list. There is no need to decouple the objects beforehand.

This means that clearing the list does not help the garbage collector on the regular implementation of .NET.
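One way to observe this is with weak references. The sketch below is only an illustration under assumptions: `ReachabilityDemo`, `CreateAndDrop`, and `Run` are made-up names, and the outcome depends on the runtime and build configuration (forced collections are a diagnostic trick, and a runtime that scans the stack conservatively may keep objects alive longer).

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

public static class ReachabilityDemo
{
    // Creates a list holding one item and returns only weak references to both.
    // NoInlining keeps the strong references confined to this method's frame,
    // so nothing strongly references the list or the item after it returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static (WeakReference List, WeakReference Item) CreateAndDrop()
    {
        var item = new object();
        var list = new List<object> { item };
        // Note: the list is never cleared.
        return (new WeakReference(list), new WeakReference(item));
    }

    // Forces a collection and reports whether the list and its item survived.
    public static (bool ListAlive, bool ItemAlive) Run()
    {
        var refs = CreateAndDrop();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return (refs.List.IsAlive, refs.Item.IsAlive);
    }
}
```

If the item is reported as collected together with the list, that matches the description above: the item was only reachable through the list, so no `Clear()` was needed to make it collectible.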

Note: If there were another reference to the list, then the fact that you cleared the list would be visible.


Mono and Xamarin

Mono

As it turns out, the same is true for Mono.

Xamarin.Android

Also true for Xamarin.Android.

Xamarin.iOS

However, Xamarin.iOS requires additional considerations. In particular, MonoTouch will use wrapped Objective-C objects which are beyond the garbage collector. See Avoid strong circular references under iOS Performance. These objects require different semantics.

Xamarin.iOS will minimize the use of Objective-C objects by keeping a cache:

C# NSObjects are also created on demand when you invoke a method or a property that returns an NSObject. At this point, the runtime will look into an object cache and determine whether a given Objective-C NSObject has already been surfaced to the managed world or not. If the object has been surfaced, the existing object will be returned, otherwise a constructor that takes an IntPtr as a parameter is invoked to construct the object.

The system keeps these objects alive even if there are no references from managed code:

User-subclasses of NSObjects often contain C# state so whenever the Objective-C runtime performs a "retain" operation on one of these objects, the runtime creates a GCHandle that keeps the managed object alive, even if there are no C# visible references to the object. This simplifies bookeeping a lot, since the state will be preserved automatically for you.

Emphasis mine.

Thus, under Xamarin.iOS, if there is a chance that the list contains wrapped Objective-C objects, this code would help the garbage collector.

See the question "How does memory management works on Xamarin.IOS": Miguel de Icaza explains in his answer that the semantics are to "retain" the object when you take a reference and "release" it when the reference is set to null.

On the Objective-C side, "release" does not mean destroying the object. Objective-C uses reference counting. When we "retain" the object the counter is incremented, and when we "release" it the counter is decremented. The system destroys the object when the counter reaches zero. See: About Memory Management.

Therefore, Objective-C is bad at handling circular references (if A references B and B references A, their reference counts never reach zero, even if neither can be reached from anywhere else). Thus, you should avoid them in Xamarin.iOS. In fact, forgetting to decouple references will lead to leaks in Xamarin.iOS. See: Xamarin iOS memory leaks everywhere.


Others

dotGNU also uses a generational mark and sweep garbage collector.

I also had a look at CrossNet (that compiles IL to C++), it appears they attempted to implement it too. I do not know how good it is.

Theraot
  • Nice analysis. I didn't intend to suggest any particular implementation of the CLI was more susceptible to memory leaks, which is the reason why I used the word 'may' a lot. Only that the robustness of the garbage collector wasn't guaranteed from one implementation to another. And of course from one version to the next on these platforms is another story again. Xamarin and Mono were just examples of platforms that MIGHT be more susceptible to memory leaks. I'd bet that .NET on Raspberry Pi might be another candidate. – Mick May 28 '18 at 02:38
  • @Mick both .NET Compact Framework (ARM) and .NET Micro Framework (Netduino) use mark and sweep. They actually have an easier time because they do not allow you to handle pointers, and thus do not need to pin objects. However performance is an issue. I do not know if Raspberry did ever run with one of those, however, it can run Mono and .NET Core, which are mark and sweep. Leaks can happen if you forget to dispose native wrappers, regardless of GC. As for versions, all these GC are based on Boehm GC, which original release predates .NET, most platforms have moved to concurrent variants. – Theraot May 28 '18 at 04:06
  • Yes well, I don't claim to have detailed knowledge of any implementation of the GC. However I have had experience with them. It's been a while since I used the .NET Compact Framework; ten years ago I definitely encountered memory issues with an app running on .NET Compact Framework which were definitely resolved by decoupling. I would imagine the compact framework GC has had significant upgrades since then. The processing power of the devices running the compact framework has definitely shot up, which would allow the GC to be more aggressive in its CPU usage. – Mick May 28 '18 at 04:35
  • @Mick ten years ago is when most people used that thing - building for it was a pain, but building for it in anything else than Visual Studio 2008 was - as far as I can tell - not possible. I certainly will not be installing it to check how bad it was. If I recall correctly, the GC was slow, it could take too long to start to run, and garbage would creep. And when it did run, it would halt your program. I think `GC.Collect` on predefined locations was common practice, both to keep down the garbage, and to have control on the halts. No... let us pretend nobody uses it, and forget about it. – Theraot May 28 '18 at 04:51
2

It depends on which version of .NET you are working with. On mobile platforms like Xamarin or mono, you may find that the garbage collector needs this kind of help in order to do its work, whereas on desktop platforms the garbage collector implementation may be more elaborate. Each implementation of the CLI out there is going to have its own implementation of the garbage collector, and it is likely to behave differently from one implementation to another.

I can remember 10 years ago working on a Windows Mobile application which had memory issues and this sort of code was the solution. This was probably due to the mobile platform requiring a garbage collector that was more frugal with processing power than the desktop.

Decoupling objects helps simplify the analysis the garbage collector needs to do and helps avoid scenarios where the garbage collector fails to recognise that a large graph of objects has actually become disconnected from all the threads in your application, which results in memory leaks.

Anyone who believes you can't have memory leaks in .NET is an inexperienced .NET developer. On desktop platforms just ensuring Dispose is called on objects which implement them may be enough, however with other implementations you may find it is not.

List.Clear() will decouple the objects in the list from the list and each other.
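For reference, a small sketch of what `Clear()` does and does not touch. `ClearIsShallowDemo` and `Node` are hypothetical names used only for this illustration. It shows that `Clear()` removes the list's references to its elements but leaves any references between the elements themselves in place:

```csharp
using System.Collections.Generic;

public static class ClearIsShallowDemo
{
    // Hypothetical element type that can point at another element.
    public sealed class Node
    {
        public string Name;
        public Node Next;

        public Node(string name)
        {
            Name = name;
        }
    }

    // Builds two nodes that reference each other, puts them in a list,
    // clears the list, and reports what is left.
    public static (int Count, bool LinkIntact) Run()
    {
        var a = new Node("a");
        var b = new Node("b");
        a.Next = b;
        b.Next = a;

        var list = new List<Node> { a, b };
        list.Clear(); // the list no longer references a or b

        // The nodes still reference each other; Clear() did not change that.
        bool linkIntact = object.ReferenceEquals(a.Next, b)
                       && object.ReferenceEquals(b.Next, a);

        return (list.Count, linkIntact);
    }
}
```

Here the count comes back as 0 while the links between the two nodes are still intact, so any decoupling between elements would have to be done separately.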

EDIT: So to be clear, I'm not claiming that any particular implementation currently out there is susceptible to memory leaks. And again, depending on when this answer is read, the robustness of the garbage collector on any implementation of the CLI could have changed since the time of writing.

Essentially I'm suggesting if you know that your code needs to be cross platform and used across many implementations of the .NET framework, especially implementations of the .NET framework for mobile devices, it could be worth investing time into decoupling objects when they are no longer required. In that case I'd start off by adding decoupling to classes that already implement Dispose, and then if needed look at implementing IDisposable on classes that don't implement IDisposable and ensuring Dispose is called on those classes.
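As a rough sketch of what "adding decoupling to a class that implements Dispose" could look like (the class `ItemOwner` and its members are made up for illustration, and this is just one possible shape, not a recommended pattern for every class):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class that owns a private list and decouples it in Dispose.
public sealed class ItemOwner : IDisposable
{
    private List<object> _items = new List<object>();

    // Number of items currently held; 0 once the owner has been disposed.
    public int Count => _items == null ? 0 : _items.Count;

    public void Add(object item)
    {
        if (_items == null)
        {
            throw new ObjectDisposedException(nameof(ItemOwner));
        }
        _items.Add(item);
    }

    // Safe to call more than once: the second call finds _items already null.
    public void Dispose()
    {
        if (_items != null)
        {
            _items.Clear(); // drop the list's references to its items
            _items = null;  // drop this object's reference to the list
        }
    }
}
```

Typical use would be `using (var owner = new ItemOwner()) { owner.Add(new object()); }`, so the decoupling happens deterministically when the block exits.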

How to tell for sure if it's needed? You need to instrument and monitor the memory usage of your application on each platform it is to be deployed on. Rather than writing lots of superfluous code, I think the best approach is to wait until your monitoring tools indicate you have memory leaks.

Mick
  • `List.Clear()` will _not_ decouple the objects in the list _from each other_. It's not some kind of "deep clear". It'll only decouple the elements from the list structure itself. – davidbak May 24 '18 at 04:37
  • @davidbak whilst in the list, as far as the garbage collector is concerned they are in the same graph of objects, and hence coupled together indirectly via the list, which was the meaning of my original statement. True, if the members of the list have references to each other then no, List.Clear() will not decouple these references. – Mick May 24 '18 at 05:21
1

As mentioned in the docs:

List.Clear Method (): Count is set to 0, and references to other objects from elements of the collection are also released.

In your 1st snippet:

if(list != null){
    list.Clear();
    list = null;
}

If you just set the list to null, you release your variable's reference to the actual list object in memory (so the list itself remains in memory), and it waits for the Garbage Collector to come and release its allocated memory.

But the problem is that your list may contain elements that hold references to other objects, for example:

list → objectA, objectB, objectC
objectB → objectB1, objectB2

So, after setting the list to null, the list has no references and should be collected by the Garbage Collector later, but objectB1 and objectB2 are still referenced by objectB (which is still in memory), and because of that the Garbage Collector needs to analyse the object reference chain. To make this less confusing, the snippet calls .Clear() to remove that confusion.

Duc Filan
  • So if I understand correctly, this is done to prevent slow garbage collection in case where objects hold reference of objects (and it could go on objectB1 -> objectB3 -> objectB4) because the garbage collector goes through iterations of collection and each iteration it has to reevaluate the items that are not held reference to anymore (instead of say, collecting all Objects and their references to other objects in one single cycle)? – DereckM May 24 '18 at 02:37
  • It seems like you are implying that if `ObjectA` holds a reference to `ObjectB` and `ObjectB` has a reference to `ObjectC` then one pass of the GC will realise that `ObjectA` is unreachable and clean it up. Then the next pass will realise that `ObjectB` is unreachable and clean it up. And then the next pass will realise that `ObjectC` is unreachable and clean it up. Is that your understanding? If so, do you have any links about that? I was certainly of the understanding that if the GC determined that `ObjectA` was unreachable it would determine the other two were unreachable as well. – mjwills May 24 '18 at 02:51
  • Mark-sweep, copying, or generational garbage collectors don't work that way. If the list L contains elements A..Z and those are the _only_ pointers to A..Z and A..Z contain pointers to AAA..ZZZ and those AAA..ZZZ are only pointed to by A..Z and each other ... then after nulling the only pointer to list L tracing collectors (mark/sweep) won't trace A..Z or AAA..ZZZ and copying collectors won't copy them. _Refcounting_ GC won't collect the elements that are pointing to each other until later they finally do a "major" (mark/sweep or copying) collection. But .NET doesn't do refcounting AFAIK.. – davidbak May 24 '18 at 03:18
-1

Clearing the list ensures that if the list is not garbage collected for some reason, then at the very least, the elements it contained can still be disposed of.

As stated in the comments, preventing other references to the list from existing requires careful planning, and clearing the list before nulling it doesn't incur a big enough performance hit to justify trying to avoid doing so.

laptou