It is well known in .NET that types are not garbage collected, which means that if you're playing around with, for example, Reflection.Emit, you have to be careful to unload AppDomains and so on. At least, that's how I used to understand it.

That made me wonder whether generic types are garbage collected; more precisely, generic types constructed with MakeGenericType, for example based on user input. :-)

So I constructed the following test case:

using System;
using System.Text;

public interface IRecursiveClass
{
    int Calculate();
}

public class RecursiveClass1<T> : IRecursiveClass 
                                  where T : IRecursiveClass,new()
{
    public int Calculate()
    {
        return new T().Calculate() + 1;
    }
}
public class RecursiveClass2<T> : IRecursiveClass
                                  where T : IRecursiveClass,new()
{
    public int Calculate()
    {
        return new T().Calculate() + 2;
    }
}

public class TailClass : IRecursiveClass
{
    public int Calculate()
    {
        return 0;
    }
}

class RecursiveGenericsTest
{
    public static int CalculateFromUserInput(string str)
    {
        Type tail = typeof(TailClass);
        foreach (char c in str)
        {
            if (c == '0')
            {
                tail = typeof(RecursiveClass1<>).MakeGenericType(tail);
            }
            else
            {
                tail = typeof(RecursiveClass2<>).MakeGenericType(tail);
            }
        }
        IRecursiveClass cl = (IRecursiveClass)Activator.CreateInstance(tail);
        return cl.Calculate();
    }

    static long MemoryUsage
    {
        get
        {
            GC.Collect(GC.MaxGeneration);
            GC.WaitForFullGCComplete();
            return GC.GetTotalMemory(true);
        }
    }

    static void Main(string[] args)
    {
        long start = MemoryUsage;

        int total = 0;
        for (int i = 0; i < 1000000; ++i)
        {
            StringBuilder sb = new StringBuilder();
            int j = i;
            for (int k = 0; k < 20; ++k) // fix the recursion depth
            {
                if ((j & 1) == 1)
                {
                    sb.Append('1');
                }
                else
                {
                    sb.Append('0');
                }
                j >>= 1;
            }

            total += CalculateFromUserInput(sb.ToString());

            if ((i % 10000) == 0)
            {
                Console.WriteLine("Current memory usage @ {0}: {1}", 
                                  i, MemoryUsage - start);
            }
        }

        Console.WriteLine("Done and the total is {0}", total);
        Console.WriteLine("Current memory usage: {0}", MemoryUsage - start);

        Console.ReadLine();
    }
}

As you can see, the generic types can be nested recursively, with a 'tail' class that marks the end of the recursion. And to make sure GC.GetTotalMemory isn't cheating, I also opened Task Manager.

So far so good. Next I fired this beast up, and while I was waiting for an 'Out of memory' I noticed that, contrary to my expectations, it was not consuming more memory over time. In fact, memory consumption drops slightly over time.

Can someone please explain this? Are generic types actually collected by the GC? And if so... are there also Reflection.Emit cases that are garbage collected?

atlaste
  • You could experiment by creating a `WeakReference` to your created generic type, then check if the `Target` is null after running a GC pass. – Sam Harwell Apr 18 '13 at 15:54
  • Maybe this will help [How do generics get compiled by the JIT compiler?](http://stackoverflow.com/questions/5342345/how-do-generics-get-compiled-by-the-jit-compiler) – Brent Stewart Apr 18 '13 at 15:57
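
The WeakReference experiment suggested in the first comment could look roughly like the sketch below. The names `WeakTypeProbe` and `Marker` are made up for illustration, and the result may depend on the runtime and build configuration, so treat it as an experiment rather than proof.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

static class WeakTypeProbe
{
    // Hypothetical argument type, used only by this sketch.
    private struct Marker { }

    // Construct the closed type in a separate, non-inlined method so that no
    // local variable in the caller holds a strong reference to the Type object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference MakeWeakReferenceToClosedType(Type argument)
    {
        Type closed = typeof(List<>).MakeGenericType(argument);
        return new WeakReference(closed);
    }

    public static bool SurvivesFullCollection()
    {
        WeakReference weak = MakeWeakReferenceToClosedType(typeof(Marker));
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        // True means something other than our code still references the Type object.
        return weak.IsAlive;
    }

    static void Main()
    {
        Console.WriteLine("Type object still alive after GC: {0}", SurvivesFullCollection());
    }
}
```

If the Type object is still alive after the collection, the runtime itself must be holding on to it, which is what one would expect if constructed generic types are not collected.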

2 Answers

To answer your first question:

Generic constructions of types are not collected.

However, if you construct C<string> and C<object>, the CLR actually generates the code for the methods only once; since reference to string and reference to object are guaranteed to be the same size, it can do so safely. It's pretty clever. If you construct C<int> and C<double> though, the code for the methods gets generated twice, once for each construction. (Assuming that the code for the methods is generated at all, of course; methods are jitted on demand; that's why it's called jitting.)

To demonstrate that generic types are not collected, instead create a generic type

class C<T> { public static readonly T[] Big = new T[10000]; }

C<object> and C<string> share any code generated for the methods, but each one gets its own static fields, and those fields will live forever. The more types you construct, the more memory will be filled up with those big arrays.
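
A minimal sketch of the point about static fields, assuming a hypothetical `Holder<T>` that mirrors the `C<T>` above with a smaller array: the two constructions are distinct types, and each one owns its own array.

```csharp
using System;

// Hypothetical stand-in for C<T>; the array is smaller to keep the sketch cheap.
class Holder<T>
{
    public static readonly T[] Big = new T[1000];
}

static class StaticsPerConstruction
{
    public static bool SameType()
    {
        return typeof(Holder<object>) == typeof(Holder<string>);
    }

    public static bool SameArray()
    {
        // Both fields hold arrays, so they can be compared as plain object references.
        return object.ReferenceEquals(Holder<object>.Big, Holder<string>.Big);
    }

    static void Main()
    {
        Console.WriteLine(SameType());  // False: two distinct constructed types
        Console.WriteLine(SameArray()); // False: each construction has its own static field
    }
}
```

Whatever the jitter does about sharing method code, the static storage is per construction, so every additional construction adds another array that stays reachable.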

And now you know why those types cannot be collected; we have no way of knowing if someone is going to try to access a member of one of those arrays at any time in the future. Since we don't know when the last array access is going to be, the arrays have to live forever, and therefore the types that contain them have to live forever too.


To answer your second question: Is there a way to make dynamically-emitted assemblies that are collected?

Yes. The documentation is here:

http://msdn.microsoft.com/en-us/library/dd554932.aspx
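
As a rough sketch of what the linked documentation describes, a dynamic assembly can be defined with `AssemblyBuilderAccess.RunAndCollect`. The assembly, module and type names below are made up, and the sketch assumes a runtime where the static `AssemblyBuilder.DefineDynamicAssembly` overload is available (.NET Framework 4.5 or later, or modern .NET).

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

static class CollectibleAssemblySketch
{
    // "ScratchAssembly", "ScratchModule" and "Scratch" are hypothetical names.
    public static Type EmitCollectibleType()
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("ScratchAssembly"),
            AssemblyBuilderAccess.RunAndCollect); // marks the assembly as collectible
        ModuleBuilder module = assembly.DefineDynamicModule("ScratchModule");
        TypeBuilder builder = module.DefineType("Scratch", TypeAttributes.Public);
        // No constructor is defined, so a parameterless one is supplied automatically.
        return builder.CreateType();
    }

    static void Main()
    {
        Type emitted = EmitCollectibleType();
        object instance = Activator.CreateInstance(emitted);
        Console.WriteLine(instance.GetType().FullName);
    }
}
```

The assembly only becomes eligible for unloading once nothing references it any more, including instances, `Type` objects and delegates created from it; the linked page lists the exact conditions and restrictions.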

Eric Lippert
  • @ErikLippert Wow, that's a great answer Eric, thanks! I figured the thing about the static's, which is why I didn't add a static in the classes. I'm still thinking about the line 'actually generates the code for the methods only once [..]'. Does that also mean that if you have two reference-type generic classes without static's, internally only one 'type' exists in memory? (e.g. since a type is just code and no memory it will only be created once) But that also implies a JIT optimization boundary (inlining will be impossible if T can be either Foo and Bar with only 1 instance of the code)? – atlaste Apr 18 '13 at 19:15
  • If one copy of the code for a method like `string GetTypeName(T param) {return typeof(T).ToString();}` will be shared for all class types `T`, how does the code know what type it's called with? Is that somehow passed as a hidden parameter? – supercat Apr 18 '13 at 21:25
  • @supercat: There is some mechanism whereby the construction that is currently executing is secretly passed around, but I don't know what it is. You'd have to ask an expert on the jitter. – Eric Lippert Apr 18 '13 at 21:35
  • @supercat Yes, a hidden argument is used. For more details, see [*CLR Generics and code sharing*](http://blogs.msdn.com/b/joelpob/archive/2004/11/17/259224.aspx) (though it's from 2004, so the details could have changed since then). – svick Apr 18 '13 at 23:02
  • @svick: Suppose one had a method `static string GetTypeName()`, and one were to repeatedly assign both `GetTypeName` and `GetTypeName` to delegates. Would the system create one `MethodDesc` the first time any given type was used with that method and then use it for all delegates associated with that type? How would the system keep track of such things? – supercat Apr 18 '13 at 23:11
  • I don't like the magic so far... :-) Thanks @svinck . I've been reading up and I'm confused. Sure, if you know before-hand that something is a value-type or a reference-type, you get a small performance gain - but the way I understand it now the real gain (method inlining by the JIT) is simply impossible due to code sharing. After all, you cannot inline both `Foo.CompareTo` and `string.CompareTo` in your fancy Sort, because the code is shared. I did notice a loophole that it might be possible with value types to do this though - since they are treating them differently? – atlaste Apr 19 '13 at 06:23
  • @EricLippert maybe you can shed some light on this... As a follow-up experiment, I've changed every 'class' to 'struct' and made T a field in the struct. What I observe is that the behavior is the same, as long as the size of the (total) struct is the same. E.g. that means the code for `Foo` and `Foo` is shared by the same rules as `Foo` and `Foo` (same size) - with no difference between value-type and reference-type (contrary to the link posted by svink). Correct? – atlaste Apr 19 '13 at 07:07
  • @StefandeBruijn: I love *magic*! How can we live without something magic? – Ken Kin Apr 19 '13 at 12:10
  • @KenKin Well numbers don't lie; still, I'm still hoping there's some way to cheating C# into inlining generics without having to resort to Emit and doing it myself... Parts of my code do several billion calculations (with algorithms depending on user input) - and user experience matters unfortunately. That's why I want to be absolutely sure about what I can expect. (and that's why I don't like this particular kind of magic :-)) – atlaste Apr 19 '13 at 12:58
  • @StefandeBruijn: I'm not sure that I understood about *cheating C# into inlining generics*. What does *cheating* mean? – Ken Kin Apr 19 '13 at 13:24
  • @KenKin Well, in a nutshell... if you do template metaprogramming in C++, your templates will first be expanded and are then optimized by the compiler. Yes that will result in large executables, but if done correctly, it'll also result in much faster code. I want to trick C# in doing the same at runtime: so expand the generics and then optimizing the lot. In my question code that would result in code like `return [constant];`, with the constant depending on the class tree. I know that the JITter supports inlining under certain conditions; what I'm looking for is a way to inline at runtime. – atlaste Apr 19 '13 at 13:50
  • @StefandeBruijn: Given something like `struct SomeStruct { public T it; public override String ToString() {return it.ToString();} }` I can't see how `SomeStruct` could use the same X86 code as `SomeStruct`, since `SomeStruct.ToString()` needs to pass the *contents* of field `it`, while `SomeStruct.ToString()` needs to pass the *address* of that field. That's no problem in CIL, but X86 code would have to be generated to do one or the other. – supercat Apr 19 '13 at 14:52
  • @supercat Yes, but if you have two value types with equal size, that doesn't hold and the same x86 code can (and apparently is) reused... After all, parameters are just simple bytes on the stack (as are pointers) and it's the method that you use that knows how to handle those bytes... but the consequence is that you always have to call a method, so inlining is impossible. (At least that's my deduction). The more I think about this 'magic', the more I feel this is actually a workaround to make generics 'easy'- after all, optimizing the composition can be quite difficult as we've seen in C++. – atlaste Apr 19 '13 at 15:34
  • @StefandeBruijn: Compiled for x86, a `SomeStruct` and a `SomeStruct` are both value types, and they both consume four bytes. Code which has a variable of type `T` must recognize `T` as containing a GC root if it's a `SomeStruct`, but not if it's a `SomeStruct`. Maybe the GC can handle that even when the code is shared, but it would seem like the GC metadata would be simpler if they used different code. – supercat Apr 19 '13 at 16:00
  • @supercat Correct, that is why a reference type and a value type probably aren't compiled as the same code. But `SomeStruct` and `SomeStruct` don't have that problem, so the code can be shared - and the same holds for `SomeStruct>` and `SomeStruct>`. At least, that's how I understand how it works. – atlaste Apr 19 '13 at 18:41
  • @StefandeBruijn: By my understanding, two value types can share code if they are the same size, and contain within them the same number of reference types, located in the same paces; I don't think it matters whether the reference types are wrapped in two layers of nested structures or twenty, since the only way of accessing inner layers generically would be through constrained virtual function calls (which can be dispatched to different destinations depending upon the inner type). – supercat Apr 19 '13 at 19:14
  • @supercat Then we understand it the same way. And because of this: if you have `Foo where T:Bar` and `Bar` is a class (or interface), it also means that: a non-abstract method call of Bar can potentially be inlined, while a method call in T can never be inlined - even though you the type might be known at compile time. Reason: the compiler shares the code so it cannot assume what's going to happen; it needs the function call. (Whereas: if the code was not shared it *could* optimize it) – atlaste Apr 19 '13 at 19:37
  • @StefandeBruijn regarding to GC roots, GC can work its way with extra help of magic values present on somewhere on stack TLS or something. even that, there could be code sharing to some extent. Joe Duffy mentioned currently there is no code sharing in .net for value type generics but it can change in future. http://joeduffyblog.com/2011/10/23/on-generics-and-some-of-the-associated-overheads/ – TakeMeAsAGuest Mar 04 '19 at 19:28
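
To make the `GetTypeName` question from the comments concrete, here is a small sketch (the class name `Namer` is invented for illustration). It only shows the observable behaviour: even if the machine code for two reference-type constructions is shared, each call still sees its own `T`. How the runtime conveys that information is not visible from C#.

```csharp
using System;

// Hypothetical class modelled on the GetTypeName example in the comments.
class Namer<T>
{
    public string GetTypeName()
    {
        return typeof(T).ToString();
    }
}

static class SharedCodeProbe
{
    static void Main()
    {
        Console.WriteLine(new Namer<string>().GetTypeName()); // System.String
        Console.WriteLine(new Namer<object>().GetTypeName()); // System.Object
    }
}
```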

Regardless of whether code is shared, each MakeGenericType call that produces a new construction creates new internal CLR structures for metadata, which consume memory. Type objects are created directly in CLR code (not in managed code), and there is only one instance per type, so you can compare them for reference equality. The CLR itself holds a reference to them, so they can't be GC'ed, but in my tests I confirmed that the GC can move them.

Edit: The reference held by the CLR could be a weak reference, so I dug into the RuntimeTypeHandle.cs source and found:

internal bool IsCollectible()
{
    return RuntimeTypeHandle.IsCollectible(GetTypeHandleInternal());
}

which is most probably false, considering Eric Lippert's answer.
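
On newer runtimes the same information is exposed publicly as `Type.IsCollectible`, so the guess can be checked directly. This sketch assumes .NET Core 3.0 or later, where that property exists; `CollectibleProbe` is a made-up name.

```csharp
using System;
using System.Collections.Generic;

static class CollectibleProbe
{
    public static bool IsConstructedTypeCollectible()
    {
        // A generic type constructed at run time from ordinary, non-collectible assemblies.
        Type closed = typeof(List<>).MakeGenericType(typeof(int));
        return closed.IsCollectible;
    }

    static void Main()
    {
        Console.WriteLine(IsConstructedTypeCollectible());
    }
}
```

For a construction built only from types in regular assemblies this should report false; types emitted into a collectible dynamic assembly are the case where it would be expected to report true.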

TakeMeAsAGuest