
I've read some information about generics in .NET and noticed one interesting thing.

For example, if I have a generic class:

class Foo<T> 
{ 
    public static int Counter; 
}

Console.WriteLine(++Foo<int>.Counter); //1
Console.WriteLine(++Foo<string>.Counter); //1

The two classes Foo<int> and Foo<string> are different at runtime. But what about the case where a non-generic class has a generic method?

class Foo 
{
    public void Bar<T>()
    {
    }
}

It's obvious that there's only one Foo class. But what about the method Bar? All generic classes and methods are closed at runtime with the type arguments they are used with. Does that mean that class Foo has many implementations of Bar, and where is the information about this method stored in memory?

Theodoros Chatzigiannakis
roman
  • What do you exactly mean by "information about this method"? – Yacoub Massad Jan 04 '17 at 10:32
  • Generics is a compiler thing. In reality you will get multiple classes/methods. – Sinatr Jan 04 '17 at 10:32
  • @Sinatr, I don't think this is the case. Generic classes/methods are compiled to IL as generic. At runtime (via reflection), you can choose whatever type parameters. – Yacoub Massad Jan 04 '17 at 10:34
  • @Sinatr: Not really. The compiler will only emit a single class/method. The runtime will allocate one set of static fields per constructed type; as for methods, the JIT compiler can do whatever it sees fit. I believe that when there are different type arguments which are *reference* types, they can share code - but not when they're value types. – Jon Skeet Jan 04 '17 at 10:34
  • @JonSkeet, "Generics is a compiler thing. In reality, when your code runs, you will have multiple classes/methods" - is this better or still wrong? I am not into IL/JIT details, only trying to explain what effect you should expect when using them. – Sinatr Jan 04 '17 at 10:43
  • @Sinatr: No, it's not really better, to be honest. If it were just "a compiler thing" you wouldn't expect to see it in the compiler *output*. It's not clear what you're trying to say - and I suspect that if you clarified what you were trying to say further, at least part of it may be wrong. – Jon Skeet Jan 04 '17 at 10:45
  • @Sinatr - I'm afraid you're talking about overloaded methods – Fabio Jan 04 '17 at 10:45
  • @JonSkeet, I am just trying to say that `Foo<int>` and `Foo<string>` will be 2 different classes (with different static variables, therefore the `Counter` effect) and `Bar<int>()` and `Bar<string>()` are 2 different methods of the *same* class, so e.g. using a `static` member inside either `Bar()` will reference the same member. But I guess if you have a lambda closure inside such a method, then all `Bar()` share the same local variable value, is this right? If this is true, then my explanation is indeed bad. – Sinatr Jan 04 '17 at 10:51
  • @Sinatr: Well if they capture local variables, each *invocation* of the method will get a separate delegate... – Jon Skeet Jan 04 '17 at 10:52
  • @JonSkeet If for example I have a value with the type of a generic argument `T` and I use `Console.WriteLine(object)` overload to print it, I see this in the IL code: `box !!T` followed by `call void [mscorlib]System.Console::WriteLine(object)`. Apparently the `box` IL instruction is used even if `T` could be (or is known to be) just a reference type. Then the jitter will probably do nothing (the value on the stack before `box` will already be a reference) when it processes the `box` instruction and `T` is a reference type. For a value type `T`, of course the run-time must really make a box. – Jeppe Stig Nielsen Jan 04 '17 at 10:52

3 Answers


As opposed to C++ templates, .NET generics are evaluated at run time, not at compile time. Semantically, if you instantiate the generic class with different type parameters, the results behave as if they were two different classes, but under the hood, there is only one class in the compiled IL (intermediate language) code.

Generic types

The difference between different instantiations of the same generic type becomes apparent when you use Reflection: typeof(YourClass<int>) will not be the same as typeof(YourClass<string>). These are called constructed generic types. There also exists a typeof(YourClass<>) which represents the generic type definition. Here are some further tips on dealing with generics via Reflection.
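A minimal sketch of that Reflection behaviour. YourClass<T> is just a placeholder declaration added for illustration, and the Program wrapper is only there to make the snippet runnable:

```csharp
using System;

class YourClass<T> { }   // hypothetical stand-in for the generic class discussed above

static class Program
{
    static void Main()
    {
        Type ofInt = typeof(YourClass<int>);
        Type ofString = typeof(YourClass<string>);
        Type definition = typeof(YourClass<>);

        // Two constructed types built from the same definition are distinct types.
        Console.WriteLine(ofInt == ofString);                                 // False

        // Both point back to the same generic type definition.
        Console.WriteLine(ofInt.GetGenericTypeDefinition() == definition);    // True
        Console.WriteLine(ofString.GetGenericTypeDefinition() == definition); // True

        // Only the definition itself reports IsGenericTypeDefinition.
        Console.WriteLine(definition.IsGenericTypeDefinition);                // True
        Console.WriteLine(ofInt.IsGenericTypeDefinition);                     // False
    }
}
```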

When you instantiate a constructed generic class, the runtime generates a specialized class on the fly. There are subtle differences between how it works with value and reference types.

  • The compiler will only generate a single generic type into the assembly.
  • The runtime creates a separate version of your generic class for each value type you use it with.
  • The runtime allocates a separate set of static fields for each constructed type, that is, for each distinct combination of type arguments.
  • Because all references have the same size, the runtime can reuse the specialized version it generated the first time you used it with a reference type.

Generic methods

For generic methods, the principles are the same.

  • The compiler only generates one generic method, which is the generic method definition.
  • At run time, each different specialization of the method is treated as a different method of the same class.
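The same can be observed through Reflection for the Foo.Bar<T> method from the question. This is only a sketch of what Reflection reports; it says nothing about how many machine-code bodies the JIT actually produces:

```csharp
using System;
using System.Reflection;

class Foo
{
    public void Bar<T>() { }
}

static class Program
{
    static void Main()
    {
        // The single generic method definition that the compiler emitted.
        MethodInfo definition = typeof(Foo).GetMethod("Bar");

        // Two constructed methods made from that one definition.
        MethodInfo barOfInt = definition.MakeGenericMethod(typeof(int));
        MethodInfo barOfString = definition.MakeGenericMethod(typeof(string));

        Console.WriteLine(definition.IsGenericMethodDefinition);  // True
        Console.WriteLine(barOfInt.IsGenericMethodDefinition);    // False

        // Different specializations are different methods...
        Console.WriteLine(barOfInt == barOfString);               // False

        // ...but both are declared by the one and only Foo class.
        Console.WriteLine(barOfInt.DeclaringType == typeof(Foo));    // True
        Console.WriteLine(barOfString.DeclaringType == typeof(Foo)); // True
    }
}
```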
Venemo
  • The linked article seems to contradict your answer. – Owen Pauling Jan 04 '17 at 10:39
  • @OwenPauling I edited the answer to clarify. What's the contradiction? – Venemo Jan 04 '17 at 10:47
  • your answer previously stated that "under the hood, there is only one [class]", whereas the article states " the runtime generates a specialized version of the Stack class". – Owen Pauling Jan 04 '17 at 10:49
  • @OwenPauling Ah, okay. I meant that *there is only one class in the compiled IL code*. I hope it's clear now. :) – Venemo Jan 04 '17 at 10:51
  • Yes thank you. I actually interpreted the article differently as the runtime creating additional specialized versions for each type. But now if I understand correctly it is actually modifying the one class, based on your answer + Jon Skeet's comments. – Owen Pauling Jan 04 '17 at 10:55
  • Thanks a lot for your answer! It's much more clear for me now. – roman Jan 04 '17 at 17:01

First off, let's clarify two things. This is a generic method definition:

T M<T>(T x) 
{
    return x;
}

This is a generic type definition:

class C<T>
{
}

Most likely, if I ask you what M is, you'll say that it's a generic method that takes a T and returns a T. That's absolutely correct, but I propose a different way of thinking about it -- there are two sets of parameters here. One is the type T, the other is the object x. If we combine them, we know that collectively this method takes two parameters in total.


The concept of currying tells us that a function that takes two parameters can be transformed to a function that takes one parameter and returns another function that takes the other parameter (and vice versa). For example, here's a function that takes two integers and produces their sum:

Func<int, int, int> uncurry = (x, y) => x + y;
int sum = uncurry(1, 3);

And here's an equivalent form, where we have a function that takes one integer and produces a function that takes another integer and returns the sum of those aforementioned integers:

Func<int, Func<int, int>> curry = x => y => x + y;
int sum = curry(1)(3);

We went from having one function that takes two integers to having a function that takes an integer and creates functions. Obviously, these two aren't literally the same thing in C#, but they are two different ways of saying the same thing, because passing the same information will eventually get you to the same final result.

Currying allows us to reason about functions more easily (it's easier to reason about one parameter than two) and it allows us to know that our conclusions are still relevant for any number of parameters.


Consider for a moment that, on an abstract level, this is what takes place here. Let's say M is a "super-function" that takes a type T and returns a regular method. That returned method takes a T value and returns a T value.

For example, if we call the super-function M with the argument int, we get a regular method from int to int:

Func<int, int> e = M<int>;

And if we call that regular method with the argument 5, we get a 5 back, as we expected:

int v = e(5);

So, consider the following expression:

int v = M<int>(5);

Do you see now why this could be considered as two separate calls? You can recognize the call to the super-function because its arguments are passed in <>. Then the call to the returned method follows, where the arguments are passed in (). It's analogous to the previous example:

curry(1)(3);

And similarly, a generic type definition is also a super-function that takes a type and returns another type. For example, List<int> is a call to the super-function List with an argument int that returns a type that's a list of integers.
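Reflection lets you make that "call" to the type-level super-function explicitly. A small sketch (the variable names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

static class Program
{
    static void Main()
    {
        // The "super-function": the open generic type definition List<>.
        Type listFunction = typeof(List<>);

        // "Calling" it with the type argument int yields a regular type.
        Type listOfInt = listFunction.MakeGenericType(typeof(int));

        // The result is the very type you get by writing List<int> in source.
        Console.WriteLine(listOfInt == typeof(List<int>));    // True

        // A different argument produces a different result.
        Console.WriteLine(listOfInt == typeof(List<string>)); // False
    }
}
```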

Now when the C# compiler meets a regular method, it compiles it as a regular method. It doesn't attempt to create different definitions for different possible arguments. So, this:

int Square(int x) => x * x;

gets compiled as it is. It does not get compiled as:

int Square__0() => 0;
int Square__1() => 1;
int Square__2() => 4;
// and so on

In other words, the C# compiler does not evaluate all possible arguments for this method in order to embed them into the final executable -- rather, it leaves the method in its parameterized form and trusts that the result will be evaluated at runtime.

Similarly, when the C# compiler meets a super-function (a generic method or type definition), it compiles it as a super-function. It doesn't attempt to create different definitions for different possible arguments. So, this:

T M<T>(T x) => x;

gets compiled as it is. It does not get compiled as:

int M(int x) => x;
int[] M(int[] x) => x;
int[][] M(int[][] x) => x;
// and so on
float M(float x) => x;
float[] M(float[] x) => x;
float[][] M(float[][] x) => x;
// and so on

Again, the C# compiler trusts that when this super-function is called, it will be evaluated at runtime, and the regular method or type will be produced by that evaluation.

This is one of the reasons why C# benefits from having a JIT-compiler as part of its runtime. When a super-function is evaluated, it produces a brand new method or a type that wasn't there at compile time! We call that process reification. Subsequently, the runtime remembers that result so it won't have to re-create it again. That part is called memoization.
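The "create once, then remember" behaviour can be observed indirectly. In this sketch, Cell<T> and Log are hypothetical names invented for the example; the static constructor stands in for "the type was just created", which is an observable side effect rather than a view into the JIT itself:

```csharp
using System;

static class Log
{
    public static int Initializations;   // hypothetical non-generic counter
}

class Cell<T>
{
    // Runs once per constructed type, the first time that type is used.
    static Cell() { Log.Initializations++; }

    public static void Touch() { }
}

static class Program
{
    static void Main()
    {
        Cell<int>.Touch();
        Cell<int>.Touch();      // Cell<int> already exists, nothing is re-created
        Cell<string>.Touch();   // a new constructed type, initialized once

        Console.WriteLine(Log.Initializations);   // 2

        // Asking for the same constructed type twice yields the same type.
        Type first = typeof(Cell<>).MakeGenericType(typeof(int));
        Type second = typeof(Cell<>).MakeGenericType(typeof(int));
        Console.WriteLine(first == second);       // True
    }
}
```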

Compare with C++ which doesn't require a JIT-compiler as part of its runtime. The C++ compiler actually needs to evaluate the super-functions (called "templates") at compile time. That's a feasible option because the arguments of the super-functions are restricted to things that can be evaluated at compile time.


So, to answer your question:

class Foo 
{
    public void Bar()
    {
    }
}

Foo is a regular type and there's only one of it. Bar is a regular method inside Foo and there's only one of it.

class Foo<T>
{
    public void Bar()
    {
    }
}

Foo<T> is a super-function that creates types at runtime. Each one of those resulting types has its own regular method named Bar and there's only one of it (for each type).

class Foo
{
    public void Bar<T>()
    {
    }
}

Foo is a regular type and there's only one of it. Bar<T> is a super-function that creates regular methods at runtime. Each one of those resulting methods will then be considered part of the regular type Foo.
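One visible consequence: because every constructed Bar method belongs to the same Foo, they all see the same static members of Foo. A sketch, using a static variant of Bar that returns a counter (a variation made up for this illustration, not the method from the question):

```csharp
using System;

class Foo
{
    private static int calls;   // belongs to Foo, not to any particular Bar<T>

    public static int Bar<T>() => ++calls;
}

static class Program
{
    static void Main()
    {
        // Different constructed methods, one shared field on the single Foo type.
        Console.WriteLine(Foo.Bar<int>());     // 1
        Console.WriteLine(Foo.Bar<string>());  // 2
        Console.WriteLine(Foo.Bar<int>());     // 3
    }
}
```

Contrast this with the Foo<T>.Counter example in the question, where each constructed type gets its own field.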

class Foo<T1>
{
    public void Bar<T2>()
    {
    }
}

Foo<T1> is a super-function that creates types at runtime. Each one of those resulting types has its own super-function named Bar<T2> that creates regular methods at runtime (at a later time). Each one of those resulting methods is considered part of the type that created the corresponding super-function.


The above is the conceptual explanation. Beyond it, certain optimizations can be implemented to reduce the number of distinct implementations in memory -- e.g. two constructed methods can share a single machine-code implementation under certain circumstances. See Luaan's answer about why the CLR can do this and when it actually does it.

Theodoros Chatzigiannakis

In IL itself, there's just one "copy" of the code, just like in C#. Generics are fully supported by IL, and the C# compiler doesn't need to do any tricks. You will find that each reification of a generic type (e.g. List<int>) has a separate type, but they still keep a reference to the original open generic type (e.g. List<>); however, at the same time, as per contract, they must behave as if there were separate methods or types for each closed generic. So the simplest solution is indeed to have each closed generic method be a separate method.

Now for the implementation details :) In practice, this is rarely necessary, and can be expensive. So what actually happens is that if a single method can handle multiple type arguments, it will. This means that all reference types can use the same method (the type safety is already determined at compile time, so there's no need to check it again at run time), and with a little trickery with static fields, you can use the same "type" as well. For example:

class Foo<T>
{
  private static int Counter;

  public static int DoCount() => Counter++;
  public static bool IsOk() => true;
}

Foo<string>.DoCount(); // 0
Foo<string>.DoCount(); // 1
Foo<object>.DoCount(); // 0

There's only one assembly "method" for IsOk, and it can be used by both Foo<string> and Foo<object> (which of course also means that calls to that method can be the same). But their static fields are still separate, as required by the CLI specification, which also means that DoCount must refer to two separate fields for Foo<string> and Foo<object>. And yet, when I do the disassembly (on my computer, mind you - these are implementation details and may vary quite a bit; also, it takes a bit of effort to prevent the inlining of DoCount), there's only one DoCount method. How? The "reference" to Counter is indirect:

000007FE940D048E  mov         rcx, 7FE93FC5C18h  ; Foo<string>
000007FE940D0498  call        000007FE940D00C8   ; Foo<>.DoCount()
000007FE940D049D  mov         rcx, 7FE93FC5C18h  ; Foo<string>
000007FE940D04A7  call        000007FE940D00C8   ; Foo<>.DoCount()
000007FE940D04AC  mov         rcx, 7FE93FC5D28h  ; Foo<object>
000007FE940D04B6  call        000007FE940D00C8   ; Foo<>.DoCount()

And the DoCount method looks something like this (excluding the prolog and "I don't want to inline this method" filler):

000007FE940D0514  mov         rcx,rsi                ; RCX was stored in RSI in the prolog
000007FE940D0517  call        000007FEF3BC9050       ; Load Foo<actual> address
000007FE940D051C  mov         edx,dword ptr [rax+8]  ; EDX = Foo<actual>.Counter
000007FE940D051F  lea         ecx,[rdx+1]            ; ECX = RDX + 1
000007FE940D0522  mov         dword ptr [rax+8],ecx  ; Foo<actual>.Counter = ECX
000007FE940D0525  mov         eax,edx  
000007FE940D0527  add         rsp,30h  
000007FE940D052B  pop         rsi  
000007FE940D052C  ret  

So the code basically "injected" the Foo<string>/Foo<object> dependency, so while the calls are different, the method being called is actually the same - just with a bit more indirection. Of course, for our original method (() => Counter++), this will not be a call at all, and will not have the extra indirection - it will just inline in the callsite.

It's a bit trickier for value types. Fields of reference types are always the same size - the size of the reference. On the other hand, fields of value types may have different sizes e.g. int vs. long or decimal. Indexing an array of integers requires different assembly than indexing an array of decimals. And since structs can be generic too, the size of the struct may depend on the size of the type arguments:

struct Container<T>
{
  public T Value;
}

default(Container<double>); // Can be as small as 8 bytes
default(Container<decimal>); // Can never be smaller than 16 bytes
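If you want to see the sizes on your own machine, here is a sketch that assumes a runtime where System.Runtime.CompilerServices.Unsafe is available out of the box (.NET Core 3.0 or later; on older frameworks it comes from a separate NuGet package). The exact numbers are implementation details, so none are hard-coded here:

```csharp
using System;
using System.Runtime.CompilerServices;

struct Container<T>
{
    public T Value;
}

static class Program
{
    static void Main()
    {
        // The size of the struct follows the size of its type argument.
        Console.WriteLine(Unsafe.SizeOf<Container<double>>());
        Console.WriteLine(Unsafe.SizeOf<Container<decimal>>());

        // With reference-type arguments the field is just a reference,
        // so these two are expected to have the same size.
        Console.WriteLine(Unsafe.SizeOf<Container<string>>() == Unsafe.SizeOf<Container<object>>());
    }
}
```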

If we add value types to our earlier example:

Foo<int>.DoCount();
Foo<double>.DoCount();
Foo<int>.DoCount();

We get this code:

000007FE940D04BB  call        000007FE940D00F0  ; Foo<int>.DoCount()
000007FE940D04C0  call        000007FE940D0118  ; Foo<double>.DoCount()
000007FE940D04C5  call        000007FE940D00F0  ; Foo<int>.DoCount()

As you can see, while we don't get the extra indirection for the static fields unlike with the reference types, each method is actually entirely separate. The code in the method is shorter (and faster), but cannot be reused (this is for Foo<int>.DoCount()):

000007FE940D058B  mov         eax,dword ptr [000007FE93FC60D0h]  ; Foo<int>.Counter
000007FE940D0594  lea         edx,[rax+1]
000007FE940D0597  mov         dword ptr [7FE93FC60D0h],edx  

Just a plain static field access as if the type wasn't generic at all - as if we just defined class FooOfInt and class FooOfDouble.

Most of the time, this isn't really important for you. Well-designed generics usually more than pay for their costs, and you can't just make a flat statement about the performance of generics. Using a List<int> will almost always be a better idea than using ArrayList of ints - you pay the extra memory cost of having multiple List<> methods, but unless you have many different value-type List<>s with no items, the savings will likely well outweigh the cost in both memory and time. If you only have one reification of a given generic type (or all the reifications are closed on reference types), you usually aren't going to pay extra - there may be a bit of extra indirection if inlining isn't possible.

There are a few guidelines to using generics efficiently. The most relevant here is to only keep the actually generic parts generic. As soon as the containing type is generic, everything inside may also be generic - so if you have 100 kiB of static fields in a generic type, every reification will need to duplicate that. This may be what you want, but it might be a mistake. The usual approach is to put the non-generic parts in a non-generic static class. The same applies to nested classes - class Foo<T> { class Bar { } } means that Bar is also a generic class (it "inherits" the type argument of its containing class).
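A sketch of both points. Cache<T>, CacheDefaults and the buffer names are hypothetical, chosen only to illustrate the guideline:

```csharp
using System;

// Non-generic holder: one copy, no matter how many Cache<T> types get constructed.
static class CacheDefaults
{
    public static readonly byte[] SharedBuffer = new byte[100 * 1024];
}

class Cache<T>
{
    // One copy per constructed type: Cache<int>, Cache<string>, ...
    public static readonly byte[] PerTypeBuffer = new byte[100 * 1024];

    // Implicitly generic: this is really Cache<T>.Entry.
    public class Entry { }
}

static class Program
{
    static void Main()
    {
        // Each constructed type allocated its own 100 kiB buffer.
        Console.WriteLine(ReferenceEquals(Cache<int>.PerTypeBuffer,
                                          Cache<string>.PerTypeBuffer));             // False

        // The nested class "inherits" the type argument of its containing class.
        Console.WriteLine(typeof(Cache<int>.Entry) == typeof(Cache<string>.Entry)); // False
        Console.WriteLine(typeof(Cache<int>.Entry).IsGenericType);                  // True
    }
}
```

Anything that does not depend on T, such as SharedBuffer here, can live in the non-generic class and be referenced from inside Cache<T>.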

On my computer, even if I keep the DoCount method free of anything generic (replace Counter++ with just 42), the code is still the same - the compilers don't try to eliminate unnecessary "genericity". If you need to use a lot of different reifications of one generic type, this can add up quickly - so do consider keeping those methods apart; putting them in a non-generic base class or a static extension method might be worthwhile. But as always with performance - profile. It probably isn't an issue.

Luaan