While working on a compiler for a toy language I designed, I looked around at the options for implementing generics (by studying how existing languages do it), and I started wondering about C# generics.
I'll try to describe what I've understood first. Please feel free to correct me at any point if I've misunderstood anything. I will be using the term generic class/type/template/definition to refer to something like List<T> and concrete class/type to refer to something like List<int>, List<string>, etc.
Apologies in advance about the length of the post.
C++ uses templates for generic programming. As far as I understand, this means that:
- The C++ compiler, when it encounters a template definition, simply keeps its text (or an equivalent of the text) in memory.
- Then, every time a reference to a (new) concrete type is encountered in the code, the generic template is consulted, the text for the requested concrete type is generated (by replacing the type parameters with the requested combination), and finally that code is compiled and added to the binary.
The template itself will not be available in the binary, since it was merely a text representation and never a concrete type. Only the concrete types generated from it will be visible.
(I am unsure about this last part - as I stated before, do correct me if I'm off.)
Java uses type erasure, meaning that generic type parameters are checked for type safety only at compile time; then, if no type mismatch is detected, they are replaced with their bound (the Object base type when the parameter is unbounded), with casts inserted where needed, effectively reusing one (non-generic) class for all concrete type references.
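To make the erasure description concrete, here is a minimal sketch in Java. The class and method names (ErasureDemo, sameRuntimeClass, insertThroughRawReference) are made up for illustration. It only shows two observable consequences of erasure: differently parameterized lists share one runtime class, and a raw reference can insert a value of the wrong type without any runtime check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ErasureDemo {
    // Two differently parameterized lists report the same runtime class,
    // because the type arguments do not exist at run time.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        return ints.getClass() == strings.getClass();
    }

    // Through a raw reference, a String can be added to a List<Integer>.
    // The compiler only warns; the add itself performs no type check.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean insertThroughRawReference() {
        List<Integer> ints = new ArrayList<>();
        List raw = ints;
        raw.add("not an integer");
        return ints.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());          // true
        System.out.println(insertThroughRawReference()); // true
    }
}
```

Reading the bad element back as an Integer would fail with a ClassCastException, because the compiler inserts the cast at the point of use rather than at the point of insertion.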
Now, to the actual question.
After reading an interview with Anders Hejlsberg (not specifically the linked one, but his point is the same), where he criticized Java's type erasure, I had assumed that C# does not use type erasure. And, since we can reflect on the concrete types in C# and we do have such things as LINQ (which involves rather complicated usages of the generic capabilities of C#), we can safely say that C# indeed does not employ type erasure like Java.
Not knowing of any other options, I assumed that C# uses something like templates. Not quite C++ templates (as I understand them, at least), because we can obviously create a concrete type whose generic version was described in another assembly, and because we can reflect on the type again and get information about it. So, what I thought was that C# generics were more like templates plus metadata.
At some point, I read somewhere (though I can't find the link) that C# generic classes are actually abstract classes behind the scenes. Obviously, this wasn't in agreement with what I thought I knew - weird.
I forgot about it for a few months, but today I stumbled upon a question on SO very similar to this one (which, unfortunately, has no definitive answer). The author of that question demonstrates that the C# compiler doesn't do method resolution at compile time for generics even for types that can be known at compile time (shown using the new method he created to shadow object.GetHashCode).
Okay, so C# generics definitely aren't "text replacement plus metadata", as I originally thought. If they were, then in that question the (textually generated) concrete type for Test would have led the compiler to resolve the GetHashCode call very differently.
But instead, the C# compiler resolves it as if the type were nothing more than object, for all concrete implementations of the generic type, including the one where the new GetHashCode would have been resolved if it were non-generic code. This makes it look like C# generics are closer to type erasure plus metadata. Now I know the term is not very apt: it's not really type erasure if the metadata of every concrete class maintains the type parameter information. Still, it does resemble Java's methodology of storing everything as an object (at least for reference types, which are essentially interchangeable at a low level) and casting back and forth.
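For reference, the Java behaviour I'm comparing against can be sketched roughly as follows. This is only an analogy: Java has no equivalent of C#'s new member hiding, so the sketch uses overloads instead, and all names (ResolutionDemo, kind, describeGeneric, describeString) are hypothetical. It shows that a call inside a generic method is bound once, against the parameter's bound, no matter which concrete type is passed later.

```java
// Hypothetical demo class; an analogy in Java, not the C# code from the linked question.
class ResolutionDemo {
    static String kind(Object o) { return "object"; }
    static String kind(String s) { return "string"; }

    // The overload is chosen once, when this generic method is compiled.
    // T is unbounded, so kind(Object) is the only applicable candidate.
    static <T> String describeGeneric(T value) {
        return kind(value);
    }

    // In non-generic code the static type is String, so kind(String) is chosen.
    static String describeString(String value) {
        return kind(value);
    }

    public static void main(String[] args) {
        System.out.println(describeGeneric("hello")); // object
        System.out.println(describeString("hello"));  // string
    }
}
```

A C++ template written the same way would pick the overload separately for each instantiation, which is the difference I was expecting to see in C# and didn't.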
I've tried to imagine the third possibility (for which, as I said, I can't find any sources - so take it with a grain of salt), that generic classes are represented as abstract classes behind the scenes, which are simply extended and intelligently specialized by the compiler every time a new concrete type is to be generated, but I can't fully understand how it would work in practice. For example, the semantics of the modifier sealed for a generic class would have to be "shifted" to allow for that class (in the assembly) to be extended, but not its children. Generally, I think it would make the compiler (and possibly the runtime too) very complicated in ways I can't even begin to understand.
So, how are generics really implemented in C#? Although they definitely have differences, are they closer to those of C++ or closer to those of Java? Or is the generic definition really represented as a special abstract class in the assembly? Or is it perhaps something entirely different from what I've described?
I'm not looking for a particularly detailed answer (although it would be welcome). An explanation in simple terms would be fine, as long as it clearly highlights the differences between C#/Java/C++ and offers me at least a theoretical knowledge of how the C# compiler and runtime tackle generic classes.
Edit #1: I am aware that Eric Lippert has highlighted differences between C++ templates and C# generics in at least one blog post, essentially saying "C# generics are not templates" but I'm not aware of any explanation as to what they are behind the scenes.
Edit #2: The answer to the linked question does not address the issue I'm asking about. That answer briefly explains a specific example or phenomenon that's the result of the underlying implementation, but it absolutely does not explain what's being asked: what the underlying implementation is.