In C#, a generic function or class is aware of the types of its generic parameters. This means that dynamic type information, such as that used by the `is` and `as` operators, is available (in contrast to Java, where it is not).

I'm curious: how does the compiler provide this type information to generic methods? For classes, I can imagine that instances simply hold a pointer to the type, but for generic functions I'm not sure. Perhaps it is just a hidden parameter?

If the generics are preserved into the IL level, which I believe they are, then I'd like to know how this is done at that level.

edA-qa mort-ora-y
  • I'm afraid the answer is a bit disappointing. The C# compiler emits IL with generic type parameters. In other words, generics are supported natively by the .NET type system. – phoog Mar 05 '15 at 05:37
  • Just a side note: typeof List is not the same as List. MSIL generates the code for the generic parameter in another place. Look at this MSDN article: https://msdn.microsoft.com/en-us/library/f4a6ta2h.aspx – Jenish Rabadiya Mar 05 '15 at 05:37
  • @JenishRabadiya no, the IL is the same. In fact, there is only one IL. The multiple constructed types, should they be needed, are created at run time, by the runtime. – phoog Mar 05 '15 at 05:40
  • For me, this wording is clear: `In a generic type or method definition, a type parameter is a placeholder for a specific type that a client specifies when they instantiate a variable of the generic type. A generic class, such as GenericList listed in Introduction to Generics (C# Programming Guide), cannot be used as-is because it is not really a type; it is more like a blueprint for a type.` Quoted from: https://msdn.microsoft.com/en-us/library/0zk36dx2.aspx – Youngjae Mar 05 '15 at 05:58
  • @JenishRabadiya Maintaining the distinction between a parameter and an argument: the definition of the generic class or function, in both C# and MSIL, has one or more generic type parameters. When the type or function is used in a calling method, the caller specifies type arguments. At that point, the runtime constructs a type (or method) for the specified arguments, if it has not already been created, using its internal representation for constructed types (or methods). It does not create a copy of the MSIL definition of the type or method. – phoog Mar 05 '15 at 06:03
  • According to the edit of the question, he might be interested in how the JIT actually does it, rather than the C# -> CIL compiler – Jcl Mar 05 '15 at 06:13
  • @Jcl Yes, my intent was to understand more completely how it was done, not to just stop at the IL layer. – edA-qa mort-ora-y Mar 05 '15 at 06:15
  • Not that I can answer the question, but I personally know some people who worked on the JIT compiler team at MS, and they are those type of ninja-genius programmers whose reasoning for programming goes way beyond the comprehension of us mere mortals, and I'd bet the answer would be extremely complicated (then again, I may be wrong and it can be dead simple, like storing a type object somewhere in the heap or something :-) ) – Jcl Mar 05 '15 at 06:20
  • I don't mind a complicated and low-level answer. I maintain a compiler for my own language and know ways how this could be done. I'm just interested in how C#/CLR actually achieves it. – edA-qa mort-ora-y Mar 05 '15 at 06:25
  • I think you'll have to find a JIT developer to be sure... or you can just `ngen` some generic code using `is` and `as` and debug the assembler – Jcl Mar 05 '15 at 06:29
  • I believe there is a lot of confusion here because the word `type` can refer to two distinct but closely related concepts. There is the IL, which is the code for the data structures and methods of a class, and there is the metadata that describes those data structures and methods. A generic class has many variations of the metadata and one single copy of the IL. Furthermore, in .NET the metadata follows the object and, unlike in Java, the metadata is generic-aware. – Aron Mar 05 '15 at 08:11

4 Answers

Since you've edited your question to extend it beyond the C# compiler to the JIT compiler, here's an overview of the process, taking List<T> as our example.

As we've established, there is only one IL representation of the List<T> class. This representation has a type parameter corresponding to the T type parameter seen in C# code. As Holger Thiemann says in his comment, when you use the List<> class with a given type argument, the JIT compiler creates a native-code representation of the class for that type argument.

However, for reference types, it compiles the native code only once and reuses it for all other reference types. This is possible because, in the virtual execution system (VES, commonly called the "runtime"), there is only one reference type, called O in the spec (see paragraph I.12.1, table I.6, in the standard: http://www.ecma-international.org/publications/standards/Ecma-335.htm). This type is defined as a "native size object reference to managed memory."

In other words, all objects in the (virtual) evaluation stack of the VES are represented by an "object reference" (effectively a pointer), which, taken by itself, is essentially typeless. How then does the VES ensure that we don't use members of an incompatible type? What stops us from calling the string.Length property on an instance of System.Random?

To enforce type safety, the VES uses metadata that describes the static type of each object reference, comparing the type of a method call's receiver to the type identified by the method's metadata token (this applies to access of other member types as well).

For example, to call a method of the object's class, the reference to the object must be on the top of the virtual evaluation stack. The static type of this reference is known thanks to the method's metadata and analysis of the "stack transition" -- the changes in the state of the stack caused by each IL instruction. The call or callvirt instruction then indicates the method to be called by including a metadata token representing the method, which of course indicates the type on which the method is defined.

The VES "verifies" the code before compiling it, comparing the reference's type to that of the method. If the types are not compatible, verification fails, and the program crashes.

This works just as well for generic type parameters as it does for non-generic types. To achieve this, the VES limits the methods that can be called on a reference whose type is an unconstrained generic type parameter. The only allowed methods are those defined on System.Object, because all objects are instances of that type.

For a constrained type parameter, references of that type can receive calls to methods defined by the constraint types. For example, if you write a method in which you have constrained type T to implement ICollection, you can call the ICollection.Count getter on a reference of type T. The VES knows that it is safe to call this getter because it ensures that any reference being stored to that position in the stack will be an instance of some type that implements the ICollection interface. No matter what the actual type of the object is, the JIT compiler can therefore use the same native code.
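As a minimal sketch of the constrained case described above: the class `ConstraintDemo` and the method `CountOf` are hypothetical names chosen for illustration, and the example only shows what the constraint permits at the C# level, not the native code the JIT produces.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical demo type; not part of any library.
public static class ConstraintDemo
{
    // Because T is constrained to ICollection, the Count getter may be
    // called on any value of type T, whatever its actual run-time type.
    public static int CountOf<T>(T items) where T : ICollection
    {
        return items.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountOf(new ArrayList { 1, 2, 3 }));      // 3
        Console.WriteLine(CountOf(new List<string> { "a", "b" })); // 2
        Console.WriteLine(CountOf(new int[4]));                     // 4
    }
}
```

Removing the `where T : ICollection` clause makes `items.Count` a compile-time error, which matches the rule that only System.Object members are available on an unconstrained T.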

Consider also fields that depend on the generic type parameter. In the case of List<T>, there is an array of type T[] that holds the elements in the list. Remember that the actual in-memory array will be an array of O object references. The native code to construct that array, or to read or write its elements, looks just the same regardless of whether the array is a member of a List<string> or of a List<FileInfo>.

So, within the scope of an unconstrained generic type such as List<T>, the T references are just as good as System.Object references. The advantage of generics, though, is that the VES substitutes the type argument for the type parameter in the caller's scope. In other words, even though List<string> and List<FileInfo> treat their elements the same internally, the callers see that the Find method of the one returns a string, while that of the other returns a FileInfo.

Finally, because all of this is achieved by metadata in the IL, and because the VES uses the metadata when it loads and JIT-compiles the types, the information can be extracted at run time through reflection.
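To illustrate the last point, here is a small sketch that reads the generic metadata back through reflection. `ReflectionDemo` and `ElementTypeOf` are hypothetical names; the reflection calls themselves (`GetGenericTypeDefinition`, `GetGenericArguments`) are standard members of System.Type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo type; not part of any library.
public static class ReflectionDemo
{
    // Recovers the type argument from the run-time type of the list instance.
    public static Type ElementTypeOf<T>(List<T> list)
    {
        return list.GetType().GetGenericArguments()[0];
    }

    public static void Main()
    {
        Type open = typeof(List<>);         // the generic type definition
        Type closed = typeof(List<string>); // a constructed type

        Console.WriteLine(closed.GetGenericTypeDefinition() == open); // True
        Console.WriteLine(closed.GetGenericArguments()[0]);           // System.String
        Console.WriteLine(ElementTypeOf(new List<int>()));            // System.Int32
    }
}
```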

phoog
  • So the actual type is available on the stack in the generic functions then? It is essentially a hidden parameter to the function (though I understand calling it a parameter at this level is a bit wrong: it's something pushed on/off the stack, or virtual registers, for the function calls) – edA-qa mort-ora-y Mar 07 '15 at 05:12
  • @edA-qamort-ora-y a generic method can indeed get access to the type argument through `typeof(T)`, but within the scope of the generic method there is no way to call members of `T` (except considering type constraints, as explained above). Another thing to consider is that static fields are not shared between reference-type constructions of a generic type; for example, consider `class C<T> { public static string S; }` Here, `C<A>.S` and `C<B>.S` (for two different reference types `A` and `B`) do not denote the same storage location, even though methods of the two constructed types *would* denote the same body of native code. – phoog Mar 07 '15 at 06:31
  • To narrow my question a bit, in the generated native code of the generic function how does `typeof(T)` actually resolve? Does it reference the type object `T` on the stack somewhere? – edA-qa mort-ora-y Mar 07 '15 at 11:37
  • @edA-qamort-ora-y the method has the type's metadata token available somewhere, and it seems to be loaded onto the stack with the IL instruction `ldtoken`; the method's local variables include one whose type is System.Type. The only thing that isn't clear to me is how the called function knows which metadata token to associate with type T, depending on which constructed type was called, but basically the static fields are separate and the methods are shared, which is possible because the data needed for the method can be passed in when it is invoked. – phoog Mar 08 '15 at 05:16
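The point made in the comments above, that static fields are separate for each constructed type, can be checked with a short sketch. `Holder<T>` and `StaticsDemo` are hypothetical names, and the example shows only the observable behavior, not how the runtime stores the fields.

```csharp
using System;

// Hypothetical generic type with one static field.
public class Holder<T>
{
    public static string S;
}

public static class StaticsDemo
{
    public static void Main()
    {
        Holder<string>.S = "for string";
        Holder<object>.S = "for object";

        // Each constructed type has its own storage location for S.
        Console.WriteLine(Holder<string>.S);      // for string
        Console.WriteLine(Holder<object>.S);      // for object
        Console.WriteLine(Holder<Uri>.S == null); // True (never assigned)
    }
}
```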

You asked how casts (including is and as) can work on variables of a generic type parameter. Since all objects store metadata about their own type, all casts work the same way as if you had declared the variable as type object. The object is interrogated about its type, and the decision is made at run time.

Of course this technique is only valid for reference types. For value types the JIT compiles one specialized native method for each value type that is used to instantiate the generic type parameters. In that specialized method the type of T is exactly known. No further "magic" is needed. Value type parameters are therefore a "boring" case. To the JIT it looks like there are no generic type parameters at all.

How can typeof(T) work? This value is passed as a hidden parameter to generic methods. This is also how someObj as T is able to work. I'm quite sure it's being compiled as a call to a runtime helper (e.g. RuntimeCastHelper(someObj, typeof(T))).
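A small sketch of the behavior being described: the same method body gives different answers depending on the type argument, so T must be known at run time. `CastDemo` and `Describe` are hypothetical names, and the example makes no claim about how the runtime passes T internally.

```csharp
using System;

// Hypothetical demo type; not part of any library.
public static class CastDemo
{
    // "as T" requires the class constraint; the result depends on both the
    // run-time type of value and the type argument supplied by the caller.
    public static string Describe<T>(object value) where T : class
    {
        T typed = value as T;
        return typed != null ? "is " + typeof(T).Name : "not " + typeof(T).Name;
    }

    public static void Main()
    {
        Console.WriteLine(Describe<string>("hello"));      // is String
        Console.WriteLine(Describe<Uri>("hello"));         // not Uri
        Console.WriteLine(Describe<IComparable>("hello")); // is IComparable
    }
}
```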

usr
  • I think value types are a little trickier than you suggest because the JIT doesn't always produce a separate machine-code function for each distinct value type. For example, I think `KeyValuePair,Int32>` and `KeyValuePair,Int32>` both use the same machine code, since their storage layouts of all their parameters match. I'm not sure exactly what can and cannot be shared, though. – supercat Mar 06 '15 at 23:28
  • @supercat I suppose that the same code is used for `KeyValuePair` or `KeyValuePair` for that matter; as long as the first type argument is a reference type, – phoog Mar 07 '15 at 06:21

The CLR compiles each method separately, just in time, when it is first executed. You can see this if you use a type somewhere in a multi-line method and the DLL in which the type is defined is missing. Set a breakpoint on the first line of the method. When the method is called, a type load exception is thrown, and the debugger never hits the breakpoint. Now split the method into three submethods, with the middle one containing the lines that use the missing type. You can now step into the method with the debugger, and also into the first of the new methods, but the exception is thrown when the second one is called. This is because a method is compiled when it is first called, and only then does the compiler/linker stumble over the missing type.

To answer your question: as pointed out by others, generics are supported in the IL. At execution time, when you create a List<int> for the first time, the constructor code is compiled (with the substitution of int for the type parameter). If you then create a List<string> for the first time, the code is compiled again with string as the type parameter. You can think of it as the concrete classes with concrete types being generated on the fly at run time.
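Whatever happens to the native code, the constructed types are distinct at run time, and each object knows which one it belongs to. The following sketch shows this; `ConstructedTypeDemo` and `IsListOf` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo type; not part of any library.
public static class ConstructedTypeDemo
{
    // Tests the run-time type of an object against List<T> for the given T.
    public static bool IsListOf<T>(object candidate)
    {
        return candidate is List<T>;
    }

    public static void Main()
    {
        object ints = new List<int>();

        Console.WriteLine(IsListOf<int>(ints));                 // True
        Console.WriteLine(IsListOf<string>(ints));              // False
        Console.WriteLine(ints.GetType() == typeof(List<int>)); // True
    }
}
```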

Holger Thiemann
  • The docs on MSDN on generics indicate only one copy of the code is generated for reference types. For value types multiple versions are instantiated, but not for reference types. You're saying something different, indicating each instance is compiled to different code. – edA-qa mort-ora-y Mar 05 '15 at 10:50
  • Not every instance. When you instantiate 100 instances of List<int>, the code for List<> is compiled once with int as the type parameter. But if you also instantiate one or 100 List<string>, then the code for List<> will be compiled a second time with the type parameter string. At least this is my understanding. – Holger Thiemann Mar 06 '15 at 09:49
  • I'm speaking on the template type instance not the final object instantiation. That is `List` is one instance of `List`. – edA-qa mort-ora-y Mar 06 '15 at 16:48
  • @edA-qamort-ora-y Holger's account of string is slightly incorrect; List<string> will be compiled as List<object> because IL has only one reference type: object. This same native code created by the JIT compiler will be used for all other instantiations where the type argument is a reference type (for example, List or List). The objects' run-time types are tracked by the objects themselves, because they each have a "TypeHandle" field. Since nobody else has done so, I'm going to write up an answer now. – phoog Mar 06 '15 at 19:21
  • @phoog So for every value type it is as I said, but for reference types it is just done once? But something also has to be done for different reference types. If I used List and then I use List, then something must be constructed to hold the statics of this "new" type, for example. And here is what I don't understand in your accepted answer (which I cannot comment on directly :-)): suppose I have a GenericList<T> where T : Class1, and I have Class2 and Class3 deriving from Class1. If I construct an instance of GenericList<Class2>, what prevents me from calling .Add(Class3)? – Holger Thiemann Mar 09 '15 at 13:03
  • Yes, there is some sort of dictionary for statics of various constructed types. I asked Eric Lippert about this once and IIRC that was his answer. If I can find that I'll post a link. If you try to add a Class3 to the List, the code will not compile. Try it. – phoog Mar 09 '15 at 19:22
  • I found Eric's comment. He did not actually discuss the mechanism by which JIT-compiled method code is shared between types while static fields are not. Here's the link: http://stackoverflow.com/questions/9198087/do-static-locks-work-across-different-children-classes/9198239#9198239 – phoog Mar 09 '15 at 19:47

how does the compiler provide this type information to the generic methods?

tl;dr It provides the type information by effectively duplicating the method for every unique type that it is used with.

Now, for those of you who want to read more... ;) The answer is actually quite simple once you get a little example to go with it.

Let's start with this:

public static class NonGenericStaticClass
{
    public static string GenericMethod<T>(T value)
    {
        if(value is Foo)
        {
            return "Foo";
        }
        else if(value is Bar)
        {
            return "Bar";
        }
        else
        {
            return string.Format("It's a {0}!", typeof(T).Name);
        }
    }
}

// ...

static void Main()
{
    // Prints "It's a Int32!"
    Console.WriteLine(NonGenericStaticClass.GenericMethod<int>(100));

    // Prints "Foo"
    Console.WriteLine(NonGenericStaticClass.GenericMethod<Foo>(new Foo()));

    // Prints "It's a Int32!"
    Console.WriteLine(NonGenericStaticClass.GenericMethod<int>(20));
}

Now, as other people have already stated, IL supports generics natively, so the C# compiler doesn't actually do much with this example. However, when the Just-In-Time compiler comes along to turn the IL into Machine Code, it has to convert the generic code into something that is not generic. To do this, the .Net Just-In-Time compiler effectively duplicates the method for each of the different types that are used with it.

If the resulting code were in C# it would probably look something like this:

public static class NonGenericStaticClass
{
    // The JIT Compiler might rename these methods after their
    // representative types to avoid any weird overload issues, but I'm not sure
    public static string GenericMethod(Int32 value)
    {
        // Note that the JIT Compiler might optimize much of this away
        // since the first 2 "if" statements are always going to be false
        if(value is Foo)
        {
            return "Foo";
        }
        else if(value is Bar)
        {
            return "Bar";
        }
        else
        {
            return string.Format("It's a {0}!", typeof(Int32).Name);
        }
    }

    public static string GenericMethod(Foo value)
    {
        if(value is Foo)
        {
            return "Foo";
        }
        else if(value is Bar)
        {
            return "Bar";
        }
        else
        {
            return string.Format("It's a {0}!", typeof(Foo).Name);
        }
    }
}

// ...

static void Main()
{
    // Notice how we don't need to specify the type parameters any more.
    // (of course you could've used generic inference, but that's beside the point),
    // That is because they are essentially, but not necessarily, overloads of each other

    // Prints "It's a Int32!"
    Console.WriteLine(NonGenericStaticClass.GenericMethod(100));

    // Prints "Foo"
    Console.WriteLine(NonGenericStaticClass.GenericMethod(new Foo()));

    // Prints "It's a Int32!"
    Console.WriteLine(NonGenericStaticClass.GenericMethod(20));
}

Once you've generated the non-generic methods, then you know exactly what type you are dealing with through the wonderful use of static dispatch.

Now, there are obviously going to be differences between how I represent the transformation and how it is actually done, but that is the gist of it. The same sort of process applies to generic types as well.

For some contrast, the Java compiler "cheats" at generics. Instead of generating new types and methods like .NET does, Java inserts casts where you expect the value to be of a certain type. As such, our typeof(T) would not be possible in the Java world; instead, we would have to use the getClass() method.

Community
AtinSkrita