Understanding "this" argument for structs (specifically Iterators/async)

Question

I'm currently inspecting deep objects in the CLR using the Profiler API. I have a specific problem analyzing the "this" argument for iterator/async methods (generated by the compiler, in the form of <name>d__123::MoveNext).

While researching this I found that there is indeed special behavior. First, the C# compiler emits these generated state machines as structs (only in Release mode). ECMA-334 (C# Language Specification, 5th edition: https://www.ecma-international.org/publications/files/ECMA-ST/ECMA-334.pdf) states (12.7.8 This access):

"... If the method or accessor is an iterator or async function, the this variable represents a copy of the struct for which the method or accessor was invoked, ...."

This means that, unlike other "this" arguments, in this case "this" is sent by value, not by reference. I do indeed see that the copy isn't modified outside. I'm trying to understand how exactly the struct is actually passed.

I took the liberty of stripping down the complicated case and replicating it with a small struct. Consider the following code:

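using System;
using System.Runtime.CompilerServices;

// Program is another class in the same namespace (nitzan_multi_tester in the IL below).
struct Struct
{
    public static void mainFoo()
    {
        Struct st = new Struct();
        st.a = "String";
        st.p = new Program();
        System.Console.WriteLine("foo: " + st.foo1());            // instance call: passes &st
        System.Console.WriteLine("static foo: " + Struct.foo(st)); // static call: passes st by value
    }

    int i;
    String a;
    Program p;

    // Static method taking the struct by value.
    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    public static int foo(Struct st)
    {
        return st.i;
    }

    // Instance method: "this" is the implicit first argument.
    [MethodImplAttribute(MethodImplOptions.NoInlining)]
    public int foo1()
    {
        return i;
    }
}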

NoInlining is just so we can inspect the JITted code properly. I'm looking at three different things: how mainFoo calls foo/foo1, how foo is compiled and how foo1 is compiled. The following is the IL code generated (using ildasm):

.method public hidebysig static int32  foo(valuetype nitzan_multi_tester.Struct st) cil managed noinlining
{
  // Code size       7 (0x7)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldfld      int32 nitzan_multi_tester.Struct::i
  IL_0006:  ret
} // end of method Struct::foo

.method public hidebysig instance int32  foo1() cil managed noinlining
{
  // Code size       7 (0x7)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldfld      int32 nitzan_multi_tester.Struct::i
  IL_0006:  ret
} // end of method Struct::foo1

.method public hidebysig static void  mainFoo() cil managed
{
  // Code size       86 (0x56)
  .maxstack  2
  .locals init ([0] valuetype nitzan_multi_tester.Struct st)
  IL_0000:  ldloca.s   st
  IL_0002:  initobj    nitzan_multi_tester.Struct
  IL_0008:  ldloca.s   st
  IL_000a:  ldstr      "String"
  IL_000f:  stfld      string nitzan_multi_tester.Struct::a
  IL_0014:  ldloca.s   st
  IL_0016:  newobj     instance void nitzan_multi_tester.Program::.ctor()
  IL_001b:  stfld      class nitzan_multi_tester.Program nitzan_multi_tester.Struct::p
  IL_0020:  ldstr      "foo: "
  IL_0025:  ldloca.s   st
  IL_0027:  call       instance int32 nitzan_multi_tester.Struct::foo1()
  IL_002c:  box        [mscorlib]System.Int32
  IL_0031:  call       string [mscorlib]System.String::Concat(object,
                                                              object)
  IL_0036:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_003b:  ldstr      "static foo: "
  IL_0040:  ldloc.0
  IL_0041:  call       int32 nitzan_multi_tester.Struct::foo(valuetype nitzan_multi_tester.Struct)
  IL_0046:  box        [mscorlib]System.Int32
  IL_004b:  call       string [mscorlib]System.String::Concat(object,
                                                              object)
  IL_0050:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_0055:  ret
} // end of method Struct::mainFoo

The assembly code generated (relevant parts only):

foo/foo1:
mov eax,dword ptr [rcx+10h]
ret

mainFoo (line 18, the call to foo1):
mov rcx,offset mscorlib_ni+0x8aaf8 (00007ffc`37d6aaf8) (MT: System.Int32)
call    clr+0x2510 (00007ffc`392f2510) (JitHelp: CORINFO_HELP_NEWSFAST)
mov     rsi,rax
lea     rcx,[rsp+40h]
call    00007ffb`d9db04e0 (nitzan_multi_tester.Struct.foo1(), mdToken: 000000000600000b)
mov     dword ptr [rsi+8],eax
mov     rdx,rsi
mov rcx,1DBCE383690h
mov     rcx,qword ptr [rcx]
call    mscorlib_ni+0x635bd0 (00007ffc`38315bd0) (System.String.Concat(System.Object, System.Object), mdToken: 000000000600054f)
mov     rcx,rax
call    mscorlib_ni+0x56d290 (00007ffc`3824d290) (System.Console.WriteLine(System.String), mdToken: 0000000006000b78)

mainFoo (line 19, the call to foo):
mov rcx,offset mscorlib_ni+0x8aaf8 (00007ffc`37d6aaf8) (MT: System.Int32)
call    clr+0x2510 (00007ffc`392f2510) (JitHelp: CORINFO_HELP_NEWSFAST)
mov     rsi,rax
lea     rcx,[rsp+28h]
mov     rax,qword ptr [rsp+40h]
mov     qword ptr [rcx],rax
mov     rax,qword ptr [rsp+48h]
mov     qword ptr [rcx+8],rax
mov     eax,dword ptr [rsp+50h]
mov     dword ptr [rcx+10h],eax
lea     rcx,[rsp+28h]
call    00007ffb`d9db04d8 (nitzan_multi_tester.Struct.foo(nitzan_multi_tester.Struct), mdToken: 000000000600000a)
mov     dword ptr [rsi+8],eax
mov     rdx,rsi
mov rcx,1DBCE383698h
mov     rcx,qword ptr [rcx]
call    mscorlib_ni+0x635bd0 (00007ffc`38315bd0) (System.String.Concat(System.Object, System.Object), mdToken: 000000000600054f)
mov     rcx,rax
call    mscorlib_ni+0x56d290 (00007ffc`3824d290) (System.Console.WriteLine(System.String), mdToken: 0000000006000b78)

The first thing we can see is that foo and foo1 generate the same IL code (and the same JITted assembly code). This makes sense, since in both cases we're just using the first argument. The second thing we see is that mainFoo calls the two methods differently (ldloc vs ldloca). Since both foo and foo1 expect the same input, I would expect mainFoo to send the same arguments. This raised four questions:

1) What exactly does it mean to load a struct on the stack vs loading a struct's address on the stack? I mean, a struct bigger than 8 bytes (64 bits) can't "sit" on the stack.

2) Is the CLR generating a copy of the struct beforehand just to use as "this" (we know this is true, according to the C# specification)? Where is this copy stored? The mainFoo assembly shows that the calling method creates the copy on its stack.

3) It seems as though loading a struct by value and loading it by address (ldarg/ldloc vs ldarga/ldloca) both actually load an address; for the first set, it just creates a copy beforehand. Why? Am I missing something here?

4) Back to iterators/async: is the foo/foo1 example replicating the difference between the "this" argument for iterator and non-iterator structs? Why is this behavior wanted? Creating a copy seems like a waste of work. What's the motivation?

(This example was taken using .NET Framework 4.5, but the same behavior is also seen using .NET Framework 2 and CoreCLR.)

"I mean, a struct of size bigger than 8 bytes (64 bit), can't "sit" on the stack." - why not? pretty sure it can... it might not fit in many *registers* or offer atomicity, but... the stack is just memory space... — Marc Gravell, Jun 10 '19 at 12:30
I'll rephrase - it can, of course, sit on the stack (any normal sized struct). So far, from what I have seen, the evaluation stack never holds anything bigger than 8 bytes. For example, If I generate an IL code to push a huge struct on the stack and pop just the top value, it empties the evaluation stack. — Egozy, Jun 10 '19 at 12:37
but the huge struct **is** the value, so yes, if you pop "just the top value", I'd expect the entire struct to disappear...? — Marc Gravell, Jun 10 '19 at 12:50
That sounds good, I just never thought it worked this way. The JVM's operand stack consists only of 8-byte values, so I instinctively assumed this was the same. It brings up a lot of follow-up questions (for example: how can the CLR know from the stack alone that the top value is a struct?). I'll leave that aside for now. Thanks. We can still see that both the load and load-address instructions eventually put an address on the stack, and this is what's interesting in this case. — Egozy, Jun 10 '19 at 12:55
that's because in JVM, you only need to think about references (classes) and the JVM's inbuilt primitives, which all happen to be small; in .NET, we have custom structs of arbitrary size; the whole point of a struct is that semantically it is a "value", so a single pop / ld *is* the entire struct — Marc Gravell, Jun 10 '19 at 13:02
"load" in this case doesn't typically put an address on the stack, but when you try to *do something* with that value in the stack, you often need to use the stack address to the start of the struct... — Marc Gravell, Jun 10 '19 at 13:03

score 0 · Answer 1 · answered Dec 25 '20 at 01:08

I will quote from the ECMA-335 spec, which defines the CLI (the standard the CLR implements and on which C# is based), and then we will see how that answers your questions.

I.8.9.7 Value type definition
snip

When a non-static method (i.e., an instance or virtual method) is called on the value type, its this pointer is a managed reference to the instance, whereas when the method is called on the associated boxed type, the this pointer is an object reference.
Instance methods on value types receive a this pointer that is a managed pointer to the unboxed type whereas virtual methods (including those on interfaces implemented by the value type) receive an instance of the boxed type.

This tells us that an instance method of a struct, such as foo1() above, has a this pointer that is represented as a managed reference, i.e. a GC-tracked pointer to the actual struct. You know this in C# as a ref.

In the case of boxed structs that are known to be of that type, it is possible to call a method without unboxing; the CLR will pass the ref pointer automatically. See II.13.3.
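As a rough illustration of the boxed case, here is a minimal C# sketch. The names IBump, Cell and BoxedDemo are hypothetical and not from the question. It assumes only the standard boxing rule that converting a struct to an interface type copies it to the heap, so a call through the interface operates on the boxed copy rather than on the original local.

```csharp
using System;

// Hypothetical interface and struct, used only for this illustration.
interface IBump
{
    void Bump();
}

struct Cell : IBump
{
    public int V;

    public void Bump() { V++; }
}

static class BoxedDemo
{
    // Returns { value of the original local, value inside the box } after one call.
    public static int[] Run()
    {
        Cell s = new Cell();
        IBump boxed = s;   // boxing: copies s into a new heap object
        boxed.Bump();      // "this" refers to the boxed copy, not to s
        return new[] { s.V, ((Cell)boxed).V };
    }

    static void Main()
    {
        int[] r = Run();
        Console.WriteLine("original: " + r[0] + ", boxed: " + r[1]);
    }
}
```

The original local keeps its value because the call never sees it; only the heap copy is incremented.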

Now, what happens if we need to access the field from a struct stored in a local, a ref or loaded directly on the stack?

III.4.10 ldfld – load field of an object

Stack Transition

... obj => value ...

The ldfld instruction pushes onto the stack the value of a field of obj. obj shall be an object (type O), a managed pointer (type &), an unmanaged pointer (type native int), or an instance of a value type.

So no matter where the struct is, we can use ldfld to get the field's value. Whatever is on top of the stack (the entire struct value, or a pointer to it) is popped, and the field's value is pushed. But you must understand that the object on the logical (theoretical) stack is different in each case.
In foo(), the caller passes the struct by value on the stack (ldloc.0), and in the method ldarg.0 loads that value.
In foo1(), the caller passes the struct as this by ref (ldloca.s), and in the method ldarg.0 loads the ref.
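The difference between the two calls can be made visible from C# with a small sketch. The names Counter, IncrementInstance, IncrementCopy and RefVsValueDemo are hypothetical and chosen only for this illustration. The instance method writes through its this reference, while the static method only changes its own by-value copy.

```csharp
using System;

struct Counter
{
    public int Value;

    // Instance method: "this" is a managed pointer to the caller's struct.
    public void IncrementInstance() { Value++; }

    // Static method: the parameter is a by-value copy of the caller's struct.
    public static void IncrementCopy(Counter c) { c.Value++; }
}

static class RefVsValueDemo
{
    public static int AfterInstanceCall()
    {
        Counter c = new Counter();
        c.IncrementInstance();      // mutates c itself
        return c.Value;
    }

    public static int AfterStaticCall()
    {
        Counter c = new Counter();
        Counter.IncrementCopy(c);   // mutates only the callee's copy
        return c.Value;
    }

    static void Main()
    {
        Console.WriteLine("after instance call: " + AfterInstanceCall());
        Console.WriteLine("after static call:   " + AfterStaticCall());
    }
}
```

So even when the JITted code for both callees looks identical, the caller decides whether the callee receives the address of the original or the address of a temporary copy.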

The following will be relevant in a moment.

I.8.2.1 Managed pointers and related types

snip ...they cannot be used for field signatures...
snip Rationale: For performance reasons items on the GC heap may not contain references to the interior of other GC objects, this motivates the restrictions on fields...

Now to answer your questions:

1) We can load a struct directly onto the evaluation stack. It takes up however many bytes the struct occupies.
2) Your example is not a case of iterators or async. For an ordinary struct instance method, the C# spec (ECMA-334 12.7.8) says this behaves as a ref parameter, so it is effectively a mutable pointer to the caller's struct. You can prove this by mutating the struct in foo1().
3) Your struct is a bit of a special case when it comes to the JITted assembly for foo(). Because the struct is bigger than 8 bytes, the JIT passes it by reference to a copy made by the caller, which does not change the by-value semantics. This is consistent with the Windows x64 calling convention, which passes structs that do not fit in a register as a pointer to a caller-allocated copy.
4) In an actual async or iterator function, the parameters (including this) are transformed into fields of a compiler-generated struct, which works as a state machine. The CLR will not permit a ref to be stored in a field, so by-value semantics must be followed.
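To see the by-value behavior of this in an actual iterator, here is a minimal sketch. IterCounter, BumpAndYield and IteratorDemo are hypothetical names, not from the question. It relies only on the rule quoted from ECMA-334 12.7.8: inside an iterator declared in a struct, this is a copy of the struct the method was invoked on.

```csharp
using System;
using System.Collections.Generic;

struct IterCounter
{
    public int Value;

    // Iterator declared in a struct: "this" is a copy held by the state machine.
    public IEnumerable<int> BumpAndYield()
    {
        Value++;            // modifies the copy, not the caller's struct
        yield return Value;
    }
}

static class IteratorDemo
{
    // Returns { value seen inside the iterator, value of the original afterwards }.
    public static int[] Run()
    {
        IterCounter c = new IterCounter();
        int yielded = -1;
        foreach (int v in c.BumpAndYield())
        {
            yielded = v;
        }
        return new[] { yielded, c.Value };
    }

    static void Main()
    {
        int[] r = Run();
        Console.WriteLine("yielded: " + r[0] + ", original: " + r[1]);
    }
}
```

The iterator body sees the incremented value, while the caller's struct is left untouched, which is the copy semantics the question asks about.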
