1

I'm reading a book which says:

The variable representing a struct instance doesn’t contain a pointer to an instance; the variable contains the fields of the instance itself. Because the variable contains the instance’s fields, a pointer doesn’t have to be dereferenced to manipulate the instance’s fields. The following code demonstrates how reference types and value types differ:

class SomeRef { public Int32 x; }
struct SomeVal { public Int32 x; }

static void ValueTypeDemo() {
   SomeRef r1 = new SomeRef();        // Allocated in heap
   SomeVal v1 = new SomeVal();        // Allocated on stack
   r1.x = 5;                          // Pointer dereference
   v1.x = 5;                          // Changed on stack
}

I come from a C background and am a little confused about the struct variable v1. I feel that v1.x = 5; still involves a pointer dereference: just as an array variable in C is a pointer to the address of the first element of that array, v1 must be a pointer to the address (on the stack, not the heap, of course) of the first field in SomeVal. If my understanding is correct, then v1.x = 5; must involve a pointer dereference too. If not, how can a pointer not be involved when we access an arbitrary field in a struct? The compiler needs to generate the offset to the field, so surely a pointer still has to be involved?

  • `SomeVal v1` behaves *absolutely identical* to C struct... Except C does not have syntax to call non-possible constructor for a struct, so you can ignore `= new SomeVal()` part altogether... Not really sure why you think similar code in C would involve heap... – Alexei Levenkov Jan 26 '21 at 07:02
  • @Alexei I'm not saying the heap is involved, I mean pointer dereference is involved, and the pointer points to the stack –  Jan 26 '21 at 07:33
  • That's not how pointers work. If you're simply accessing a memory address then cool. But if you're going there to read another address, that's a pointer. I don't see why you think a dereference is involved. – Zer0 Jan 26 '21 at 08:00
  • *If* the struct is stored on the stack, then the compiler can compute an offset from the stack pointer and just perform one dereference. The stack pointer is effectively a "free" pointer/dereference that we don't tend to count because we don't have to retrieve that pointer *first* before accessing memory relative to it, it's always on CPU. Any other pointer first has to be loaded itself, and it's that extra load/reference that tends to be counted as a dereference. – Damien_The_Unbeliever Jan 26 '21 at 08:05
  • Does this answer your question? [What and where are the stack and heap?](https://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap) and [Stack and heap in c sharp](https://stackoverflow.com/questions/3727266/stack-and-heap-in-c-sharp) and [Memory allocation: Stack vs Heap?](https://stackoverflow.com/questions/4487289/memory-allocation-stack-vs-heap) and [Stack and Heap allocation](https://stackoverflow.com/questions/11189932/stack-and-heap-allocation) –  Jan 26 '21 at 09:34
  • @OlivierRogier I don't know how you came to the conclusion that I have asked 10 questions like this. I only asked one thread stack question (more related to OS or C), which is not relevant to this question at all. I did delete that one and ask a new one because I felt that a lot of things needed to be added to the question and it was better to start with a new one. That is, I have only asked 4 or 5 questions, and none of them are the same! –  Jan 26 '21 at 09:48
  • @Damien_The_Unbeliever Thank you for your concise answer, that's exactly what I need –  Jan 26 '21 at 09:51
  • @OlivierRogier I only deleted one or two questions this month from what I can remember, because I later felt it was better to ask in a different forum like softengineeroverflow. I don't know how you got the idea that I deleted A LOT OF (like 10) questions day after day. This is not me, it is someone else, you got the wrong person –  Jan 26 '21 at 09:55

4 Answers

2

In theory the runtime does not guarantee how the struct will be stored; it could store it however it wants as long as the behavior is the same.

In practice your example will be stored as part of the method's stack frame. So v1 will reserve the space of the struct, i.e. 4 bytes. Access to the field of the struct will simply be converted to an access at the corresponding location in the stack frame, the same way as if you had used an Int32 local directly.

If the struct has multiple fields, the compiler simply adds two offsets together: one to the beginning of the struct and one from there to the actual field. All of this is known at compile time, so it is no problem for the compiler to work it out.
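As a rough illustration of field offsets being fixed ahead of time, the sketch below asks the runtime for them. It assumes a hypothetical two-field struct `TwoFields` and a helper class `OffsetDemo` (neither is from the question), and it uses `Marshal.OffsetOf`, which reports the marshaled layout. For a simple struct of two Int32 fields with sequential layout, that is expected to match the layout discussed here, but the runtime does not promise this for every type.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct used only for this illustration.
[StructLayout(LayoutKind.Sequential)]
struct TwoFields
{
    public Int32 x;
    public Int32 y;
}

static class OffsetDemo
{
    // Byte offset of the named field from the start of the struct
    // (marshaled layout, which is an assumption about the managed layout).
    public static int OffsetOf(string fieldName)
    {
        return Marshal.OffsetOf<TwoFields>(fieldName).ToInt32();
    }

    // Total size of the struct in bytes (marshaled layout).
    public static int Size()
    {
        return Marshal.SizeOf<TwoFields>();
    }
}
```

Calling `OffsetDemo.OffsetOf("x")` and `OffsetDemo.OffsetOf("y")` should give 0 and 4, and `OffsetDemo.Size()` should give 8. So an access to `y` amounts to "start of the struct plus 4", with no pointer stored anywhere in the struct variable itself.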

Note that while CIL uses a stack-based model, the jitter might optimize variables so that they are stored in registers instead. There is also the ref keyword, which allows a reference to a value type, somewhat similar to a pointer.
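To make the ref keyword concrete, here is a minimal sketch that reuses the `SomeVal` struct from the question. The class `RefDemo` and its method names are hypothetical and exist only for this illustration. It contrasts passing the struct by value (the callee gets a copy) with passing it by reference (the callee works on the caller's storage).

```csharp
using System;

struct SomeVal { public Int32 x; }

static class RefDemo
{
    // Receives a copy of the struct; the caller's variable is untouched.
    static void SetByValue(SomeVal v) { v.x = 5; }

    // Receives a reference to the caller's variable, much like a C pointer.
    static void SetByRef(ref SomeVal v) { v.x = 5; }

    // Returns the caller's x after a by-value call.
    public static int AfterByValue()
    {
        SomeVal v1 = new SomeVal();
        SetByValue(v1);
        return v1.x;
    }

    // Returns the caller's x after a by-reference call.
    public static int AfterByRef()
    {
        SomeVal v1 = new SomeVal();
        SetByRef(ref v1);
        return v1.x;
    }
}
```

`RefDemo.AfterByValue()` returns 0 and `RefDemo.AfterByRef()` returns 5. Only in the second case is something pointer-like explicitly handed around.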

JonasH
  • There's no guarantee that the struct will be on the stack. It's small enough that the JIT may choose to keep it in a register instead. – Damien_The_Unbeliever Jan 26 '21 at 08:02
  • @Damien_the_unbeliever Yes, I thought that was implied by the "could store it however it wants", but it might be worth mentioning it explicitly. – JonasH Jan 26 '21 at 08:49
  • "*Not guaranteed*": Indeed, .NET is a virtual platform. On current Intel-like microprocessors (x86, x64 and similar silicon technology), the behavior has been this way since stack registers were introduced. In the future, though, the underlying details may differ on another generation of technology, such as quantum computing. –  Jan 26 '21 at 09:38
1

Related answers:
How does a struct instance's virtual method get located using its type object in heap?
How boxing a value type work internally in C#?
Is everything in .NET an object?

As @Damien_The_Unbeliever said, what follows is only valid for current computing technologies, because .NET is a virtual platform. On Intel-like microprocessors (x86, x64 and similar), the behavior has been this way since stack registers were introduced. In the future, the underlying details may differ on another generation of technology, such as quantum computing.

Struct instances that are members of a class are allocated with the object itself, so on the heap, but a local struct variable declared in a method is allocated on the stack.

Also, arguments passed to a method conceptually go through the stack: references, as well as the full content of structs, are PUSHed and POPed (in practice the JIT may pass some of them in registers). This is why it is recommended not to overuse structs and anonymous types and not to make them too big.

To simplify things, imagine that the heap is a whole room and the stack is a cupboard in this room.

The cupboard is for the local value-type variables and references used to run the program, as well as for passing data between methods and getting results back from methods that are functions rather than procedures: references, value types, integral types, struct contents, anonymous types and delegates are PUSHed to and POPed from this cupboard, which acts as a temporary container.

The room is for the objects themselves (we pass references to objects), except for standalone structs that are not inside objects (we pass the whole struct content, and the same applies when we pass a struct that is inside a class: we pass the entire struct as a copy).

For example:

class MyClass
{ 
  MyStruct MyVar;
}

Is a struct variable "not alone", created on the heap together with the object, wherever that object is created.

But:

void MyMethod()
{ 
  MyStruct MyVar;
}

Is a local "alone" instance of the struct, created on the stack just like a local integer.

Thus if a class has 10 integers, only the reference is PUSHed on the stack when an instance is passed to a method (4 bytes in a 32-bit process and 8 bytes in a 64-bit process). But if it were a struct, all 10 integers would have to be PUSHed (40 bytes in either case).

In other words, as you wrote: struct instances on their own (that is, assigned to a local variable of struct type) are not stored on the heap, but struct instances that are members of a class (that is, assigned to a field of struct type) are stored on the heap.
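The difference between copying a reference and copying a struct's content can be sketched with the `SomeRef` and `SomeVal` types from the question. The class `CopyDemo` and its method names are hypothetical, added only for this illustration.

```csharp
using System;

class SomeRef { public Int32 x; }
struct SomeVal { public Int32 x; }

static class CopyDemo
{
    // Assigning a class variable copies only the reference,
    // so r1 and r2 refer to the same object on the heap.
    public static int CopyReference()
    {
        SomeRef r1 = new SomeRef();
        SomeRef r2 = r1;
        r2.x = 8;
        return r1.x;
    }

    // Assigning a struct variable copies the fields themselves,
    // so v1 and v2 are two independent instances.
    public static int CopyValue()
    {
        SomeVal v1 = new SomeVal();
        SomeVal v2 = v1;
        v2.x = 8;
        return v1.x;
    }
}
```

`CopyDemo.CopyReference()` returns 8 because both variables see the same object, while `CopyDemo.CopyValue()` returns 0 because the write went to the copy. The same copying happens when a struct is passed to a method by value.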

That said:

  • Members (integral numeric values and reference pointer "values") of a struct on the heap are accessed through the address of the containing object, using MOV opcodes and equivalents (virtual or target machine code).

  • Members of a struct on the stack are accessed using the stack register as a base plus an offset.

The first is generally slower and the second is faster.

How would the memory look like for this object?

What and where are the stack and heap?

Stack and heap in c sharp

Memory allocation: Stack vs Heap?

Stack and Heap allocation

Stack and Heap memory

Why methods return just one kind of parameter in normal conditions?

List of CIL instructions

.NET OpCodes Class

Stack register

The Concept of Stack and Its Usage in Microprocessors

Introduction of Stack based CPU Organization

What is the role of stack in a microprocessor?

To understand this better and to improve your computing skills, you may find it interesting to investigate what assembly language is and how the CPU works. You can start with IL and modern Intel processors, but it may be simpler, more instructive and complementary to start from the older 8086 through i386/i486.

1

You are correct: a pointer to the struct IS involved, but the offsets of the fields within the struct are computed at compile time.

The IL instruction used to store a (non-reference) value in a field is stfld, and the instruction to load a (non-reference) value from a field is ldfld.

Of course, these IL instructions are converted to assembly by the JIT compiler, which is likely to apply a number of optimisations such as avoiding loading the same pointer multiple times, but that varies by compiler version and whether you have enabled a DEBUG or a RELEASE build.

As an example, consider the following struct:

struct SomeVal
{
    public Int32 x; 
    public Int32 y;
}

And the code:

SomeVal v1 = new SomeVal();
v1.x = 5;
v1.y = 6;
Console.WriteLine(v1.x + v1.y);

The IL generated for a RELEASE build for this is:

.entrypoint
.locals init (
    [0] valuetype ConsoleApp1.SomeVal V_0
)

IL_0000: ldloca.s V_0
IL_0002: initobj ConsoleApp1.SomeVal
IL_0008: ldloca.s V_0
IL_000a: ldc.i4.5
IL_000b: stfld int32 ConsoleApp1.SomeVal::x
IL_0010: ldloca.s V_0
IL_0012: ldc.i4.6
IL_0013: stfld int32 ConsoleApp1.SomeVal::y
IL_0018: ldloc.0
IL_0019: ldfld int32 ConsoleApp1.SomeVal::x
IL_001e: ldloc.0
IL_001f: ldfld int32 ConsoleApp1.SomeVal::y
IL_0024: add
IL_0025: call void [mscorlib]System.Console::WriteLine(int32)
IL_002a: ret

The IL for v1.x = 5 is:

IL_0008: ldloca.s V_0
IL_000a: ldc.i4.5
IL_000b: stfld int32 ConsoleApp1.SomeVal::x

Note how it:

  1. Pushes the address of the struct onto the evaluation stack using ldloca.s V_0
  2. Pushes a constant int32 value 5 onto the evaluation stack using ldc.i4.5
  3. Stores that int32 value into the field that is at a constant offset defined by ConsoleApp1.SomeVal::x using stfld int32 ConsoleApp1.SomeVal::x

You can see similar IL code for loading the x and y fields before adding them together using add.

Matthew Watson
0

OP Question:

If not, how a pointer is not involved if we want to access an random field in a struct as the compiler needs to generate the offset to the field, still a pointer has to be involved?

Coming from a C background you would understand a variable that is a pointer e.g., int* p;

For structs on the stack, generally a pointer is involved but not a pointer that you would know about like the *p mentioned above.

For example, a compiler/jitter could generate code that uses a pointer to hold the base address of the stack. From there, the offsets of variables on the stack are generated as constants that can be added to the stack pointer to access the values of variables on the stack. Some CPUs have a register called SP (stack pointer) that tracks the current top of the stack.

For your example, the code that is generated could be illustrated in pseudo-assembly code as this:

// Suppose SP = 0x1004
// SP = 0x1000 after v1 is allocated (SP decremented by 4)
// SP is now the pointer that you were wondering about
// v1.x = 5, Place 32-bit constant value 5 at address 0x1000
mov 5, [SP] 

Note that the SP register is the pointer that you wonder about. Also note that since your struct only contains one variable, the address of the struct is the same as the address of the variable. This pseudo-assembly could also be like this:

// Suppose SP = 0x1004
// v1.x = 5, push constant value 5 on the stack
push #5   
// SP = 0x1000 (again, SP decremented by 4)

So, a pointer is involved, but it is used in lower-level code that is generated by the compiler/jitter, and at the source code level, no explicit pointer variable is needed. In order to read v1.x, the same SP (pointer) can be used by the lower-level generated code. For example:

// int a = v1.x;
// pseudo-assembly of generated code
// A CPU register R0 is used for variable a.
// Place the value at address SP into R0.
mov [SP], R0

In your case, with C#, MSIL code would be generated, as another answer here shows. After that MSIL is compiled by the JIT, e.g., for an x64 CPU target, the machine code would be similar in concept to what is shown here.

Coder