I am wondering what the memory layout of an object of this class would look like:

class MyClass
{
    string myString;

    int myInt;

    public MyClass(string str, int i)
    {
        myString = str;
        myInt = i;
    }
}

MyClass obj = new MyClass("hello", 42);

Could anyone visualize that?

Update:

Based on the answer from Olivier Rogier and the comments from ckuri and Jon Skeet I tried to come up with a high level chart, heavily influenced by the devblog article mentioned by ckuri.

So to my understanding:

  1. obj (8 bytes reference) points to the object including metadata (actually not to its beginning, but let's ignore that for simplicity).

  2. At this place the myInt is stored and the myString reference value (which is the reference to the real string value)

enter image description here

I don't want to go into the finest details, but I am still curious about the following:

  1. If obj.myString shall be accessed, are there two "lookups" necessary, e.g. first looking up obj, then following it and looking up myString or is there something like a global address table where the address for obj.myString is directly stored?

  2. Where is the reference value of obj stored? Is it part of the program object block like myString is part of the obj object block? (assuming obj is created inside an instance program)

stefan.at.kotlin
  • I am very confused as to what you are asking – maccettura Oct 14 '19 at 19:09
  • Possible duplicate of https://stackoverflow.com/questions/8951828/clr-class-memory-layout – Jay Buckman Oct 14 '19 at 19:11
  • What does "shall be accessed" mean in your second question numbered (1) ? Can you give an example of "access"? Also I do not understand what you mean by "global address table". – Eric Lippert Oct 14 '19 at 22:16
  • Also it would be helpful to understand your purpose in asking these questions; the vast majority of C# developers never have to worry about this stuff. Is there some deeper problem that you're trying to solve here? If so, say what that problem is and we can help you attack it directly. – Eric Lippert Oct 14 '19 at 22:28
  • Also, your diagram does not correctly show the structure of the string object, which is considerably more complex than you've shown here; do you care about that? – Eric Lippert Oct 14 '19 at 22:29
  • @EricLippert: I fixed the visualization; the string now comes before the int (as in the source code). I will check your reply. I would also be interested in what the string object looks like. There is no special problem I am trying to solve, just learning and curiosity. – stefan.at.kotlin Oct 16 '19 at 18:32
  • Matt Warren -- the MVP, not the C# compiler architect -- has a good post that gives the basics of the string layout. https://mattwarren.org/2016/05/31/Strings-and-the-CLR-a-Special-Relationship/. If you want historical perspective on the provenance of length-prefixed strings in Microsoft developer tools see my 2003 article on the subject: https://ericlippert.com/2003/09/12/erics-complete-guide-to-bstr-semantics/ – Eric Lippert Oct 16 '19 at 19:25

2 Answers

3

> At this place the myInt is stored and the myString reference value (which is the reference to the real string value)

Let's make sure you're not going down bad paths here.

First off, it's unclear to me why you re-ordered the integer and the string in the diagram compared to the source code. It is implementation-defined how the string and the integer are packed, and in what order, and whether there are any padding bytes. If you care about these details, ask a more clear question.

Second, it is unclear what you mean by "the real string value". Strings are of reference type. The real value of the string is the reference. The values of the contents of the string are in the referenced location.
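
To illustrate that the value of a string variable is the reference, here is a minimal sketch. The class name `StringReferenceDemo` and the helper `SameObject` are made up for this example and are not part of the question's code.

```csharp
using System;

static class StringReferenceDemo
{
    // True only when both arguments refer to the very same object.
    public static bool SameObject(object x, object y) => object.ReferenceEquals(x, y);

    static void Main()
    {
        string a = "hello";
        string b = a;                           // copies the reference, not the characters
        string c = new string(a.ToCharArray()); // a separate string object with equal contents

        Console.WriteLine(SameObject(a, b)); // True: one string object, two references to it
        Console.WriteLine(SameObject(a, c)); // False: two distinct string objects
        Console.WriteLine(a == c);           // True: == on strings compares the contents
    }
}
```

Assigning `a` to `b` copies only the reference, so both variables lead to the same characters in the referenced location.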

> if obj.myString shall be accessed, are there two "lookups" necessary, e.g. first looking up obj, then following it and looking up myString

I assume that by "lookup" you mean dereference.

So for example, if we have:

var obj = whatever;
char c = obj.myString[1];

then yes, we have two dereferences. The . dereferences obj to get myString, which is a reference. The [1] dereferences myString to get the char.
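
The two dereferences can be made visible by splitting the expression into two statements. This is only a sketch: the fields of `MyClass` are made public here so that the example can read them, and `DereferenceDemo` and `SecondChar` are hypothetical names.

```csharp
using System;

class MyClass
{
    public string myString; // public only so that this sketch can read it
    public int myInt;

    public MyClass(string str, int i)
    {
        myString = str;
        myInt = i;
    }
}

static class DereferenceDemo
{
    // Same result as obj.myString[1], written as two explicit steps.
    public static char SecondChar(MyClass obj)
    {
        string s = obj.myString; // first dereference: follow obj to read the myString field
        return s[1];             // second dereference: follow s to read a char in the string object
    }

    static void Main()
    {
        var obj = new MyClass("hello", 42);
        Console.WriteLine(SecondChar(obj)); // prints e
    }
}
```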

> Where is the reference value of obj stored?

obj is a variable. A variable is a storage location. That storage location can be in a number of places:

  • If obj is short lived, or even better, ephemeral, then it can be enregistered or put on the short term pool. (More commonly known as the stack, but it is a better habit in my opinion to think of the short term pool in terms of its semantics, namely, storage that lives not longer than activation. The stack is an implementation detail.)

  • If obj is not known to be short lived then it goes on the long-term pool, also known as the managed heap.
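
As a rough illustration of the two cases, consider the sketch below. The names `LifetimeDemo`, `ShortLived` and `MakeCounter` are invented for this example, and the remarks about registers and closure objects describe what current compilers typically do, not something the language guarantees.

```csharp
using System;

static class LifetimeDemo
{
    // 'sum' is needed only during this call, so the runtime is free to keep it
    // in a register or on the short-term pool (the stack).
    public static int ShortLived(int a, int b)
    {
        int sum = a + b;
        return sum;
    }

    // 'count' is captured by the lambda, so it must outlive the call to MakeCounter.
    // It is therefore not known to be short lived and ends up on the managed heap
    // (current C# compilers hoist it into a closure object).
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }

    static void Main()
    {
        Console.WriteLine(ShortLived(2, 3)); // prints 5

        Func<int> next = MakeCounter();
        next();
        Console.WriteLine(next()); // prints 2: 'count' survived between the calls
    }
}
```

Both `sum` and `count` are locals in the source code; what decides where they live is their required lifetime, not the way they are declared.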

Eric Lippert
  • What is an "ephemeral" object? Does it have anything to do with [Ephemeral generations and segments](https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/fundamentals#ephemeral-generations-and-segments)? – Luca Cremonesi Oct 15 '19 at 14:34
  • @LucaCremonesi: What I mean by "ephemeral" in this context is: consider a method body fragment like `int x = Foo(); int y = x + Bar(); Blah(y);` where `x` and `y` are not used anywhere else in the body. The compiler will generate code to produce a stack frame for the activation of the method; how many slots does it have to reserve at the top of the frame for the locals? It looks like enough for two ints, but the compiler could reason that this program is the same as `Blah(Foo() + Bar())` and generate zero reserved slots. – Eric Lippert Oct 15 '19 at 16:40
  • @LucaCremonesi: The variables `x` and `y` in this case can become "ephemeral". Their storage only exists *while the variable is being used* because the storage is just pushed onto the evaluation stack (in IL) when it is needed. The jitter will then turn it into either a stack push or a register allocation, as it sees fit, and the stack frame gets slightly smaller. This is a small optimization but it adds up. However, it can make programs harder to debug and lifetimes shorter than you'd expect, so the compiler does not always take this optimization. – Eric Lippert Oct 15 '19 at 16:42
  • @LucaCremonesi: It is unfortunate that the C# compiler team chose "ephemeral" to refer to the shortest lived of the short-lived variables at the same time that the GC team chose it to mean "the shortest-lived of the long-lived variables". They have nothing to do with each other except that in both cases we're referring to storage having a shorter lifetime than you might otherwise expect. – Eric Lippert Oct 15 '19 at 16:44
  • @EricLippert Thanks for your reply and giving me the correct terms like dereference :-) Also very interesting discussion with Luca :-) – stefan.at.kotlin Oct 16 '19 at 18:58
0

Each instance of a class or struct has its own "personal memory space" for data, but the code of the methods is stored once and shared by all objects.

First, you need 4 bytes on a 32-bit system or 8 bytes on a 64-bit system to store the reference, that is, the memory address of the object (a reference is a hidden pointer that you do not have to manage yourself).
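
To check which of the two sizes applies to the running process, a small sketch like the following can be used. It assumes that an object reference has the same size as a native pointer (`IntPtr.Size`), which holds for the common .NET runtimes; `ReferenceSizeDemo` and `PointerSizeInBytes` are names made up for this example.

```csharp
using System;

static class ReferenceSizeDemo
{
    // Size in bytes of a native pointer in the current process.
    public static int PointerSizeInBytes() => IntPtr.Size;

    static void Main()
    {
        Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
        Console.WriteLine($"Pointer size: {PointerSizeInBytes()} bytes");
    }
}
```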

Next, the object has two data members here:

  • One integer that takes 4 bytes.
  • One string that here takes 5 chars: 5 × 2 bytes = 10 bytes.

So for data the object takes 18 bytes on a 32-bit system or 22 bytes on a 64-bit system.

Since the string object contains an integer for the length, the size is a little more than that: 22 bytes on 32-bit and 26 bytes on 64-bit.

Since string is a reference type, we need to add another 4 or 8 bytes for the reference: 26 or 34 bytes.

Since the string class declares some other static and instance fields, such as the first char, it takes a little more than this.

Is string actually an array of chars or does it just have an indexer?

In addition to that, the memory in the code segment has the instructions of the code of methods. This code is common for all instances.

In addition to that, there are class tables and virtual tables to describe types, methods signatures and polymorphism rules.

If the object is instantiated in a method, it uses heap memory.

If the object is instantiated in the declaration of a class member, I don't know how .NET handles it, but it may be allocated in the data segment of the process.

And memory is like a train where the wagons are the bytes.

Here is a pseudo-diagram of the memory.

It is not the very true reality but it may help to understand:

enter image description here

Does accessing a variable in C# class reads the entire class from memory?

C# Heap(ing) Vs Stack(ing) In .NET

A byte is the elementary unit of the memory that stores one value at a time between 0 and 255 (unsigned) or -128 and +127 (signed).
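
The two ranges correspond to the C# types `byte` and `sbyte`. The sketch below prints their limits and shows how the same eight bits read as a different number when interpreted as signed; `ByteRangeDemo` and `ReinterpretAsSigned` are hypothetical names used only here.

```csharp
using System;

static class ByteRangeDemo
{
    // Reads the bit pattern of an unsigned byte as a signed byte.
    public static sbyte ReinterpretAsSigned(byte b) => unchecked((sbyte)b);

    static void Main()
    {
        Console.WriteLine($"{byte.MinValue}..{byte.MaxValue}");   // prints 0..255
        Console.WriteLine($"{sbyte.MinValue}..{sbyte.MaxValue}"); // prints -128..127
        Console.WriteLine(ReinterpretAsSigned(200));              // prints -56 (200 - 256)
    }
}
```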

Learn the basics about C# data types' variables

Shifting Behavior for Signed Integers

A Tutorial on Data Representation


Seeing this sketch today (2021-01-28), I realize it may be misleading, which is why I wrote "It is not the very true reality but it may help to understand". In reality, the code implementing the methods is loaded from the binary files (EXE and DLLs) when the process starts and is stored in the CODE SEGMENT, while all data, static (literals) and dynamic (instances), is in the DATA SEGMENT (if things have not changed since x32 and protected mode). Non-virtual method tables, as well as virtual method tables, are not stored in the data segment for each object instance. I don't remember the details, but these tables are for code. Also, the data of each object instance is a projection of its definition and those of its ancestors, laid out in one place as one full instance.

Memory segmentation

x86 memory segmentation

  • That’s not exactly true. Objects also have an [object header and a method table](https://devblogs.microsoft.com/premier-developer/managed-object-internals-part-1-layout/). Strings are reference types, so there would be a reference to the actual string object. Also strings have a length information. The actual MyClass object would just be the object header, reference to method table, an integer and a reference to a string object. – ckuri Oct 14 '19 at 19:19
  • Strings don't have references to char arrays - the text data is directly within the string object. – Jon Skeet Oct 14 '19 at 19:36
  • @JonSkeet, ckuri, Olivier Rogier: Tried a high level visualization, added to my original question. Can you guys check it and maybe also have a look at the two new questions? :-) – stefan.at.kotlin Oct 14 '19 at 21:21
  • @OlivierRogier thanks! accepted as answer due to all that involved work and much content, extra bonus for the visuals ;-) – stefan.at.kotlin Oct 16 '19 at 18:57