Strings in Structs

Question

Strings are reference types, so if I use them in structs, only the reference will be stored on the stack. But why does this code give me a foo1.Bar that is different from foo2.Bar?

var foo1 = new Foo();
foo1.Bar = "test";

var foo2 = foo1;
foo2.Bar = "test2";

Console.WriteLine($"foo1 -> {foo1.Bar}");
Console.WriteLine($"foo2 -> {foo2.Bar}");

struct Foo
{
    public string Bar;
}

Shouldn't foo1.Bar and foo2.Bar store the same reference and in this case show the same result at the end?

I'm using .NET 6, C# 10.0

`Shouldn't foo1.Bar and foo2.Bar store the same reference` no — Iłya Bursov, Sep 15 '22 at 01:30
You should read this: https://stackoverflow.com/a/52428042/3043 — Joel Coehoorn, Sep 15 '22 at 02:15

Joel Coehoorn · Accepted Answer · 2022-09-15T13:32:00.337

Let's work through the first four lines one at a time:

var foo1 = new Foo();

The line above creates a new Foo instance contained entirely in the variable foo1, wherever foo1 happens to be stored (we usually think in terms of the stack, but there are exceptions).

foo1.Bar = "test";

Now a string object is created in heap memory, and the reference is assigned to the Bar field of the foo1 variable. The C# compiler gives strings some special treatment so they have some value-type semantics (i.e., you didn't have to write foo1.Bar = new string("test");), but they are actually reference types, so only the reference is assigned to foo1.Bar.

var foo2 = foo1;

This copies the Foo instance in foo1 to a new foo2 variable. Because Foo is a struct and not a class, the contents of foo1 are copied into a new instance contained entirely in foo2, still presumably on the stack. If Foo were a class instead of a struct, foo2 would only receive a reference to the same object as foo1; as it is, the two variables now hold completely separate instances. The Bar reference is also copied as part of this, but only the reference. Therefore you now have two separate copies of the reference, one in foo1.Bar and one in foo2.Bar, both referring to the same "test" string object.
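One way to confirm the last point is to compare the two fields with object.ReferenceEquals right after the copy. This is a minimal sketch reusing the question's Foo struct; it only checks whether both fields hold a reference to the very same string object at that moment.

```csharp
using System;

var foo1 = new Foo();
foo1.Bar = "test";

// Copying a struct copies each of its fields. For a reference-type field
// such as Bar, that means copying the reference, not the string object.
var foo2 = foo1;

// Both fields refer to the same "test" object at this point.
Console.WriteLine(object.ReferenceEquals(foo1.Bar, foo2.Bar)); // True

struct Foo
{
    public string Bar;
}
```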

foo2.Bar = "test2";

A new "test2" string object is created in heap memory, and a reference to it is assigned to the foo2.Bar field, replacing the old reference. However, that reference is stored separately from foo1.Bar (the two fields formerly held the same reference value, but as separate copies), so foo1 is unchanged. Again, if Foo were a class instead of a struct, the foo1 and foo2 variables at this point would hold references to the same object, in which case updating foo2.Bar would also update foo1.Bar; but because Foo is a struct, we ended up with copies instead, and the copies are free to diverge.
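To see the struct/class contrast side by side, here is a small sketch. FooStruct and FooClass are hypothetical names introduced for illustration; they differ only in being declared as a struct or a class, and each runs the same assign-then-reassign sequence as the question.

```csharp
using System;

// Struct: '=' copies the whole value, so the two copies can diverge.
var s1 = new FooStruct { Bar = "test" };
var s2 = s1;
s2.Bar = "test2";
Console.WriteLine($"{s1.Bar} {s2.Bar}"); // test test2

// Class: '=' copies only the reference, so both variables share one object.
var c1 = new FooClass { Bar = "test" };
var c2 = c1;
c2.Bar = "test2";
Console.WriteLine($"{c1.Bar} {c2.Bar}"); // test2 test2

struct FooStruct
{
    public string Bar;
}

class FooClass
{
    public string Bar = "";
}
```

Only the class version shows the change through both variables, because c1 and c2 refer to a single object.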

Some additional reading in this area:

https://stackoverflow.com/a/52428042/3043
https://learn.microsoft.com/en-us/archive/blogs/ericlippert/the-truth-about-value-types

(Temp comment) Do you think it's worth tightening the part about `foo2.Bar` which is likely a property, not a variable (with an instance variable behind it)? (We are trying to be extra precise in the answers). — tymtam, Sep 15 '22 at 03:01

score 3 · Answer 2 · answered Sep 15 '22 at 01:38

When you create an instance of Foo you are (for all intents and purposes) creating an instance on the stack.

When you assign one Foo to another you are also creating a new instance of Foo because Foo is a struct.

Foo only contains a reference to a string. So when you assign one Foo to another, you are making a copy of that string reference. When you then assign a different string to the copy of Foo, you are only assigning the string to the copy. The original Foo is untouched. Hence the original Foo retains its original string reference.

answered Sep 15 '22 at 01:38 by Enigmativity

I would also add that what happens with `Bar` between two copies of the struct is not a special case or anything. It's exactly what happens if we're dealing with two string variables instead of a string field in two structs. For example `string bar1 = "test"; string bar2 = bar1; bar2 = "test2"; Console.WriteLine(bar1); // "test"`. – 41686d6564 stands w. Palestine Sep 15 '22 at 01:52
"you are making of a copy of that string reference" - this means that both string references must be pointing to the same string in memory, which is not true. This actually makes a copy of the string and not just the reference. – Shameel Sep 15 '22 at 02:15
@Shameel - No, they are pointing to the same string in memory at the point that the `struct` is copied. – Enigmativity Sep 15 '22 at 02:33
@Shameel - Your understanding of what's happening is wrong. – Enigmativity Sep 15 '22 at 02:34
@Shameel You're incorrect, my friend. [Here's a simple way to test this](https://rextester.com/DHGZG93746). Compare the results to what you get after replacing `string copy = s;` with `string copy = new string('0', 10_000_000);`. [Here](https://rextester.com/DOHO82487). – 41686d6564 stands w. Palestine Sep 15 '22 at 02:36
I think making this about strings just makes things more complicated. What we see here is not specific to strings. – tymtam Sep 15 '22 at 02:41
@Enigmativity if the reference is pointing to the same string in memory at the point that the struct is copied, then any string assignment to the reference would actually change the original string and won't create a copy and make the reference point to the new string. – Shameel Sep 15 '22 at 02:49
@Shameel - "would actually change the original string" - no, it doesn't. You have that wrong. – Enigmativity Sep 15 '22 at 02:52
@Shameel - Your understanding of how strings work in .NET is not correct. – Enigmativity Sep 15 '22 at 02:54

tymtam · Answer 3 · 2022-09-15T02:50:31.297

After

foo2.Bar = "test2";

foo2.Bar points to a different string. The assignment changes which object the reference points to (as opposed to making changes to the object that the reference points to):

var foo1 = new Foo();
foo1.Bar = "test";
//
//                               "test" 
// foo1.Bar  ---------------------┘                  


var foo2 = foo1;
//
//                               "test" 
// foo1.Bar  ---------------------┘  |                
//                                   |
// foo2.Bar  ------------------------┘                  

foo2.Bar = "test2";

//
//                               "test" 
// foo1.Bar  ---------------------┘                 
//                                
//                               "test2"
// foo2.Bar  ---------------------┘

This is not specific to strings. Here's an example with a list (inspired by Value types (C# reference)):

A a1 = new A() { L = new List<string> {"1", "11" } };

A a2 = a1; // Shallow copy


Console.WriteLine(a1);  // [1,11]
Console.WriteLine(a2);  // [1,11]

a2.L.Add("X");

Console.WriteLine(a1);  // [1,11,X]
Console.WriteLine(a2);  // [1,11,X]

// this does not make changes to the object that a2.L points to, 
// it changes which object a2.L points to.
a2.L = new List<string> {"2", "22" };                                  

Console.WriteLine(a1); // [1,11,X]
Console.WriteLine(a2); // [2,22]

public struct A
{
    public List<string> L {get; set; }

    public override string ToString() => $"[{string.Join(",", L)}]";
}

Please note that your question is NOT about the immutability of strings, because nowhere in your code are the strings modified.

From Strings and string literals:

Because a string "modification" is actually a new string creation, you must use caution when you create references to strings. If you create a reference to a string, and then "modify" the original string, the reference will continue to point to the original object instead of the new object that was created when the string was modified. The following code illustrates this behavior:
string str1 = "Hello ";
string str2 = str1;
str1 += "World";

System.Console.WriteLine(str2);
//Output: Hello

^ This is about changing strings and your example does not change strings.

Reference for value type assignment:

From Value types (C# reference):

By default, on assignment, passing an argument to a method, and returning a method result, variable values are copied.

and

If a value type contains a data member of a reference type, only the reference to the instance of the reference type is copied when a value-type instance is copied. Both the copy and original value-type instance have access to the same reference-type instance.
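The "passing an argument to a method" part of that quote applies to the question's struct as well. The sketch below reuses Foo; Rename and RenameByRef are hypothetical helpers. The by-value parameter receives a copy, so reassigning its Bar does not affect the caller, whereas a ref parameter opts out of the default copy and aliases the caller's variable.

```csharp
using System;

var foo = new Foo { Bar = "test" };

Rename(foo);                // Rename receives a copy of foo
Console.WriteLine(foo.Bar); // test

RenameByRef(ref foo);       // RenameByRef receives an alias for foo
Console.WriteLine(foo.Bar); // test2

// By-value parameter: f is a copy, so this reassignment is invisible to the caller.
static void Rename(Foo f) => f.Bar = "test2";

// ref parameter: f refers to the caller's variable, so the caller sees the change.
static void RenameByRef(ref Foo f) => f.Bar = "test2";

struct Foo
{
    public string Bar;
}
```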

score -1 · Answer 4 · answered Sep 15 '22 at 02:22

Here is what is happening behind the scenes:

 var foo1 = new Foo();

Allocate some memory and assign foo1 to it. Let's call that location A, so foo1 -> mem(a).

 foo1.Bar = "test";

Set the data at mem(a) offset Bar to point at "test".

 var foo2 = foo1;

Because Foo is a struct, assignment creates a "copy" of the object. So a new memory location, mem(b), is created and foo2 is assigned to it.

 foo2.Bar = "test2";

Set the data at mem(b) offset Bar to point at "test2".

OK, so when you print out foo1 and foo2, they still refer to mem(a) and mem(b) respectively.

NOTE! This is where C and C++ differ: there you can work with pointers directly. In those languages (which are not memory managed), you can have two pointer variables, or structs containing pointers, that point to the same location, so a change made through one is visible through the other.
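C# does have a safe way to get the "two names, one location" behavior without raw pointers: a ref local (available since C# 7). As a sketch reusing the question's Foo, declaring foo2 as a ref alias of foo1 instead of a copy makes the reassignment visible through both names.

```csharp
using System;

var foo1 = new Foo { Bar = "test" };

// foo2 is not a copy: it is a second name for the storage of foo1.
ref Foo foo2 = ref foo1;
foo2.Bar = "test2";

Console.WriteLine($"foo1 -> {foo1.Bar}"); // foo1 -> test2
Console.WriteLine($"foo2 -> {foo2.Bar}"); // foo2 -> test2

struct Foo
{
    public string Bar;
}
```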

answered Sep 15 '22 at 02:22 by Hogan

I don't think 'Assignment of an object creates a "copy" of the object.' is clear or true. – tymtam Sep 15 '22 at 02:38
Some objects are reference types and some value types. Assignment works differently for each. – Enigmativity Sep 15 '22 at 02:40
@tymtam -- you want me to change it to say shallow copy? Is that clearer to someone who does not know what a shallow copy is? – Hogan Sep 15 '22 at 02:40

Shameel · Answer 5 · 2022-09-16T02:13:23.327

EDIT: Answer rephrased to remove ambiguity and removed reference to compile time string optimization that is not relevant to this question. Refer to Joel Coehoorn's answer for the most accurate description of what happens behind the scenes.

This is really an interesting phenomenon. The = operator does a shallow copy of a struct and your expectation that the reference to the string should have been maintained is understandable.

Remember, String derives from System.Object and not from System.ValueType, so it is a reference type. System.String is one of the few classes in the .NET Framework Base Class Library that is given special treatment by the CLR: it is immutable, and its equality operators compare contents, so it often behaves much like a value type. Copying a string variable or field, however, copies only the reference.

So when you assign a struct to another with the = operator, the shallow copy creates a copy of the string reference that refers to the same string in memory. At this stage, there is only one string in the heap and both structs refer to the same string. In contrast, a deep copy would also duplicate the objects that the struct's fields refer to. There is no built-in Struct.Clone() method that does this; a deep copy has to be written by hand, and since strings are immutable, duplicating the string itself would gain nothing.
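C# provides no built-in deep copy for structs, so one has to be written by hand when it is actually needed, which is when a field refers to a mutable object. The sketch below is an illustration under that assumption: Bag and its DeepCopy method are hypothetical names, and the List<string> field stands in for any mutable reference type. The string field is deliberately shared rather than duplicated, since strings are immutable.

```csharp
using System;
using System.Collections.Generic;

var original = new Bag { Name = "test", Items = new List<string> { "1" } };

var shallow = original;          // '=' copies fields; Items still points to the same List
var deep = original.DeepCopy();  // hypothetical hand-written deep copy

original.Items.Add("X");

Console.WriteLine(shallow.Items.Count); // 2: shares the List with original
Console.WriteLine(deep.Items.Count);    // 1: has its own List
Console.WriteLine(object.ReferenceEquals(deep.Name, original.Name)); // True: string shared

// Hypothetical type used only for this illustration.
struct Bag
{
    public string Name;
    public List<string> Items;

    // Hypothetical helper, not a .NET API: duplicates the mutable List but
    // reuses the string reference, because a string can never be changed in place.
    public Bag DeepCopy() => new Bag { Name = Name, Items = new List<string>(Items) };
}
```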

When you assign a new string to the second struct's Bar variable, it now refers to the new string. The first struct's Bar continues to refer to the original string.

edited Sep 16 '22 at 02:13 · answered Sep 15 '22 at 01:43 by Shameel

That is really interesting!!! I had no idea that it happened. Thanks! – Milton Sampaio Sep 15 '22 at 01:50
"So when you assign a struct to another with the = operator, the shallow copy creates a copy of the string instead of referring to the same string in memory." - this isn't true at all. It's the opposite of what happens. – Enigmativity Sep 15 '22 at 02:05
"If the value being assigned to a string variable is the same as one of the strings already in the intern pool" - is misleading. This is nothing to do with assignment. It's only to do with string literals at compile-time. – Enigmativity Sep 15 '22 at 02:06
@Enigmativity You are right, I have updated my answer to remove ambiguity. – Shameel Sep 16 '22 at 02:14
@Shameel - Where did you get the `Struct.Clone()` method from? And I don't think a "deep copy" of a struct would bother making a copy of the actual string. I think that's just bogus. – Enigmativity Sep 16 '22 at 02:23
