var str1 = "C#";
var str2 = "F#";    
var str3 = "C#";

Console.WriteLine(Object.ReferenceEquals(str1, str2)); // False
Console.WriteLine(Object.ReferenceEquals(str1, str3)); // True <-- Why?

I know that strings are immutable reference types, which means that the objects cannot be changed. But that doesn't explain why the two variables refer to the same object.

Simon
  • The C# compiler makes an extra effort to intern the strings it sees in user code. When they are loaded at runtime, they reference the same string in the DLL. – JL0PD Apr 26 '23 at 09:43
  • From the spec: The Common Language Infrastructure (CLI) guarantees that the result of two `ldstr` instructions referring to two metadata tokens that have the same sequence of characters return precisely the same string object (a process known as "string interning"). – JL0PD Apr 26 '23 at 09:44
  • `"C#"` and `"F#"` in your code are compile-time constants whose references are assigned to the `str1`, `str2`, `str3` variables. Since strings are immutable, the compiler can create just one shared instance of each string (stored in the assembly itself) and pass references to it to all variables that need it. – Panagiotis Kanavos Apr 26 '23 at 09:50

2 Answers

For literals (i.e. strings that appear in your code) and other constant string values (i.e. the result of a string operation that can be evaluated entirely at compile time), the compiler emits the ldstr instruction, which loads an interned (i.e. shared) instance of the string. The rationale is that literals in your code typically get reused a lot, and it would be wasteful to allocate a new string instance each time. Hence str1 and str3 hold the same reference. To quote from the ldstr documentation:

The Common Language Infrastructure (CLI) guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters return precisely the same string object (a process known as "string interning").
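To illustrate the "other constant string values" case, here is a minimal sketch, assuming a C# 9+ project with top-level statements; the names `Prefix`, `literal`, `folded` and `viaConst` are made up for this example. A concatenation of constants is evaluated by the compiler, so it ends up as a single literal loaded via ldstr:

```csharp
using System;

// Hypothetical names, for illustration only.
const string Prefix = "C";

var literal  = "C#";
var folded   = "C" + "#";     // constant expression: the compiler folds it into the literal "C#"
var viaConst = Prefix + "#";  // also a constant expression, because Prefix is const

Console.WriteLine(Object.ReferenceEquals(literal, folded));   // True
Console.WriteLine(Object.ReferenceEquals(literal, viaConst)); // True
```

Both comparisons fall under the ldstr guarantee quoted above, because all three values are the same literal by the time the code is compiled.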

Since string instances are immutable, this is fine (at least notionally; in reality people can always mutate immutable data via unsafe code etc, but that comes under the heading of "play stupid games, win stupid prizes").

If you had loaded those values from a char[], decoded them from a byte[], or produced them as the result of runtime string operations, they would be different string instances that just happened to have the same contents.
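A small sketch of that last point, again assuming top-level statements; the variable names and the choice of UTF-8 are only for illustration. Strings built at runtime from a char[] or a byte[] have the same contents as the literal but are separate objects:

```csharp
using System;
using System.Text;

var literal   = "C#";
var fromChars = new string(new[] { 'C', '#' });                      // built at runtime from chars
var fromBytes = Encoding.UTF8.GetString(new byte[] { 0x43, 0x23 });  // decoded at runtime ("C#" in UTF-8)

Console.WriteLine(Object.ReferenceEquals(literal, fromChars));   // False: different instance
Console.WriteLine(Object.ReferenceEquals(literal, fromBytes));   // False: different instance
Console.WriteLine(literal == fromChars && literal == fromBytes); // True: same contents
```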

Marc Gravell

This is called string interning. The C# compiler interns string literals: taking advantage of string immutability, it stores each distinct literal once in the assembly, so identical literals refer to the same instance and space is conserved. Consider the following example:

var str1 = "C#";
var str2 = "F#";    
var str3 = "C#";
var str4 = String.Concat("C", "#");

Console.WriteLine(Object.ReferenceEquals(str1, str2)); // False
Console.WriteLine(Object.ReferenceEquals(str1, str3)); // True 
Console.WriteLine(Object.ReferenceEquals(str1, str4)); // False

Since str4 is constructed "dynamically" at runtime, it is not interned, so a new string instance is created; hence False (though str1.Equals(str4) is True).
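If you do want a runtime-built string to share the pooled instance, String.Intern can be called explicitly. The following is a sketch under the assumption that the literal "C#" has already been placed in the intern pool by the runtime, which is the usual behaviour; `interned` is a name made up for this example:

```csharp
using System;

var str1 = "C#";
var str4 = String.Concat("C", "#");   // built at runtime, not interned
var interned = String.Intern(str4);   // returns the pooled instance with the same contents

Console.WriteLine(Object.ReferenceEquals(str1, str4));     // False
Console.WriteLine(Object.ReferenceEquals(str1, interned)); // True, assuming the literal is in the pool
Console.WriteLine(String.IsInterned(str4) != null);        // True: an equal string is in the pool
```

String.Intern looks the string up by value, so it returns the existing pooled reference rather than str4 itself.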

Guru Stron