Going deeper into C#, I have encountered a small (and strange) problem with object reference equality. Let's say I have two strings:

String a = "Hello world!";
String b = "Bonjour le monde";
bool equals = ReferenceEquals(a, b);  // (1)
b = "Hello world!";
equals = ReferenceEquals(a, b);       // (2)

(1) is false, as expected. The ReferenceEquals documentation says:

ReferenceEquals compares instances

but then:

  • Why does (2) return true?
  • Strings a and b are not the same object, are they? If they are, how did they become the same, given that I never explicitly wrote a = b?
Renato Gama
GETah
  • There are some odd corner cases with string interning; if you're interested, see my article on the subject: http://blogs.msdn.com/b/ericlippert/archive/2009/09/28/string-interning-and-string-empty.aspx – Eric Lippert Nov 29 '11 at 20:32

4 Answers

This is because of string interning.

The common language runtime conserves string storage by maintaining a table, called the intern pool, that contains a single reference to each unique literal string declared or created programmatically in your program. Consequently, an instance of a literal string with a particular value only exists once in the system.

For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable.
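The behaviour described above can be sketched as follows. This is only an illustration: the class and method names are made up, and it assumes all the literals live in the same assembly. It compares the same literal declared in two methods, a concatenation of two constants (which the C# compiler folds into a single literal), and a string assembled at run time with a StringBuilder.

```csharp
using System;
using System.Text;

// Hypothetical demo class; none of these names come from the question.
public static class LiteralPoolDemo
{
    // A literal declared in a different method of the same assembly.
    public static string Greeting() => "Hello world!";

    // Assembled at run time, so the result is a new string object.
    public static string BuildAtRuntime() =>
        new StringBuilder().Append("Hello ").Append("world!").ToString();

    public static bool SameLiteralSharesInstance()
    {
        string a = "Hello world!";
        return object.ReferenceEquals(a, Greeting());
    }

    public static bool FoldedConstantSharesInstance()
    {
        // Both operands are constants, so the compiler emits one literal.
        string folded = "Hello " + "world!";
        return object.ReferenceEquals(folded, Greeting());
    }

    public static bool RuntimeStringSharesInstance() =>
        object.ReferenceEquals(BuildAtRuntime(), Greeting());

    public static void Main()
    {
        Console.WriteLine(SameLiteralSharesInstance());    // True
        Console.WriteLine(FoldedConstantSharesInstance()); // True
        Console.WriteLine(RuntimeStringSharesInstance());  // False
        Console.WriteLine(BuildAtRuntime() == Greeting()); // True (value comparison)
    }
}
```

The last line is a reminder that the `==` operator on `string` operands compares values, so only `ReferenceEquals` reveals whether two variables point at the same object.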

wsanville

String literals are automatically interned by the .NET runtime. This means that the same string instance is shared for string literals with the same value. This is done to reduce memory usage and improve performance. It is a safe optimization because strings are immutable.

Your code compiles to CIL instructions similar to the following:

IL_0001: ldstr "Hello world!"
IL_0006: stloc.0
IL_0007: ldstr "Bonjour le monde"
IL_000c: stloc.1
etc...

From the documentation of the ldstr ("load a literal string") instruction in the ECMA specification:

By default, the CLI guarantees that the result of two ldstr instructions referring to two metadata tokens that have the same sequence of characters, return precisely the same string object (a process known as "string interning"). This behavior can be controlled using the System.Runtime.CompilerServices.CompilationRelaxationsAttribute and the System.Runtime.CompilerServices.CompilationRelaxations.NoStringInterning.

You can also intern strings yourself by calling the method String.Intern.
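As a rough sketch of what calling String.Intern does (the class and helper names here are hypothetical), the following creates a fresh copy of the text at run time and then asks the intern pool for the shared instance:

```csharp
using System;

// Hypothetical demo class, for illustration only.
public static class InternDemo
{
    // new string(char[]) allocates a fresh object for non-empty input.
    public static string Fresh() => new string("Hello world!".ToCharArray());

    public static void Main()
    {
        string literal = "Hello world!";
        string fresh = Fresh();
        string pooled = string.Intern(fresh);

        // The fresh copy is a different object from the literal.
        Console.WriteLine(object.ReferenceEquals(literal, fresh));  // False
        // Intern returns the pooled instance, which is the literal's.
        Console.WriteLine(object.ReferenceEquals(literal, pooled)); // True
        // Interning another fresh copy yields the same pooled instance.
        Console.WriteLine(object.ReferenceEquals(string.Intern(Fresh()), pooled)); // True
    }
}
```

Note that String.Intern does not change `fresh` itself; it returns the pooled reference, which you have to assign if you want to use it.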

Mark Byers

Identical string literals refer to the same object most of the time, as they are constant and immutable.

Taken from the Microsoft docs:

Each string literal does not necessarily result in a new string instance. When two or more string literals that are equivalent according to the string equality operator (Section 7.9.7) appear in the same assembly, these string literals refer to the same string instance. For instance, the output produced by

class Test
{
   static void Main() {
      object a = "hello";
      object b = "hello";
      System.Console.WriteLine(a == b);
   }
}

is True because the two literals refer to the same string instance.

MByD
  • +1. Thanks for your answer. What do you mean by most of the time? Are there cases where they aren't the same? What happens to the literal "Bonjour le monde"? Will it be garbage collected? – GETah Nov 29 '11 at 20:13
  • Most of the time: consider the case `b = new string("hello");` (which I think is allowed in C#). Garbage collected: no, I think it will stay in the string pool. – MByD Nov 29 '11 at 20:15
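The "most of the time" caveat can be sketched like this. It is a hedged illustration with made-up names, and it uses the `char[]` constructor rather than the `new string("hello")` form from the comment, so that it does not depend on which constructor overloads a given .NET version offers. Because the variables are typed as `object`, `==` is a reference comparison here, just as in the quoted example.

```csharp
using System;

// Hypothetical demo class, for illustration only.
public static class EqualityDemo
{
    // Builds "hello" at run time as a new string object.
    public static object Fresh() => new string(new[] { 'h', 'e', 'l', 'l', 'o' });

    public static void Main()
    {
        object a = "hello";
        object b = "hello";
        object c = Fresh();

        Console.WriteLine(a == b);                 // True: both refer to the same literal
        Console.WriteLine(a == c);                 // False: object == compares references
        Console.WriteLine(a.Equals(c));            // True: String.Equals compares values
        Console.WriteLine((string)a == (string)c); // True: string == compares values
    }
}
```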

.NET maintains a pool of strings, which is safe because they are immutable. You don't have to worry about it, as the CLR itself takes care of reusing them.

Renato Gama