0

I've seen similar questions and answers, but I couldn't find the one I'm looking for. How does the CLR know which strings are the same, so that it uses a single object when I write the same string value twice without explicitly pointing it at that object?

using System;

class Program
{
    static void Main()
    {
        String test = "test";                // string literal
        String test2 = "test";               // same literal value
        String test3 = test;                 // copy of the reference
        String test4 = String.Copy(test);    // new object with the same characters

        Console.WriteLine(Object.ReferenceEquals(test, test2));
        Console.WriteLine(Object.ReferenceEquals(test, test3));
        Console.WriteLine(Object.ReferenceEquals(test, test4));

        Console.ReadLine();
    }
}

The output of this code will be:

True
True
False

Why are test and test2 allocated in the same place? If the string were "dsfadsfdsafdasfsadfasfdagfgfafadsf" or even longer, would it be efficient to compare all these strings, or is it done some other way?

kubwosz
  • See also https://stackoverflow.com/questions/22290576/object-referenceequals-returns-true-for-matching-strings – gunr2171 May 05 '22 at 22:05

2 Answers

1

Efficiency

The string equality code first does a reference comparison, because that is a very fast way to determine equality. So yes, comparing strings with reference equality is faster. But why do two equal strings only sometimes share a reference?
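
The order of checks described above can be sketched roughly as follows. This is a simplified illustration and not the actual .NET source: `SketchEquals` is a hypothetical name, and the real implementation compares the contents in larger chunks rather than one character at a time.

```csharp
using System;

static class EqualitySketch
{
    // Hypothetical helper showing the order of checks; not the real String.Equals.
    public static bool SketchEquals(string a, string b)
    {
        // Same object (or both null): equal without reading any characters.
        if (object.ReferenceEquals(a, b)) return true;

        // Exactly one is null, or the lengths differ: cannot be equal.
        if (a == null || b == null || a.Length != b.Length) return false;

        // Otherwise fall back to comparing the contents.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        string literal = "test";
        string copy = new string(literal.ToCharArray()); // distinct object, same characters

        Console.WriteLine(SketchEquals(literal, literal)); // True (reference shortcut)
        Console.WriteLine(SketchEquals(literal, copy));    // True (content comparison)
        Console.WriteLine(SketchEquals(literal, "text"));  // False
    }
}
```

When both variables point at the same interned object, only the first check runs, which is why the length of the string does not matter in that case.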

String literals

They are interned and it's important to understand what that means. See some useful documentation here and the official documentation here.

In short, the compiler already knows that string literals in the metadata are the same, so why store them twice? It doesn't: that would waste memory and make comparisons slower. So literals are interned. This is not necessarily true of strings that are not literals. However, you can test for this, and you can explicitly intern a string.

You can see this behavior, and how to intern at runtime, below:

static void Main()
{
    string test = "test";
    string test2 = "test";
    string test3 = test;
    string test4 = String.Copy(test);
    string test5 = string.Intern(test4);

    Console.WriteLine(ReferenceEquals(test, test2));
    Console.WriteLine(ReferenceEquals(test, test3));
    Console.WriteLine(ReferenceEquals(test, test4));
    Console.WriteLine(ReferenceEquals(test, test5));
}

Output:

True

True

False

True

Using String.Copy is certainly not the only way you'll see this behavior. There are too many cases to list; you just found a pretty obvious one.

This is easily demonstrated:

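// Requires: using System.Text; (for StringBuilder)
static void Main()
{
    string test = "test";
    // Built at run time, so this is a new object and not the interned literal.
    string test2 = new StringBuilder().Append("te").Append("st").ToString();
    // Intern returns the pooled instance with the same value.
    string test3 = string.Intern(test2);

    Console.WriteLine(ReferenceEquals(test, test2));
    Console.WriteLine(ReferenceEquals(test, test3));
}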

Output:

False

True
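
Another case worth a quick sketch is concatenation. As far as I know, the C# compiler folds a concatenation of constants into a single literal, while a concatenation involving a run-time value allocates a new string. The class and variable names here are only illustrative, and the results assume the usual literal interning described above.

```csharp
using System;

static class FoldingDemo
{
    static void Main()
    {
        string literal = "test";

        // Both operands are compile-time constants, so this becomes the literal "test".
        string folded = "te" + "st";

        // The prefix is only known at run time, so the concatenation allocates a new string.
        string prefix = "TE".ToLowerInvariant();
        string runtime = prefix + "st";

        Console.WriteLine(object.ReferenceEquals(literal, folded));  // True
        Console.WriteLine(object.ReferenceEquals(literal, runtime)); // False
    }
}
```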

You can test if a string is interned with string.IsInterned.
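
One detail that is easy to miss: string.IsInterned looks the string up by value. It returns the pooled instance with the same characters, or null if there is none, so a non-null result does not mean the instance you passed in is the interned one. A small sketch (names are illustrative, and it assumes the literal has already been interned by the runtime, as in the examples above):

```csharp
using System;
using System.Text;

static class IsInternedDemo
{
    static void Main()
    {
        string literal = "test";
        string built = new StringBuilder().Append("te").Append("st").ToString();

        // Lookup is by value: the result is the pooled instance, not 'built' itself.
        var pooled = string.IsInterned(built);
        Console.WriteLine(pooled != null);                          // True
        Console.WriteLine(object.ReferenceEquals(pooled, literal)); // True
        Console.WriteLine(object.ReferenceEquals(pooled, built));   // False

        // A value that was never interned: IsInterned returns null.
        string unique = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(unique) == null);       // True
    }
}
```

To check whether a particular instance is the interned one, compare the result with that instance using ReferenceEquals.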

Bottom line: just because string1 == string2 is true absolutely does not mean they are the same reference.
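
A minimal illustration of that difference, using a string produced at run time (the class name is just for the example):

```csharp
using System;

static class EqualityVsIdentity
{
    static void Main()
    {
        string a = "A";
        string b = "a".ToUpper(); // created at run time, same characters as a

        Console.WriteLine(a == b);                       // True: same value
        Console.WriteLine(a.Equals(b));                  // True: same value
        Console.WriteLine(object.ReferenceEquals(a, b)); // False: different objects
    }
}
```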

Zer0
-1

Strings in C# are immutable. test, test2 and test3 are the same string, the same reference. Any "test" string literal that you use in your code refers to the same memory address.

With String.Copy you explicitly create a new reference, even though it has the same text. But by default, the same string is always a unique reference in all your code.

Internally, the text is stored as a read-only char collection.

When you work with strings and, for example, modify one of them:

string s1 = "A";
string s2 = "B";
s2 = s1 + s2;

C# has 3 strings in memory: "A", "B" and "AB".
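
A small sketch of what that means in practice: the concatenation produces a new object, and the string that s2 pointed to before is left untouched. The variable `before` is added here only to keep a reference to the old object.

```csharp
using System;

static class ImmutabilityDemo
{
    static void Main()
    {
        string s1 = "A";
        string s2 = "B";
        string before = s2; // remember the object s2 referred to

        s2 = s1 + s2;       // builds a new string; "B" itself is not modified

        Console.WriteLine(before);                             // B
        Console.WriteLine(s2);                                 // AB
        Console.WriteLine(object.ReferenceEquals(before, s2)); // False
    }
}
```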

It is for this reason that, when you work a lot with strings, you should use StringBuilder. It is faster and avoids creating lots of intermediate strings.
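
As a rough sketch of the kind of code this is about, here are repeated appends in a loop written both ways. The helper names are hypothetical, and no timing is claimed here; which version wins depends on how many appends there are, so measure before relying on it. A single expression such as a + b + c is a different case and does not need a StringBuilder.

```csharp
using System;
using System.Text;

static class BuilderDemo
{
    // Hypothetical helper: appends into one internal buffer, then creates the result once.
    public static string JoinWithBuilder(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i);
        }
        return sb.ToString();
    }

    // Hypothetical helper: each iteration allocates a new, longer string.
    public static string JoinWithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i;
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine(JoinWithBuilder(5)); // 01234
        Console.WriteLine(JoinWithConcat(5));  // 01234
    }
}
```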

Victor
  • `StringBuilder` does not help with reference equality. "same string" has nothing to do with reference equality at all. `string test = "A"; string test2 = "a".ToUpper();`. Yes, those are the same string "A". They are not the same reference. – Zer0 May 05 '22 at 23:55
  • @Zer0 I didn't say anything about StringBuilder and references. It's only a note: strings are immutable and making operations with strings is slower and consume more memory than using StringBuilder. What's wrong with that comment? – Victor May 06 '22 at 00:59
  • @Zer0 "By default, same string..." is my answer. And in my answer I commented String.Copy, that I said is "new reference", not the same reference. You give another sample like String.Copy using ToUpper. I think you can understand my answer, if you want. – Victor May 06 '22 at 00:59
  • "making operations with strings is slower and consume more memory than using StringBuilder". Incorrect. `string test = string1 + string2` is faster than a string builder by a huge margin. Go ahead and benchmark it. So is `string test = string1 + string2 + string3 + string4`. And no, these do not create more intermediate strings either. Compiles down to a single call to `string.Concat` in both cases. Both statements about string builder in your answer and comment are incorrect and not what it's there for. – Zer0 May 06 '22 at 01:58
  • "same string it's always a unique reference in all your code". Also incorrect. Only true of **interned** strings. Otherwise false. I think you should research this more. – Zer0 May 06 '22 at 02:01
  • @Zer0 you only read what you want to read. What about "when you work a lot with strings..."? I think that a simple concatenation is not the case – Victor May 06 '22 at 06:03
  • @Zer0 and the same for "same string". I wrote "by default" referring to text that you write at compile time. Maybe my text is a bit confusing (I'm Spanish) but when someone doesn't want to understand... – Victor May 06 '22 at 06:05