
I was trying to better understand string interning in C# and ran into the following situation:

string a = "Hello";
string b = "Hello";
string c = new string(new char[] { 'H', 'e', 'l', 'l', 'o' });
string d = String.Intern(c);
Console.WriteLine(a == b);
Console.WriteLine(c == d);
Console.WriteLine((object)a == (object)b);
Console.WriteLine((object)c == (object)d);

I get the following output in the console:

True
True
True
False

What confuses me is: why is the 4th result False?

Samvel Petrosov
  • `Console.WriteLine((object)c==(object)d);` => here `==` checks for reference equality. Use `(object)c.Equals((object)d);` and see if it returns `True`. – Tetsuya Yamamoto Jul 13 '17 at 07:08
  • @TetsuyaYamamoto: While that is in a sense correct, I think the OP is aware of that. (These sort of edge cases epitomise my reasons for preferring good old `std::string` from C++.) – Bathsheba Jul 13 '17 at 07:09
  • @TetsuyaYamamoto yes, that returns true. But `==` in the case of objects also checks the references, and I don't understand why it returns `false`. – Samvel Petrosov Jul 13 '17 at 07:10
  • It's explained in the documentation of `String.Intern`: https://msdn.microsoft.com/en-us/library/system.string.intern(v=vs.110).aspx – Fabiano Jul 13 '17 at 07:11
  • Right, the OP may be aware of the `Equals` method here. But `String.Intern` itself retrieves the system's reference to the specified string, which depends on how the intern pool works. – Tetsuya Yamamoto Jul 13 '17 at 07:11

2 Answers

If you had not created `a` (and `b`), then `Console.WriteLine((object)c==(object)d);` would have resulted in `True`.

However, at the time that you do `string d = String.Intern(c);` the string "Hello" already exists in the string intern pool, because of the literal assigned to `a`, so the call to intern `c` finds the already existing "Hello" and returns it.

So, if "Hello" had not already been interned, then the "Hello" of `c` would have been interned, in which case the returned `d` would have been the same reference as `c`.

Proof: if you do `Console.WriteLine((object)b==(object)d);` it should print `True`. (I bet it will.)
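To check this explanation, here is a minimal sketch written as C# top-level statements. It compares references with `object.ReferenceEquals` so that the `string` overload of `==` cannot get in the way. The `unique` and `pooled` variables and the use of `Guid.NewGuid()` are my own additions, there only to produce a value that no literal in the program can match. The `True`/`False` comments are what I would expect on an ordinary JIT-compiled run, not something the language guarantees in every hosting scenario.

```csharp
using System;

string a = "Hello";                                        // literal: placed in the intern pool by the runtime
string c = new string(new[] { 'H', 'e', 'l', 'l', 'o' });  // built at run time: a separate object on the heap
string d = string.Intern(c);                               // "Hello" is already pooled, so the pooled instance comes back

Console.WriteLine(c == d);                        // True: string's == overload compares values
Console.WriteLine(object.ReferenceEquals(c, d));  // False: c is not the pooled instance
Console.WriteLine(object.ReferenceEquals(a, d));  // True: d is the very object the literal refers to

// Control case (assumption: no literal equals a freshly generated GUID string).
string unique = Guid.NewGuid().ToString();
string pooled = string.Intern(unique);            // nothing equal is pooled, so unique itself is added and returned
Console.WriteLine(object.ReferenceEquals(unique, pooled)); // True
```

The control case is the situation described above where nothing equal has been interned beforehand: `String.Intern` then hands back the very instance it was given.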

Mike Nakis
  • Even if I remove `a` and `b` anyway `Console.WriteLine((object)c==(object)d);` returns `False` – Samvel Petrosov Jul 13 '17 at 07:14
  • @SamvelPetrosov b==d returns true but the reason must be different than you described because if I change a="Hello1" and b="Hello1" then c==d still returns false – MistyK Jul 13 '17 at 07:16
  • @SamvelPetrosov - I ran the code and removed `a` & `b` and it returned `True` for me. – Enigmativity Jul 13 '17 at 07:19
  • @MistyK I wonder why `Console.WriteLine(c==d);` returns `True` while `Console.WriteLine((object)c==(object)d);` returns `False`. If `==` checks references and they were equal before casting, why do they differ afterwards, and why doesn't the same happen with `a` and `b`? – Samvel Petrosov Jul 13 '17 at 07:20
  • @SamvelPetrosov If I remove `a` and `b`, it returns `True`. I bet you should get the same result, based on my understanding of interning and the intern pool. – Thang Pham Jul 13 '17 at 07:21
  • @Enigmativity https://dotnetfiddle.net/0UM6tQ look results second line `False` – Samvel Petrosov Jul 13 '17 at 07:22
  • @SamvelPetrosov the difference is that c==d will use string overload to compare strings (it's normal string comparison) and (object)c == (object)d will use reference comparison always. I can't edit the previous post but I meant (object)c == (object)d returns false – MistyK Jul 13 '17 at 07:22
  • @MistyK ok, take look at this https://dotnetfiddle.net/0UM6tQ – Samvel Petrosov Jul 13 '17 at 07:26
  • @SamvelPetrosov - That's not how it runs when I do it locally. – Enigmativity Jul 13 '17 at 07:28
  • @SamvelPetrosov yep, everything as expected and the same question remains - why doesn't String.Intern use the same reference here? I don't know the answer here. It just proves that Mike's explanation isn't valid. – MistyK Jul 13 '17 at 07:29
  • @MistyK I am a little confused with this `True`,`False`s now. I have locally run this same code and got 4 `True` as a result – Samvel Petrosov Jul 13 '17 at 07:30
  • @MistyK http://take.ms/hoK81 – Samvel Petrosov Jul 13 '17 at 07:31

The documentation says that the method returns:

The system's reference to str, if it is interned; otherwise, a new reference to a string with the value of str.

and in the remarks:

if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable.

Apparently, creating the string "Hello" from a char array does not yield the same instance as the literal, and the new string does not end up in the pool on its own. Changing the `c` line to `string c = "Hello";` makes the last comparison print `True`.
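One way to probe this is `String.IsInterned`, which looks a value up in the pool without adding it and returns `null` when nothing equal is pooled. The sketch below is only an illustration: the variable names and the GUID-based value are assumptions of mine, picked so that no literal can match it, and the expected results in the comments assume an ordinary JIT-compiled run.

```csharp
using System;

// Assumption: a freshly generated GUID string matches no literal in the program.
string built = Guid.NewGuid().ToString();

// A string created at run time is not pooled automatically.
Console.WriteLine(string.IsInterned(built) is null);       // True

// Interning it explicitly adds this very instance to the pool.
string pooled = string.Intern(built);
Console.WriteLine(object.ReferenceEquals(string.IsInterned(built), pooled)); // True

// With a literal of the same value around, the lookup finds the literal, not the copy.
string literal = "Hello";
string copy = new string(literal.ToCharArray());
Console.WriteLine(object.ReferenceEquals(string.IsInterned(copy), literal)); // True: the lookup is by value
Console.WriteLine(object.ReferenceEquals(copy, literal));                    // False: copy is a distinct object
```

Note that `IsInterned` answers "is a string with this value pooled?", not "is this particular instance the pooled one?", which is why the `copy` lookup returns the literal.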

Mong Zhu
  • No wonder that changing c ="Hello" will output true. String will be interned by CLR. The question is why manual intern is different than CLR intern? – MistyK Jul 13 '17 at 07:17
  • @MistyK - Why do you think it is different? It seems consistent to me. – Enigmativity Jul 13 '17 at 07:29
  • @Enigmativity if you have two strings - a="Hello" and b="Hello" which are compile time constant then they will be both pointing to the same instance. However if you have a=new string(new char[]{'H','e','l','l','o'}) and if you intern it like this: string b = String.Intern(a) they don't point to the same instance. – MistyK Jul 13 '17 at 07:32
  • @MistyK - They can't because it would break the run-time. The constructed string is on the heap and can't be moved otherwise the reference to an existing string would be moved. The interned string can be moved - and it must be to become interned. So you've got one reference that can't move and one that must. They both then can't be the same reference. – Enigmativity Jul 13 '17 at 07:38
  • @Enigmativity yup, you are right. I think Peter Duniho explained it pretty well in the comment. – MistyK Jul 13 '17 at 07:52