A friend and I were discussing strings in the .NET Framework: they are a reference type, yet they act like a value type (they are immutable). We both knew that strings are handled internally by the CLR, but in that short discussion we did not reach a conclusion about how strings are actually created and managed by the CLR/Framework.

For example, in the code below s1 and s2 are clearly different instances, but as you can see, when I call s2.ToUpper() the result seems to refer back to s1.

    public static void Main (string[] args)
    {
        string s1 = "HELLO";
        string s2 = "hello";

        Console.WriteLine (s1.GetHashCode()); //Prints 68624562
        Console.WriteLine (s2.GetHashCode()); //Prints 99162322
        Console.WriteLine (s2.ToUpper().GetHashCode()); //Prints 68624562 too!
    }

So the question is: when s2.ToUpper() is called, did the CLR create a new string "HELLO", check whether it already existed, and if so throw away the newly created string? Can someone explain the magic here?

Prashant Cholachagudda
  • Different string instances can (and will) return the same hash code (i.e. your code doesn't show that they are different instances). – George Duckett Apr 08 '12 at 17:45
  • I'm not very familiar with this, but I imagine the CLR simply executes ToUpper(), keeps the return value temporarily (on the stack?), and calls GetHashCode() on it. As for the hash itself, it's all OK: GetHashCode() is computed from the *content* of the string (or other object), so two equal strings have the same hash. It supports equality testing. – Miroslav Mares Apr 08 '12 at 17:48
  • Strings, as well as other immutable reference types, **do not act like value types**. Being immutable only takes away a few of the things you can do with mutable reference types (primarily, observing that two references point to the same object by mutating it through one and checking for the change through the other), but there are still plenty of differences. Assuming `var arr = new T[] {obj, obj}`, try out `ReferenceEquals(arr[0], arr[1])` with a value type and a reference type of your choice. –  Apr 08 '12 at 18:45

5 Answers

String.GetHashCode() generates a hash value that's based on the content of the string, so it is entirely natural that equal strings generate the same hash. It follows that you cannot conclude from equal hash codes that the string reference returned by ToUpper() matches the s1 reference. And it doesn't: checking for an existing string on every call would be far too expensive to implement.

You can verify that by testing this code:

    static void Main(string[] args) {
        var s1 = "hello";
        var s2 = "HELLO";
        var s3 = s1.ToUpper();
        bool eq = object.ReferenceEquals(s2, s3);
        System.Diagnostics.Debug.Assert(!eq);
    } 
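
The question asks whether the CLR looks for an existing "HELLO" when ToUpper() runs. A minimal sketch of how deduplication can be requested explicitly, assuming the standard string.Intern API (the class name InternSketch is made up for illustration): ToUpper() hands back a separate object on each call, and only an explicit call to string.Intern yields one shared reference per distinct content.

    using System;

    public static class InternSketch
    {
        public static void Main()
        {
            string s2 = "hello";

            // Two separate ToUpper() calls: equal content, separate objects.
            string first = s2.ToUpper();
            string second = s2.ToUpper();
            Console.WriteLine(first == second);                        // content comparison
            Console.WriteLine(object.ReferenceEquals(first, second));  // identity comparison

            // string.Intern returns one canonical reference per distinct content,
            // but only because it is called explicitly here.
            string canonicalFirst = string.Intern(first);
            string canonicalSecond = string.Intern(second);
            Console.WriteLine(object.ReferenceEquals(canonicalFirst, canonicalSecond)); // True
        }
    }

Interning is opt-in for strings built at run time, which fits the point above: doing that lookup on every string operation would cost too much.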
Hans Passant

It's no surprise that two GetHashCode() calls give the same result for the same input; that's the point of hashing.

By contrast, when you run:

    Console.WriteLine(Object.ReferenceEquals(s2.ToUpper(), s1));

it simply prints False. Hence there really are two string instances alive, both with the same content.

I think you need to brush up on hashing, hash codes and equality.

Or are you coming from Java? Maybe you got the impression that hash codes are related to object reference values because the documentation of Object.hashCode() states:

"As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)"

Gregory Pakosz

You cannot use GetHashCode() to uniquely identify instances. The hash code has to be the same for two different objects having the same value. Otherwise it wouldn't work as a hash code.
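
As a rough illustration of that point, the sketch below builds a second "HELLO" at run time and compares it with the literal. The class and method names (HashVersusIdentity, BuildUpper) are made up for this example. RuntimeHelpers.GetHashCode is shown only as the identity-based counterpart; its values usually differ between instances, but uniqueness is not guaranteed either.

    using System;
    using System.Runtime.CompilerServices;

    public static class HashVersusIdentity
    {
        // Builds a fresh string at run time, so it is a different object from any literal.
        public static string BuildUpper() => new string(new[] { 'H', 'E', 'L', 'L', 'O' });

        public static void Main()
        {
            string literal = "HELLO";
            string built = BuildUpper();

            // Content-based: equal strings produce equal hash codes within one process.
            Console.WriteLine(literal.GetHashCode() == built.GetHashCode()); // True

            // Identity: these are still two separate objects.
            Console.WriteLine(object.ReferenceEquals(literal, built));       // False

            // Identity-based hash codes ignore the string content entirely.
            Console.WriteLine(RuntimeHelpers.GetHashCode(literal));
            Console.WriteLine(RuntimeHelpers.GetHashCode(built));
        }
    }

The actual hash values depend on the runtime and, on newer runtimes, on the process, so the numbers printed in the question should not be expected to reproduce.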

Anders Abel

s2.ToUpper() is just a method call that doesn't change the value of the s2 object (s2 is an object of type String). It takes the value of s2 and returns a new instance of the String class with the value "HELLO" (the result of the ToUpper() method). In the scope of the Main function there are still two objects, s1 and s2, and their values remain unchanged.
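
A small sketch of that behaviour, with a made-up class name (ToUpperDoesNotMutate) and an extra local variable that is not in the question's code:

    using System;

    public static class ToUpperDoesNotMutate
    {
        public static void Main()
        {
            string s2 = "hello";

            // The result goes into a separate variable; s2 itself is never modified.
            string upper = s2.ToUpper();

            Console.WriteLine(s2);     // hello
            Console.WriteLine(upper);  // HELLO
        }
    }

To change what s2 refers to, the result would have to be assigned back, as in s2 = s2.ToUpper(), which replaces the reference rather than altering the original string.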

Milan Svitlica

To add to the other answers and address the other part of your question...

If you also check Object.ReferenceEquals(s2.ToUpper(), s2), you'll see that it's false too.

Strings are immutable, which in this case means that ToUpper() returns a new instance.

So the answer is yes: the "HELLO" returned by ToUpper() is a new string.

But then, as others said, GetHashCode() is just a hash value: it mainly serves to spread keys across buckets in hash tables and dictionaries.

Or see the question "What is the best algorithm for an overridden System.Object.GetHashCode?" and its answer. It gives a good sense of how a hash algorithm works, why hash codes are not unique, and why they are the same for strings with the same content.

Or this one http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
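
To make the bucket point above concrete, here is a minimal sketch using Dictionary<string, int>. The class and method names (BucketLookup, Lookup) are invented for the example. The key used for the lookup is a different instance from the key stored in the dictionary, yet it is found because the two have equal content and therefore equal hash codes.

    using System;
    using System.Collections.Generic;

    public static class BucketLookup
    {
        public static int Lookup()
        {
            // The dictionary is keyed by the literal "HELLO".
            var counts = new Dictionary<string, int> { ["HELLO"] = 1 };

            // A different string instance with the same content.
            string key = "hello".ToUpper();

            // Found via the content-based hash code plus an equality check.
            return counts[key];
        }

        public static void Main()
        {
            Console.WriteLine(Lookup()); // prints 1
        }
    }

If hash codes were tied to object identity instead, this lookup would miss the entry, which is why String bases its hash on content.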

NSGaga-mostly-inactive