We have a legacy system written in .NET, which we are migrating to Node.js.

The original system uses ("some string value").GetHashCode() to generate some tokens based on user data.

I'm looking for a way to implement this function in JavaScript in order to port this part of the system to Node.js.

Therefore, I'm interested in how String.GetHashCode() actually works. Is there an algorithm documented somewhere? Is it even a stable algorithm or does it change across various .NET versions?

I've tried to find some details on its implementation, but it's really difficult for me, because .NET is not my primary technology and I'm not really familiar with its resources and sources of information.

Slava Fomin II
  • Hashcodes in .NET are not stable across versions. [Here is one implementation though, from the .NET Framework Reference Source.](https://github.com/Microsoft/referencesource/blob/master/mscorlib/system/string.cs) – Bradley Uffner Sep 06 '17 at 18:39
  • Does it matter if it's stable across versions? It just generates the same value during the runtime of a program, no? – Icepickle Sep 06 '17 at 18:39
  • From comments in the [source code](https://referencesource.microsoft.com/#mscorlib/system/string.cs), not only is it not stable across .net versions, it may not even be stable between AppDomains within the same process. – Bradley Uffner Sep 06 '17 at 18:41
  • are you sure it was System.GetHashCode ? System is a namespace. GetHashCode doesn't take any parameters – Paweł Łukasik Sep 06 '17 at 18:42
  • The problem is that previous developers were using this function to generate tokens, which were then persisted and distributed through various systems. Also, is it possible to reverse the result of this function (they claim, that they were doing it), or is it a one-way function? – Slava Fomin II Sep 06 '17 at 18:43
  • Do you mean object.GetHashCode()? also "generate some tokens based on user date" sounds strange. Hash code are hash codes (mostly to be able to store in hashtable, list, collections, etc.), not anything else (not object id, not unique stuff, etc..) – Simon Mourier Sep 06 '17 at 18:43
  • @SlavaFominII - "which were then persisted and distributed through various systems" That wasn't a very good idea. [MSDN docs](https://msdn.microsoft.com/en-us/library/system.string.gethashcode(v=vs.110).aspx) "As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and **they should never be persisted**. " – hatchet - done with SOverflow Sep 06 '17 at 18:45
  • Looks like it was `String.GetHashCode()` after all. I'm not going to even discuss their professionalism. This problem is just the tip of the iceberg = ) – Slava Fomin II Sep 06 '17 at 18:47

2 Answers

To add on to Bradley's answer: this is a stable hash code, based on the 64-bit implementation of String.GetHashCode() and using no unsafe code, that I wrote up for an answer a while ago.

public static class StringExtensionMethods
{
    public static int GetStableHashCode(this string str)
    {
        unchecked
        {
            int hash1 = 5381;
            int hash2 = hash1;

            for(int i = 0; i < str.Length && str[i] != '\0'; i += 2)
            {
                hash1 = ((hash1 << 5) + hash1) ^ str[i];
                if (i == str.Length - 1 || str[i+1] == '\0')
                    break;
                hash2 = ((hash2 << 5) + hash2) ^ str[i+1];
            }

            return hash1 + (hash2*1566083941);
        }
    }
}
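
Since the goal is a Node.js port, here is a minimal sketch of the same method in JavaScript. The name `getStableHashCode` simply mirrors the C# one. It can only match `String.GetHashCode()` output from a runtime that used the non-randomized 64-bit algorithm, so compare it against tokens from the legacy system before relying on it. `| 0` and `Math.imul` stand in for C#'s `unchecked` 32-bit wrap-around, and `charCodeAt` returns UTF-16 code units, the same unit as a C# `char`.

```javascript
// Sketch of a JavaScript port of GetStableHashCode (assumption: the legacy
// tokens came from the non-randomized 64-bit algorithm).
function getStableHashCode(str) {
  let hash1 = 5381;
  let hash2 = hash1;

  for (let i = 0; i < str.length && str.charCodeAt(i) !== 0; i += 2) {
    // (hash << 5) + hash can exceed 32 bits as a JS number; "| 0" wraps it
    // back to a signed 32-bit integer before the XOR.
    hash1 = (((hash1 << 5) + hash1) | 0) ^ str.charCodeAt(i);
    if (i === str.length - 1 || str.charCodeAt(i + 1) === 0) {
      break;
    }
    hash2 = (((hash2 << 5) + hash2) | 0) ^ str.charCodeAt(i + 1);
  }

  // Math.imul performs a 32-bit integer multiplication with wrap-around.
  return (hash1 + Math.imul(hash2, 1566083941)) | 0;
}

console.log(getStableHashCode("some string value"));
```

The result is always a signed 32-bit integer; for the empty string this function returns 371857150.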
Scott Chamberlain

Taken from Microsoft's Reference Source, one implementation is:

        // Gets a hash code for this string.  If strings A and B are such that A.Equals(B), then
        // they will return the same hash code.
        [System.Security.SecuritySafeCritical]  // auto-generated
        [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
        public override int GetHashCode() {

#if FEATURE_RANDOMIZED_STRING_HASHING
            if(HashHelpers.s_UseRandomizedStringHashing)
            {
                return InternalMarvin32HashString(this, this.Length, 0);
            }
#endif // FEATURE_RANDOMIZED_STRING_HASHING

            unsafe {
                fixed (char *src = this) {
                    Contract.Assert(src[this.Length] == '\0', "src[this.Length] == '\\0'");
                    Contract.Assert( ((int)src)%4 == 0, "Managed string should start at 4 bytes boundary");

#if WIN32
                    int hash1 = (5381<<16) + 5381;
#else
                    int hash1 = 5381;
#endif
                    int hash2 = hash1;

#if WIN32
                    // 32 bit machines.
                    int* pint = (int *)src;
                    int len = this.Length;
                    while (len > 2)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
                        pint += 2;
                        len  -= 4;
                    }

                    if (len > 0)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                    }
#else
                    int     c;
                    char *s = src;
                    while ((c = s[0]) != 0) {
                        hash1 = ((hash1 << 5) + hash1) ^ c;
                        c = s[1];
                        if (c == 0)
                            break;
                        hash2 = ((hash2 << 5) + hash2) ^ c;
                        s += 2;
                    }
#endif
#if DEBUG
                    // We want to ensure we can change our hash function daily.
                    // This is perfectly fine as long as you don't persist the
                    // value from GetHashCode to disk or count on String A 
                    // hashing before string B.  Those are bugs in your code.
                    hash1 ^= ThisAssembly.DailyBuildNumber;
#endif
                    return hash1 + (hash2 * 1566083941);
                }
            }
        }

This is not stable across .NET versions, and from comments scattered around the string.cs source code, it may not even be stable across AppDomains within the same process.
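
If the legacy service ran as a 32-bit process, the `WIN32` branch above is the one that would have produced the tokens. Below is a rough JavaScript sketch of that branch only, leaving out the randomized-hashing and `DEBUG` paths. The names `getLegacyHashCode32` and `pair` are made up for this example, and the port has not been checked against a real 32-bit runtime, so treat it as a starting point for comparison with known tokens.

```javascript
// Sketch of the WIN32 branch: the string is read two UTF-16 code units at a
// time, as if through an int* on a little-endian machine.
function getLegacyHashCode32(str) {
  // Combine the code units at i and i+1 into one 32-bit value.
  // Positions past the end read as 0, like the string's NUL terminator.
  const pair = (i) =>
    (str.charCodeAt(i) || 0) | ((str.charCodeAt(i + 1) || 0) << 16);

  let hash1 = ((5381 << 16) + 5381) | 0;
  let hash2 = hash1;
  let pos = 0;
  let len = str.length;

  while (len > 2) {
    hash1 = (((hash1 << 5) + hash1 + (hash1 >> 27)) | 0) ^ pair(pos);
    hash2 = (((hash2 << 5) + hash2 + (hash2 >> 27)) | 0) ^ pair(pos + 2);
    pos += 4;
    len -= 4;
  }

  if (len > 0) {
    hash1 = (((hash1 << 5) + hash1 + (hash1 >> 27)) | 0) ^ pair(pos);
  }

  // Math.imul gives the wrap-around 32-bit multiplication of the C# code.
  return (hash1 + Math.imul(hash2, 1566083941)) | 0;
}

console.log(getLegacyHashCode32("some string value"));
```

Which branch applies depends on how the original process was compiled and hosted, so it is worth testing both variants against a handful of known input/token pairs.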

If you want a real, stable hash code that can "safely" be persisted outside the AppDomain, look at the hash functions in System.Security.Cryptography. MD5 is acceptable for low-security jobs; the SHA-x flavors are even better.
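
On the Node.js side, the built-in `crypto` module offers the same family of algorithms. A minimal sketch, assuming tokens are derived from the UTF-8 bytes of the string (`stableToken` is a hypothetical helper name). Note that this only helps for newly issued tokens: it cannot reproduce values that were already generated with `GetHashCode()`.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: hex-encoded SHA-256 digest of a string's UTF-8 bytes.
function stableToken(value) {
  return createHash("sha256").update(value, "utf8").digest("hex");
}

console.log(stableToken("some string value"));
```

For the .NET and Node.js sides to agree, both must hash the same byte encoding of the string, for example UTF-8 on both ends.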

True hashes are one-way only: it is not possible to truly reverse a hash, since hashing is a "lossy" process. If the developers you got your code from claim they can reverse a hash, they were either lying, mistaken, or didn't implement a real hash.

Bradley Uffner
  • Thanks a lot for posting this. I totally agree about the reversibility of the hash, that claim was looking weird to me from the beginning. I was thinking maybe it's some implementation feature in .NET. – Slava Fomin II Sep 06 '17 at 18:51
  • If you want a real, stable hash code that can "safely" be persisted outside the AppDomain, look at the hash functions in `System.Security.Cryptography`. MD5 is acceptable for low-security jobs; the SHA-x flavors are even better. – Bradley Uffner Sep 06 '17 at 18:54
  • I'm not coding in .NET, I'm just porting a service already written in it, but thanks for a hint. – Slava Fomin II Sep 06 '17 at 18:57
  • Also, is it possible to obtain reference source for specific version of .NET? I think they were using version `4.0`. – Slava Fomin II Sep 06 '17 at 18:58
  • I don't see any way to switch versions on the Reference Source site, it seems to be showing 4.7 right now. – Bradley Uffner Sep 06 '17 at 18:59
  • @BradleyUffner I think older versions might be available for offline access only. See http://referencesource.microsoft.com/download.html (or click the download link at the top of the normal site to get a prettier iframed version) – Scott Chamberlain Sep 06 '17 at 19:00
  • Or to put it another way: if you could calculate the input string back from an integer hash code, it would be a record-breaking compression algorithm. You can do this if the input string has a length of 4 bytes, but storing more data in an integer is an illusion. – Alois Kraus Sep 06 '17 at 19:20
  • Something, something, something, middle out compression. – Bradley Uffner Sep 06 '17 at 19:23