
I have a piece of code that generates a signature in C#. For convenience I used GetHashCode, and that worked fine.

However, my boss told me the signature now has to be generated on the Java side too. This really drives me crazy, so I dug into the .NET source code.

Currently I only need the hash codes of int, double, string and bool. int and bool are easy; the ones I can't see an easy way to port are double and string. My environment will always be 64-bit. The relevant source is below.

For string:

        public override int GetHashCode() {

#if FEATURE_RANDOMIZED_STRING_HASHING
            if(HashHelpers.s_UseRandomizedStringHashing)
            {
                return InternalMarvin32HashString(this, this.Length, 0);
            }
#endif // FEATURE_RANDOMIZED_STRING_HASHING

            unsafe {
                fixed (char *src = this) {
                    Contract.Assert(src[this.Length] == '\0', "src[this.Length] == '\\0'");
                    Contract.Assert( ((int)src)%4 == 0, "Managed string should start at 4 bytes boundary");

#if WIN32
                    int hash1 = (5381<<16) + 5381;
#else
                    int hash1 = 5381;
#endif
                    int hash2 = hash1;

#if WIN32
                    // 32 bit machines.
                    int* pint = (int *)src;
                    int len = this.Length;
                    while (len > 2)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
                        pint += 2;
                        len  -= 4;
                    }

                    if (len > 0)
                    {
                        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                    }
#else
                    int     c;
                    char *s = src;
                    while ((c = s[0]) != 0) {
                        hash1 = ((hash1 << 5) + hash1) ^ c;
                        c = s[1];
                        if (c == 0)
                            break;
                        hash2 = ((hash2 << 5) + hash2) ^ c;
                        s += 2;
                    }
#endif
#if DEBUG
                    // We want to ensure we can change our hash function daily.
                    // This is perfectly fine as long as you don't persist the
                    // value from GetHashCode to disk or count on String A 
                    // hashing before string B.  Those are bugs in your code.
                    hash1 ^= ThisAssembly.DailyBuildNumber;
#endif
                    return hash1 + (hash2 * 1566083941);
                }
            }
        }

I am not sure about FEATURE_RANDOMIZED_STRING_HASHING (I guess it's not set, though), and the pointer cast here:

int* pint = (int *)src;

doesn't look straightforward to do in Java.

For double:

public unsafe override int GetHashCode() {
    double d = m_value;
    if (d == 0) {
        // Ensure that 0 and -0 have the same hash code
        return 0;
    }
    long value = *(long*)(&d);
    return unchecked((int)value) ^ ((int)(value >> 32));
}

It's the same issue: there is a pointer cast, plus an address-of and a dereference.

How can I do that in Java (no native code)?

Jason Hu
  • You're relying on features of the hash that are very explicitly stated as not existing. If you want a hash function that meets your requirements, you'll need to write one from scratch so that you can be sure it has all of the properties you require (for example, being deterministic across process executions). – Servy Apr 29 '15 at 16:26
  • @HenkHolterman Oh really?! I've tested it on my desk at least, and I think the string one should be content-based only. Could you guide me on how to write my own hash code (hopefully it's just an extraction from the C# source, right)? – Jason Hu Apr 29 '15 at 16:29

2 Answers


I wonder if you aren't making it more complicated than it needs to be with the whole unsafe section and pointers. Why don't you start with a solution in Java and then port it back to C#?

I bet there are a bunch of solutions on the net for computing a hash in Java, and the port from Java to C# should be trivial.

edit: In fact, I looked it up for you: Good Hash Function for Strings

Please don't assume that pointers are necessary for performance either. Using pointers probably blocks compiler optimizations, which can make your code slower than if you'd just used arrays/strings like the Java solutions above.

In response to comment: If you want the same function in C# and Java, you will need a solution that doesn't use pointers. That solution will probably perform as well or better anyway (because the compiler has more freedom when optimizing it) and will certainly be more readable. So if you want to use this algorithm, first recode it without pointers, then use that in both the C# and Java versions.

If you can't recode it in your primary language, C#, you certainly won't be able to do it in Java.

Maintain compatibility by having good unit test coverage. If you don't have enough unit tests now, write them before making any changes. If you test existing hash codes (you appear to be persisting them somewhere), you might be able to write some C# tests that check both the C# and Java hash codes, which would also be a good way to prove that your current effort is successful.
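
As a rough illustration of what a pointer-free version could look like, here is a minimal Java sketch of the non-WIN32 branch quoted in the question (the one that starts from `hash1 = 5381` and walks the string one char at a time). It assumes randomized string hashing is off and that this is not a DEBUG build. The class name `NetStringHash64` is made up for this example, and the logic has only been traced against the quoted source, not run against a real .NET runtime.

```java
// Hypothetical helper: a pointer-free sketch of the non-WIN32 branch of the
// String.GetHashCode source quoted in the question.
final class NetStringHash64 {
    private NetStringHash64() {}

    static int getHashCode(String s) {
        int hash1 = 5381;
        int hash2 = hash1;
        int n = s.length();
        int i = 0;
        while (i < n) {
            int c = s.charAt(i);
            // The C# loop stops at the first NUL char, which is normally the
            // terminator but could also be an embedded '\0'.
            if (c == 0) {
                break;
            }
            hash1 = ((hash1 << 5) + hash1) ^ c;
            if (i + 1 >= n) {
                break; // in C#, s[1] would be the terminating NUL here
            }
            c = s.charAt(i + 1);
            if (c == 0) {
                break;
            }
            hash2 = ((hash2 << 5) + hash2) ^ c;
            i += 2;
        }
        // Java int arithmetic wraps on overflow, like unchecked C# arithmetic.
        return hash1 + (hash2 * 1566083941);
    }
}
```

The same loop can be written in C# with `s[i]` indexing instead of pointers, so both sides can share one definition and one set of test vectors.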

Bill K

I needed to implement the .NET String GetHashCode in Java because we were porting some code where data depended on the .NET String GetHashCode. The solution below is probably naive and definitely not optimized, but I didn't need it to be -- it's called rarely. I tested it with the empty string, 1, 2, 3, 4, and 5 character strings, and non-ASCII strings. It works for my use cases, but I make no guarantees.

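import java.nio.charset.StandardCharsets;

public class NetHashCode {
    // Mirrors the WIN32 (32-bit) branch of .NET String.GetHashCode,
    // which reads the string two chars (one int) at a time.
    public static int getHashCode(String s) {
        int hash1 = (5381<<16) + 5381;
        int hash2 = hash1;
        byte[] bytes = s.getBytes(StandardCharsets.UTF_16LE);
        int numCharsRemaining = s.length();
        // 2 bytes per character, little endian.
        for(int j=0; j< bytes.length; j+=4) {
            int holdsUpToTwoChars;
            // Java bytes are signed, so mask with 0xFF to avoid sign extension
            // for byte values >= 0x80.
            if(numCharsRemaining > 1) {
                holdsUpToTwoChars = (bytes[j] & 0xFF) | ((bytes[j+1] & 0xFF) << 8)
                        | ((bytes[j+2] & 0xFF) << 16) | ((bytes[j+3] & 0xFF) << 24);
                numCharsRemaining -= 2;
            } else {
                // Odd trailing char: the high half is the NUL terminator in .NET.
                holdsUpToTwoChars = (bytes[j] & 0xFF) | ((bytes[j+1] & 0xFF) << 8);
                numCharsRemaining -= 1;
            }
            // Alternate between hash1 and hash2 for each 4-byte group.
            if(j%8 < 4) {
                hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ holdsUpToTwoChars;
            } else {
                hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ holdsUpToTwoChars;
            }
        }
        return hash1 + (hash2 * 1566083941);
    }
}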
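
The Double.GetHashCode quoted in the question can probably be handled in a similar way, since the pointer cast there only reinterprets the 8 bytes of the double as a long. A minimal sketch, assuming the quoted source is the version actually in use (the class name `NetDoubleHash` is made up for this example):

```java
// Hypothetical helper: a sketch of the Double.GetHashCode source quoted in
// the question, without pointers.
final class NetDoubleHash {
    private NetDoubleHash() {}

    static int getHashCode(double d) {
        // In Java, d == 0 is true for both 0.0 and -0.0, as in C#.
        if (d == 0) {
            return 0;
        }
        // Stands in for *(long*)(&d): the raw IEEE 754 bit pattern,
        // without collapsing NaN payloads to one canonical value.
        long value = Double.doubleToRawLongBits(d);
        // Low 32 bits XOR high 32 bits.
        return (int) value ^ (int) (value >> 32);
    }
}
```

Note that Java's own `Double.hashCode` is not a drop-in replacement, because it does not special-case -0.0.
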
cgroneman