7

Given a set of integers, a "functional group", is there a better way to GetHashCode of the integers where the positions of the numbers doesn't affect the hash?

Sample code: https://dotnetfiddle.net/XoIN19#

static public void Main()
{
    int[] ints = { 10001, 10002, 10003, 10004, 10005 };
    int[] intsX = ints.Reverse().ToArray();

    int hash1;
    int hash2;
    
    Func<int[], int> GetHashCode = new Func<int[], int>(x => GetHashCodeFlawed(x));
    
    hash1 = GetHashCode(ints);
    Console.WriteLine("hash1={0}", hash1); //954101523
    
    hash2 = GetHashCode(intsX);
    Console.WriteLine("hash2={0}", hash2); //957855123
    
    Console.WriteLine("hash1==hash2 : {0}", hash1 == hash2);
}

static int GetHashCodeFlawed(IEnumerable<int> integers)
{
    IEnumerator<int> intEnum = integers.GetEnumerator();

    if(intEnum.MoveNext()==false) return 0;

    int hash = 0;
    unchecked {
        hash = intEnum.Current.GetHashCode();
        for(;intEnum.MoveNext()==true;)
            hash = 31 * hash + intEnum.Current.GetHashCode();
    }

    return hash;
}

Output of this is: hash=954101523 If I swap 10003 and 10002 i get: hash=954130353

Besides sorting the list before getting the hash, is there a better alternative that wont change if the items in the list positions change?

The list of integers basically represents a set of record Ids that are a "functional group", so the "functional group" is really the key, and not really dependent on the order

enorl76
  • 2,562
  • 1
  • 25
  • 38
  • 2
    It's an answer tagged as Java *shudders* but the concept should flow seamlessly. http://stackoverflow.com/questions/18021643 – Daniel Park Feb 04 '15 at 16:58
  • If nothing is order dependent, you could look into [HashSet](https://msdn.microsoft.com/en-us/library/bb359438.aspx) instead of custom defining your array. – ryanyuyu Feb 04 '15 at 17:55
  • Get hash of each integer and get sum of each hash. It's theoretically a little wrong but it'll work ;) – Mustafa Chelik Feb 04 '15 at 18:28

3 Answers3

5

Addition with a good one-value hash function

One good one-value hash function has a public domain implementation in C thanks to the Hash Function Prospector:

// exact bias: 0.020888578919738908
uint32_t
triple32(uint32_t x)
{
    x ^= x >> 17;
    x *= UINT32_C(0xed5ad4bb);
    x ^= x >> 11;
    x *= UINT32_C(0xac4c1b51);
    x ^= x >> 15;
    x *= UINT32_C(0x31848bab);
    x ^= x >> 14;
    return x;
}

You would convert that to C#, apply that to each value, and then sum all the hashed results together. Addition perfectly satisfies your 'order doesn't matter' criterion since order doesn't matter with addition, you still get the same result. The one-value hash function above satisfies your desire for a decent hash function.

Implementation

The following implements the idea above (with a test rearrangement to show it gives the same hash value):

using System;
using System.Collections.Generic;

public class Test
{
    static void Main()
    {
        int[] ints = { 10001, 10002, 10003, 10004, 10005 };
        int hash = GetHashCode(ints);
        int[] reorderedInts = { 10004, 10002, 10005, 10001, 10003 };
        int reorderedHash = GetHashCode(reorderedInts);

        Console.WriteLine("hash          == {0}", hash);
        Console.WriteLine("hashReordered == {0}", reorderedHash);
    }

    static int GetHashCode(IEnumerable<int> integers)
    {
        int hash = 0;

        foreach(int integer in integers)
        {
            int x = integer;

            x ^= x >> 17;
            x *= 830770091;   // 0xed5ad4bb
            x ^= x >> 11;
            x *= -1404298415; // 0xac4c1b51
            x ^= x >> 15;
            x *= 830770091;   // 0x31848bab
            x ^= x >> 14;

            hash += x;
        }

        return hash;
    }
}

This produces the output:

hash          == -2145263134
hashReordered == -2145263134
Chai T. Rex
  • 2,972
  • 1
  • 15
  • 33
  • I think this is too easy to collide? For example {1, 1, 2} and {2}, or {1, 1} and {2, 2} are producing the same hash. – 张实唯 Sep 18 '18 at 06:35
  • @张实唯 Thank you for your comment. I made it that way originally because the question asked about a hash for sets, which have no duplicates in them. I think your point is a good one, though, and I've made the method better for multisets by using addition to combine the individual hashes instead of XOR. [Here it is with your first example](https://ideone.com/8eNKbH#stdout). – Chai T. Rex Sep 18 '18 at 07:09
  • Thanks, though still feels not theoretically a good hash but using addition is good enough for my application. – 张实唯 Sep 18 '18 at 07:23
  • @张实唯 If you're interested, there are a few cryptographic multiset-collision-resistant multiset hashes in [this paper](https://people.csail.mit.edu/devadas/pubs/mhashes.pdf): `MSet-Add-Hash`, which requires a key, and `MSet-Mu-Hash`, which is slower but doesn't require a key. – Chai T. Rex Sep 23 '18 at 00:27
2

Instead of finding a permutation-invariant hash, I propose you first "un-permute" the list by finding a canonical permutation (e.g. sort the list first), then hashing that with whatever hash you desire.

Note that since this is integers we're talking about, you can use radix sort to do it in linear time.

user541686
  • 205,094
  • 128
  • 528
  • 886
0

Each iteration of the loop includes a multiply and an add operation. These are not commutative with each other. If you e.g. changed the addition to a multiplication then all operations would be mutually commutative, and the ordering of the list would not matter.

Though any list that contains a value that hashes to zero would have a hashcode of zero, so you might special case that value.

error
  • 694
  • 4
  • 14