Assuming you need speed without the overhead of the *Set
classes, then you could write H
as follows:
/**
* Hashes a set of integers.
*
* @param list to hash
* @return hash code
*/
public static int H(int list[]) {
// XOR all the integers together.
int hashcode = 0;
for (int val : list) {
hashcode ^= val;
}
return hashcode;
}
It is the same regardless of the order, and it is relatively efficient.
For example:
public static void main(String[] args) {
System.out.println(Integer.toHexString(H(new int[]{0xabcd,0x1234,0x1111})));
System.out.println(Integer.toHexString(H(new int[]{0x1234,0x1111,0xabcd})));
}
Displays:
a8e8
a8e8
This could be generalized to more than just int
s by doing the following:
/**
* Hashes a set of objects.
*
* @param list to hash
* @return hash code
*/
public static int H(Object list[]) {
// XOR all the hashes together.
int hashcode = 0;
for (Object val : list) {
hashcode ^= val.hashCode();
}
return hashcode;
}
The main
program would then have to use arrays of Integer
instead of the primitive int
.
Adding the numbers should be almost as quick, and might give you a better distribution over the 32 bit range. If the elements of the set are already uniformly distributed over the range, then xor might be better.
However, with both methods, you can manufacture collisions easily with integers. For example, with the adding method;
{1000, 1001, 1002}
{0, 1, 3002}
Both of these arrays have the same H()
.
With the XOR method;
{0x1010, 0x0101}
{0x1111, 0x0000}
Both of these have the same H()
.
Similarly, the 0
element is problematic since lists will have the same hash with or without it. You can mitigate this by adding a constant value at each iteration. For example:
...
hashcode += val.hashCode() + CONSTANT;
...
Or by including the number of elements as the initial hash code:
...
// XOR all the hashes together.
int hashcode = list.length;
...