You need a dedicated Key Field to guarantee uniqueness
.hashCode() isn't designed for what you are using it for
.hashCode()
is designed to give good general results in bucketing algorithms, which can tolerate minor collisions. It is not designed to provide a unique key. The default algorithm is a trade off of time and space and minor collisions, it isn't supposed to guarantee uniqueness.
Perfect Hash
What you need to implement is a perfect hash or some other unique key based on the contents of the object. This is possible within the boundries of an int
but I wouldn't use .hashCode()
for this representation. I would use an explicit key field on the object.
Unique Hashing
One way to use use SHA1
hashing that is built into the standard library which has an extremely low chance of collisions for small data sets. You don't have a huge combinational explosion in the values you posts to SHA1
will work.
You should be able to calculate a way to generate a minimal perfect hash
with the limited values that you are showing in your question.
A minimal perfect hash function is a perfect hash function that maps n
keys to n consecutive integers—usually [0..n−1] or [1..n]. A more
formal way of expressing this is: Let j and k be elements of some
finite set K. F is a minimal perfect hash function iff F(j) =F(k)
implies j=k (injectivity) and there exists an integer a such that the
range of F is a..a+|K|−1. It has been proved that a general purpose
minimal perfect hash scheme requires at least 1.44 bits/key.2 The
best currently known minimal perfect hashing schemes use around 2.6
bits/key.[3]
A minimal perfect hash function F is order preserving if keys are
given in some order a1, a2, ..., an and for any keys aj and ak, j
A minimal perfect hash function F is monotone if it preserves the
lexicographical order of the keys. In this case, the function value is
just the position of each key in the sorted ordering of all of the
keys. If the keys to be hashed are themselves stored in a sorted
array, it is possible to store a small number of additional bits per
key in a data structure that can be used to compute hash values
quickly.[6]
Solution
Note where it talks about a URL
it can be any byte[]
representation of any String
that you calculate from your object.
I usually override the toString()
method to make it generate something unique, and then feed that into the UUID.nameUUIDFromBytes()
method.
Type 3 UUID can be just as useful as well UUID.nameUUIDFromBytes()
Version 3 UUIDs use a scheme deriving a UUID via MD5 from a URL, a
fully qualified domain name, an object identifier, a distinguished
name (DN as used in Lightweight Directory Access Protocol), or on
names in unspecified namespaces. Version 3 UUIDs have the form
xxxxxxxx-xxxx-3xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit
and y is one of 8, 9, A, or B.
To determine the version 3 UUID of a given name, the UUID of the
namespace (e.g., 6ba7b810-9dad-11d1-80b4-00c04fd430c8 for a domain) is
transformed to a string of bytes corresponding to its hexadecimal
digits, concatenated with the input name, hashed with MD5 yielding 128
bits. Six bits are replaced by fixed values, four of these bits
indicate the version, 0011 for version 3. Finally, the fixed hash is
transformed back into the hexadecimal form with hyphens separating the
parts relevant in other UUID versions.
My preferred solution is Type 5 UUID ( SHA version of Type 3)
Version 5 UUIDs use a scheme with SHA-1 hashing; otherwise it is the
same idea as in version 3. RFC 4122 states that version 5 is preferred
over version 3 name based UUIDs, as MD5's security has been
compromised. Note that the 160 bit SHA-1 hash is truncated to 128 bits
to make the length work out. An erratum addresses the example in
appendix B of RFC 4122.
Key objects should be immutable
That way you can calculate toString()
, .hashCode()
and generate a unique primary key inside the Constructor
and set them once and not calculate them over and over.
Here is a straw man example of an idiomatic immutable object and calculating a unique key based on the contents of the object.
package com.stackoverflow;
import javax.annotation.Nonnull;
import java.util.Date;
import java.util.UUID;
public class Q23633894
{
public static class Person
{
private final String firstName;
private final String lastName;
private final Date birthday;
private final UUID key;
private final String strRep;
public Person(@Nonnull final String firstName, @Nonnull final String lastName, @Nonnull final Date birthday)
{
this.firstName = firstName;
this.lastName = lastName;
this.birthday = birthday;
this.strRep = String.format("%s%s%d", firstName, lastName, birthday.getTime());
this.key = UUID.nameUUIDFromBytes(this.strRep.getBytes());
}
@Nonnull
public UUID getKey()
{
return this.key;
}
// Other getter/setters omitted for brevity
@Override
@Nonnull
public String toString()
{
return this.strRep;
}
@Override
public boolean equals(final Object o)
{
if (this == o) { return true; }
if (o == null || getClass() != o.getClass()) { return false; }
final Person person = (Person) o;
return key.equals(person.key);
}
@Override
public int hashCode()
{
return key.hashCode();
}
}
}