I need to serialize arbitrary Java objects because I would like to use their hash as a key for a hash table. After reading various warnings that the default hashCode creates collisions way too often, I wanted to switch to hashing via MessageDigest in order to use alternative algorithms (e.g. SHA1, ...) that are said to allow more entries without collisions. [As a side note: I am aware that collisions can occur early on even here, yet I want to increase the likelihood of remaining collision-free.]
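For reference, MessageDigest operates on a byte[], which is why the object has to be converted first. A minimal sketch of the hashing step, assuming the bytes are already available (DigestSketch and hexDigest are hypothetical names used only for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; the names are placeholders for illustration only.
class DigestSketch {

    // Hashes the given bytes with the named algorithm (e.g. "SHA-1", "SHA-256")
    // and returns the digest as a lowercase hex string.
    static String hexDigest(byte[] input, String algorithm) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            // The algorithm name is not supported by any installed provider.
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm, e);
        }
        byte[] digest = md.digest(input);
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            // %02x prints each byte as two hex digits (negative bytes as unsigned).
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "example".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest(bytes, "SHA-1"));
    }
}
```

The hex string (or the raw digest wrapped in something with proper equals/hashCode) could then serve as the key, provided the byte[] is the same for objects that should be treated as equal.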
To achieve this, I tried a method proposed in this StackOverflow post. It uses the following code to obtain the byte[] necessary for MessageDigest:
public static byte[] convertToHashableByteArray(Object obj) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = null;
    byte[] byteOutput = null;
    try {
        out = new ObjectOutputStream(bos);
        out.writeObject(obj);
        byteOutput = bos.toByteArray();
    } catch (IOException io) {
        io.printStackTrace();
    } finally {
        try {
            if (out != null) { out.close(); }
        } catch (IOException io) {
            io.printStackTrace();
        }
        try {
            bos.close();
        } catch (IOException io) {
            io.printStackTrace();
        }
    }
    return byteOutput;
}
This, however, has the problem that only objects implementing the Serializable interface will be serialized/converted into a byte[]. To circumvent this issue, I applied toString() to the given obj in the catch clause to enforce getting a byte[] in all cases:
public static byte[] convertToHashableByteArray(Object obj) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = null;
    byte[] byteOutput = null;
    try {
        // Preferred path: Java serialization (works only for Serializable objects)
        out = new ObjectOutputStream(bos);
        out.writeObject(obj);
        byteOutput = bos.toByteArray();
    } catch (IOException io) {
        // Fallback (e.g. on NotSerializableException): use the toString() output.
        // Note: getBytes() without an argument uses the default charset.
        String stringed = obj.toString();
        byteOutput = stringed.getBytes();
    } finally {
        try {
            if (out != null) { out.close(); }
        } catch (IOException io) {
            io.printStackTrace();
        }
        try {
            bos.close();
        } catch (IOException io) {
            io.printStackTrace();
        }
    }
    return byteOutput;
}
However, this still feels utterly wrong to me. So my question is whether there is a better alternative for converting arbitrary objects to byte[] in order to compute hashes. Preferably a solution that works without additional libraries, or one that uses well-established ones like Apache Commons.
(Besides that, I am also open to other approaches for obtaining SHA1/SHA512 hashes of arbitrary Java objects.)
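To make the concern about the toString() fallback concrete: for a class that does not override toString(), Object.toString() returns the class name followed by "@" and the hex form of hashCode(), so the resulting bytes depend on the default hashCode rather than on the object's fields. A minimal sketch, assuming a made-up non-serializable class Point (all names here are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; names are placeholders for illustration only.
class ToStringFallbackSketch {

    // Not Serializable, and neither toString() nor hashCode() is overridden.
    static class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Same idea as the fallback in the catch clause, but with an explicit charset.
    static byte[] fallbackBytes(Object obj) {
        return obj.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        // Prints something of the form ClassName@hexHashCode; the hex part varies.
        System.out.println(a);
        System.out.println(b);
        // Typically false: the two instances usually have different identity hash codes,
        // even though their fields are identical.
        System.out.println(Arrays.equals(fallbackBytes(a), fallbackBytes(b)));
    }
}
```

So with this fallback, two objects with identical contents would usually end up with different digests unless their class provides a content-based toString().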