22

I'm looking for a solution to generate a checksum for any type of Java object, which remains the same for every execution of an application that produces the same object.

I tried Object.hashCode(), but the API documentation says:

"...This integer need not remain consistent from one execution of an application to another execution of the same application."

Michal Kordas
Alex

11 Answers

17

I had a similar problem (generating a good hash code for XML files) and found that the best solution is to use MD5 through MessageDigest or, if you need something faster, Fast MD5. Note that even if Object.hashCode() returned the same value every time, it would still be too short (only 32 bits) to make collisions unlikely. I think 64 bits is the minimum for a good hash code. MD5 generates a 128-bit hash, which is even more than needed in this situation.

Of course, to use MessageDigest you first need to serialize (in your case, marshal) the object.

kopper
17
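import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import javax.xml.bind.DatatypeConverter; // removed from the JDK in Java 11; needs the JAXB API dependency there

public static String getChecksum(Serializable object) throws IOException, NoSuchAlgorithmException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // try-with-resources closes the stream even if writeObject throws, and avoids
    // the NullPointerException a finally block would hit if the constructor failed
    try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
        oos.writeObject(object);
    }
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] digest = md.digest(baos.toByteArray());
    return DatatypeConverter.printHexBinary(digest);
}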
8

Example

private BigInteger checksum(Object obj) throws IOException, NoSuchAlgorithmException {

    if (obj == null) {
      return BigInteger.ZERO;   
    }

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(baos);
    oos.writeObject(obj);
    oos.close();

    MessageDigest m = MessageDigest.getInstance("SHA-1"); // standard algorithm name
    m.update(baos.toByteArray());

    return new BigInteger(1, m.digest());
}
PeterB
5

I think you should look at serialization. The serialization mechanism has to solve a similar problem, so you can look at how it's implemented.

But if you describe the problem you're trying to solve, you'll probably get a more precise solution.

Roman
4

If you control the source, you can implement hashCode() so it will be consistent from one execution to another.
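
A minimal sketch of that idea, assuming a hypothetical value class `Point` (not from the question). The hash is computed only from field values whose own hash codes are specified by the API (`int` boxed to `Integer`, and `String`), so it does not depend on memory addresses. A field whose class falls back to the identity-based Object.hashCode() would make the result vary between runs again.

```java
import java.util.Objects;

// Hypothetical value class, used only for illustration.
final class Point {
    private final int x;
    private final int y;
    private final String label;

    Point(int x, int y, String label) {
        this.x = x;
        this.y = y;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y && Objects.equals(label, other.label);
    }

    @Override
    public int hashCode() {
        // Objects.hash combines the field hashes with the same 31-based
        // formula as List.hashCode(), so the result depends on values only.
        return Objects.hash(x, y, label);
    }
}
```

Two `Point` instances built from the same values then report the same hash code in every run, though the result is still only 32 bits wide.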

Seffi
3

Do you want to be able to do this for all Java objects?

In that case hashCode() doesn't work.

For some classes hashCode() has a stricter definition which guarantees equality across executions. For example String has a well-defined hashCode implementation. Similarly List and Set have well-defined values, provided all objects that they contain also have well-defined values (note that the general Collection.hashCode() does not require the value to be well-defined).
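
To illustrate what "well-defined" means here, the sketch below re-implements the formulas given in the Javadoc of String.hashCode() and List.hashCode() and compares them with the built-in results. The class and method names (`StableHashDemo`, `stringHash`, `listHash`) are made up for this example.

```java
import java.util.List;

final class StableHashDemo {

    // Documented formula: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Documented formula: start at 1, then h = 31*h + (e == null ? 0 : e.hashCode())
    static int listHash(List<?> list) {
        int h = 1;
        for (Object e : list) {
            h = 31 * h + (e == null ? 0 : e.hashCode());
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(stringHash("hello") == "hello".hashCode()); // true
        List<String> list = List.of("a", "b");
        System.out.println(listHash(list) == list.hashCode());         // true
    }
}
```

Because both formulas depend only on the contents, the values are the same in every JVM run, as long as the list elements themselves have well-defined hash codes.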

For other classes you will have to use reflection recursively with some well-defined formula to build a checksum.
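
One possible shape for such a recursive, reflection-based checksum is sketched below. Everything in it is an assumption made for illustration: the names `ReflectiveChecksum` and `Sample`, the choice of SHA-256, and the encoding (type name, then field name, then value, each followed by a zero byte). It only handles strings, numbers, booleans, characters, enums, arrays, and user-defined classes built from those. It does not detect cyclic references, and on recent JDKs `setAccessible` is likely to fail for fields of JDK classes such as collections.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Comparator;

final class ReflectiveChecksum {

    // Hypothetical class used only to exercise the sketch.
    static final class Sample {
        private final String name;
        private final int age;
        private final int[] scores;

        Sample(String name, int age, int[] scores) {
            this.name = name;
            this.age = age;
            this.scores = scores;
        }
    }

    /** Returns a hex SHA-256 digest computed from the object's field values. */
    static String checksum(Object root) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            feed(md, root);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException | IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void feed(MessageDigest md, Object value) throws IllegalAccessException {
        if (value == null) {
            put(md, "null");
            return;
        }
        Class<?> type = value.getClass();
        put(md, type.getName()); // tag each value with its runtime type
        if (value instanceof String || value instanceof Number
                || value instanceof Boolean || value instanceof Character) {
            put(md, value.toString()); // leaf values: use their textual form
        } else if (value instanceof Enum) {
            put(md, ((Enum<?>) value).name());
        } else if (type.isArray()) {
            int length = Array.getLength(value);
            put(md, Integer.toString(length));
            for (int i = 0; i < length; i++) {
                feed(md, Array.get(value, i)); // primitives come back boxed
            }
        } else {
            for (Class<?> c = type; c != Object.class; c = c.getSuperclass()) {
                Field[] fields = c.getDeclaredFields();
                // getDeclaredFields() has no guaranteed order, so sort by name
                Arrays.sort(fields, Comparator.comparing(Field::getName));
                for (Field f : fields) {
                    int mods = f.getModifiers();
                    if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                        continue;
                    }
                    f.setAccessible(true);
                    put(md, f.getName());
                    feed(md, f.get(value));
                }
            }
        }
    }

    // Writes the text followed by a zero byte as a separator.
    private static void put(MessageDigest md, String text) {
        md.update(text.getBytes(StandardCharsets.UTF_8));
        md.update((byte) 0);
    }
}
```

With this sketch, two `Sample` objects built from the same values produce the same 64-character hex string, and changing any field changes it. Renaming a class or field also changes the checksum, which may or may not be what you want.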

Joachim Sauer
2

hashCode() is OK. Either the class overrides equals() and, as the contract demands, hashCode() as well; by contract, if equals() returns true, the hash codes must be the same.
Or the class doesn't override equals(). In that case, different executions of your application cannot produce the same object, so there is no problem.
The only problem is that some classes (even in the Java API) break the contract for equals().

Tadeusz Kopec for Ukraine
2

The Apache Commons Lang library provides a HashCodeBuilder class, which helps build a hash code from the class properties that fulfills your requirements.

Example:

   public int checksum() {
     // pick two hard-coded, randomly chosen, non-zero, odd numbers,
     // ideally different for each class
     return new HashCodeBuilder(17, 37)
       .append(property1)
       .append(property2)
       .append(property3)
       .toHashCode();
   }

See Commons Lang API

Jens Møller
FRotthowe
2

If you're using the Eclipse IDE, it has actions (under the Source menu) to generate hashCode() and equals() methods. It lets you choose which attributes of the class go into the hash code. This is similar to the HashCodeBuilder approach that has already been suggested.

Alternatively you could stream the object to a byte array and generate an MD5 of that.

pillingworth
0
  1. Object -> String (for example with GSON, so you won't have to write the serialization yourself or list all the fields of your class)

  2. String.hashCode() -> int (instead of Object.hashCode()! This implementation of hashCode() depends on the content of the String, not on its address in memory, so you can use it across different app launches, different threads, etc.)

(or 2. String -> MD5)
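
The two steps above can be sketched with the JDK alone. In this example the JSON text is a hand-written stand-in for what a serializer such as GSON might produce, and the names `StringChecksum` and `md5Hex` are made up for illustration. The approach only gives stable results if the serializer emits fields in a stable order.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StringChecksum {

    /** MD5 of the UTF-8 bytes of the text, as 32 lowercase hex digits. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            // %032x keeps leading zeros that BigInteger would otherwise drop
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for step 1 (Object -> String), e.g. the output of a JSON serializer.
        String json = "{\"name\":\"Ann\",\"age\":30}";
        System.out.println(json.hashCode()); // step 2: content-based 32-bit hash
        System.out.println(md5Hex(json));    // alternative step 2: 128-bit MD5
    }
}
```

Fixing the charset to UTF-8 matters: `getBytes()` without an argument uses the platform default, which could differ between machines.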

Evgeny Nozdrev
0
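import org.apache.commons.codec.digest.DigestUtils;

/*
 * Calculate an MD5 checksum of an object's string representation.
 * Only stable across runs if the class overrides toString() deterministically;
 * the default Object.toString() embeds the identity hash code.
 */
public static String checkSumApacheCommons(Object obj) {
    return DigestUtils.md5Hex(String.valueOf(obj));
}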
Narasimha