
For a REST API, I want to build the ETag from hash codes: the hashCode() of the input parameter combined with the hashCode() of the JSON response. What is the probability that the JSON response has changed but the hash code stays the same?

Alternatively, is there a better way to do this?

    @GET
    public Response getConfigurationForView(@PathParam("in1") String in1, @Context Request request) throws Exception {
        String jsonResponse = getJsonResponse(in1);
        EntityTag etag = new EntityTag(Integer.toString(in1.hashCode()) + "-" + Integer.toString(jsonResponse.hashCode()));
        ResponseBuilder builder = request.evaluatePreconditions(etag);

        if (builder == null) {
            builder = Response.ok(jsonResponse, MediaType.APPLICATION_JSON);
            builder.tag(etag);
        }

        return builder.build();
    }
David Michael Gang
  • See this question: http://stackoverflow.com/questions/785091/consistency-of-hashcode-on-a-java-string – Eng.Fouad Mar 10 '15 at 09:03
  • As I understand it, the linked post is about whether the hash code of a string is the same across different versions of Java. I am asking what the probability is that two different strings give the same hash code. – David Michael Gang Mar 10 '15 at 09:06
  • Beware: hashCode() can be negative. Sometimes the string must not contain hyphens. – Markus Kull Mar 10 '15 at 09:08

2 Answers


Given that there are only about 4 billion (2^32) possible hash codes for the whole variety of strings, there's a decent chance you'll eventually encounter ETag collisions.
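To put rough numbers on that, here is a minimal sketch that assumes hash codes are spread uniformly over all 2^32 values (String.hashCode() is not guaranteed to behave that way, so treat the results as optimistic). The class and method names are made up for illustration. It estimates two things: the chance that a single changed response keeps the same code, and the birthday-style chance that at least two of n distinct strings share a code. Under that assumption, the second figure reaches about 50% at roughly 77,000 strings.

```java
public class EtagCollisionOdds {
    // Number of distinct values an int hash code can take: 2^32.
    static final double BUCKETS = 4294967296.0;

    // Chance that one changed response gets the same code as the old one,
    // under the (hypothetical) uniform-distribution assumption.
    public static double singleChange() {
        return 1.0 / BUCKETS;
    }

    // Birthday approximation: chance that at least two of n distinct
    // strings share a code, 1 - exp(-n(n-1) / (2 * 2^32)).
    public static double anyCollision(long n) {
        double pairs = (double) n * (n - 1) / 2.0;
        return 1.0 - Math.exp(-pairs / BUCKETS);
    }

    public static void main(String[] args) {
        System.out.printf("single change: %.3e%n", singleChange());
        for (long n : new long[] {1_000, 77_163, 1_000_000}) {
            System.out.printf("n = %,d: %.4f%n", n, anyCollision(n));
        }
    }
}
```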

Looking at how String.hashCode() is implemented:

        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;

-- you can even come up with collisions yourself. For instance, "" (the empty string) and "\0" (the string containing only the \0 character) both have a hashCode of 0.
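Collisions also show up in realistic input. A small sketch (the class name and the JSON bodies are invented for illustration): "Aa" and "BB" have the same hash code, and because the hash is a polynomial in the characters, swapping one for the other inside otherwise identical strings of the same length leaves the hash code unchanged as well.

```java
public class HashCollisionDemo {
    // True when two different strings share the same String.hashCode().
    public static boolean collides(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' == 'B' * 31 + 'B'
        System.out.println(collides("Aa", "BB"));

        // Two different JSON bodies that differ only in that substring.
        String before = "{\"v\":\"Aa\"}";
        String after = "{\"v\":\"BB\"}";
        System.out.println(collides(before, after));
    }
}
```

With the ETag scheme from the question, a client holding the first body would receive 304 Not Modified after the value changed to the second one.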

I would suggest you use a SHA1 hash (or MD5, but see these notes on security and CPU running time first).

Provided you proceed with SHA1 hash, your code might look like this:

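// Note: javax.xml.bind is no longer part of the JDK since Java 11;
// on newer versions it needs the JAXB API as a separate dependency.
public static String calculateEtag(final String s) throws java.security.NoSuchAlgorithmException {
    // Hash the UTF-8 bytes of the response body.
    final java.nio.ByteBuffer buf = java.nio.charset.StandardCharsets.UTF_8.encode(s);
    final java.security.MessageDigest digest = java.security.MessageDigest.getInstance("SHA-1");
    digest.update(buf);
    // Quoted hex digest, marked as a weak validator by the W/ prefix.
    return String.format("W/\"%s\"", javax.xml.bind.DatatypeConverter.printHexBinary(digest.digest()));
}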

This produces the same digest as the sha1sum utility, except that printHexBinary() emits uppercase hex digits where sha1sum prints lowercase. You could use BigInteger to convert the digest bytes to a hex string as well:

new BigInteger(1, digest.digest()).toString(16)

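-- but javax.xml.bind.DatatypeConverter.printHexBinary() is several times faster, and BigInteger.toString(16) drops leading zero digits, so its result is not always 40 characters long.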
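If javax.xml.bind is not available, one JDK-only alternative is to keep BigInteger but pad the result to two digits per byte. This is a sketch rather than the answer's method; the class and method names are hypothetical, and toHex assumes a non-empty array.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Hex {
    // Lowercase hex, left-padded with zeros to two digits per byte.
    public static String toHex(byte[] bytes) {
        return String.format("%0" + (bytes.length * 2) + "x", new BigInteger(1, bytes));
    }

    // SHA-1 of the UTF-8 bytes of s, as 40 lowercase hex digits.
    public static String sha1Hex(String s) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            return toHex(digest.digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 0x0f};
        // Unpadded conversion loses the leading zero digits.
        System.out.println(new BigInteger(1, sample).toString(16));
        // Padded conversion keeps two digits per byte.
        System.out.println(toHex(sample));
        System.out.println(sha1Hex("abc"));
    }
}
```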

Bass
  • We may use DigestUtils: https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/DigestUtils.html in order to create md5 or sha1 – David Michael Gang Mar 10 '15 at 09:47

If you can use JDK 8+ and Google Guava, you might try:

final String myETag = ( (Set<Object>) this.setOfChangeableProperties ).stream()
  .filter( Objects::nonNull )
  .map( Objects::toString )
  .reduce( (a,b) -> a.concat(b) )
  .map( s -> Hashing.md5().hashUnencodedChars( s ).toString() )
  .orElse( "nothing of interest to hash!" );
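One thing to watch with plain concatenation is that the property lists ("ab", "c") and ("a", "bc") produce the same input string and therefore the same tag. The following JDK-only sketch is a variation, not part of the answer above: it feeds each value into the digest with a length prefix so that the boundaries between values are kept. The class and method names are hypothetical, and it assumes the properties arrive in a stable order, which a HashSet does not guarantee.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

public class PropertyEtag {
    // MD5 over the non-null properties, each prefixed with its byte length.
    public static String etagOf(List<?> properties) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            for (Object p : properties) {
                if (p == null) {
                    continue; // skip nulls, as the stream version does
                }
                byte[] bytes = p.toString().getBytes(StandardCharsets.UTF_8);
                // The 4-byte length prefix marks where each value ends.
                digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                digest.update(bytes);
            }
            // 16 digest bytes rendered as 32 zero-padded hex digits.
            return String.format("%032x", new BigInteger(1, digest.digest()));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(etagOf(Arrays.asList("ab", "c")));
        System.out.println(etagOf(Arrays.asList("a", "bc")));
    }
}
```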
Lonnie