0

I have this class...

public class StartStopTouple {

    public int iStart;
    public int iStop;
    public int iHashCode;

    public StartStopTouple(String start, String stop) {
        this.iStart = Integer.parseInt(start);
        this.iStop = Integer.parseInt(stop);
    }

    @Override
    public boolean equals(Object theObject) {

        // check if 'theObject' is null
        if (theObject == null) {
            return false;
        }
        // check if 'theObject' is a reference to 'this' StartStopTouple... essentially they are the same Object
        if (this == theObject) {
            return true;
        }

        // check if 'theObject' is of the correct type as 'this' StartStopTouple
        if (!(theObject instanceof StartStopTouple)) {
            return false;
        }

        // cast 'theObject' to the correct type: StartStopTouple
        StartStopTouple theSST = (StartStopTouple) theObject;

        // check if the (start,stop) pairs match, then the 'theObject' is equal to 'this' Object
        if (this.iStart == theSST.iStart && this.iStop == theSST.iStop) {
            return true;
        } else {
            return false;
        }
    } // equal() end

    @Override
    public int hashCode() {
        return iHashCode;
    }
}

... and I define equality between such Objects only if iStart and iStop in one Object are equal to iStart and iStop in the other Object.

So since I've overridden equals(), I need to override hashCode() but I'm not sure how to define a good hash function for this class. What would be a good way to create a hash code for this class using iStart and iStop?

Hristo
  • 45,559
  • 65
  • 163
  • 230
  • a simple 'barrel shift hash' is sufficient. should aid in SO searching. HashMap and similar will actually re-hash hash value, so even-distribution isn't terribly important. –  Jun 06 '11 at 00:28
  • @pst... would you mind providing some example code? – Hristo Jun 06 '11 at 00:29
  • It depends upon what distribution of values `iStart` and `iStop` will take -- if they range between `0-9`, then `iStart*10 + iStop` would probably work great. So, what ranges of input are you expecting? – sarnold Jun 06 '11 at 00:29
  • @Hristo It's called ... "search the phrase" second link as http://en.wikipedia.org/wiki/Rolling_hash –  Jun 06 '11 at 00:30
  • @sarnold... I'm expecting pretty much the whole range of an unsigned integer. – Hristo Jun 06 '11 at 00:31

3 Answers3

2

From Bloch's "Effective Java":

int iHashCode = 17;
iHashCode = 31 * iHashCode + iStart;
iHashCode = 31 * iHashCode + iStop;

Note: 31 is chosen because the multiplication by 31 can be optimized by the VM as bit operations. (But performance is not useful in your case since as mentioned by @Ted Hopp you are only computing the value once.)

Note: it does not matter if iHashCode rolls over past the largest int.

toto2
  • 5,306
  • 21
  • 24
  • @toto... I'm actually reading his book, but I guess I didn't get to that part yet :) – Hristo Jun 06 '11 at 00:32
  • The code is slightly wrong (and very broken) as presented. `iHashCode` should be `hash` (or vice-versa). This uses implicit integer overflow to achieve the "rolling" effect. –  Jun 06 '11 at 00:32
2

I'd be tempted to use this, particularly since you're going to memoize it:

Long.valueOf((((long) iStart) << 32) | istop)).hashcode();
Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
  • @Ted... what do you mean I'm going to memoize it? – Hristo Jun 06 '11 at 00:54
  • It looked from your code like you were going to compute it once and keep the value in the `iHashCode` field. If you're going to do that, however, I suggest you make `iStart` and `iStop` private and provide set/get methods. Then you can update `iHashCode` if either `iStart` or `iStop` changes. – Ted Hopp Jun 06 '11 at 00:58
  • right... that's what I do so that when `hashCode()` is called, I can just return it. I suppose I can compute it every time `hashCode()` is called but I'm not sure if that is better or worse. – Hristo Jun 06 '11 at 01:00
  • 1
    By the way I checked in the Java code how the hash is done for a `Long`: `return (int)(value ^ (value >>> 32));`, so that's the same as @ratchet freak's answer! ... which @Hristo complained about. – toto2 Jun 06 '11 at 01:01
  • @toto... good point. I think both answers are good, but which is better? :) – Hristo Jun 06 '11 at 01:03
  • 1
    @Hristo the one from @ratchet has less indirection so it's faster. – toto2 Jun 06 '11 at 01:05
  • 1
    @toto... what do you mean by indirection? – Hristo Jun 06 '11 at 01:07
  • @Hristo It's the number of methods you call on the stack. With @Ted Hopp's method you call `Long.valueOf`, then `hashCode`, instead of directly doing the operation right away in your own `hashCode` method. Calling a method can be expensive (I'm not sure how expensive and often the JVM will smarten up and call directly the method needed instead of going through all the chained method calls). But better to minimize the number of method calls to be on the safe side. – toto2 Jun 06 '11 at 01:26
  • @hristo he means less function calls: this solution calls a function that will allocate an object, returns it and then hashcode is called on that object. My solution does the bit-twiddling directly – ratchet freak Jun 06 '11 at 01:29
2

the simplest might be best

iHashCode = iStart^iStop;

the XOR of the two values

note this will give equal hashcodes when start and stop are swapped

as another possibility you can do

iHashCode = ((iStart<<16)|(iStart>>>16))^iStop;

this first barrel shifts start by 16 and then xors stop with it so the least significant bits are put apart in the xor (if start is never larger than 65k (of more accurately 2^16) you can omit the (iStart>>>16) part)

ratchet freak
  • 47,288
  • 5
  • 68
  • 106
  • I don't think this will work... if you xor 6 with 9 and 0 with 15, the result is the same. – Hristo Jun 06 '11 at 00:44
  • chances are start must always be < stop so i'd go with this algorithm. Simple and effective. I doubt you will be able to tell the difference between this and something more complicated. @Hristo of course it will work... question is just is it efficient. Given that the values range the entire integer range, this isn't much of a concern. Plus, unless you have a collection with millions of items such that the hashcode doesn't wrap quickly, it probably doesn't matter. – MeBigFatGuy Jun 06 '11 at 00:46
  • @MeBigFatGuy... I just gave a counter-example. 6^9 = 0^15. – Hristo Jun 06 '11 at 00:53
  • a@Hristo add a shift if you are never going over say 65k check the edit – ratchet freak Jun 06 '11 at 00:59
  • @ratchet freak see my comment on @Tep Hopp's answer. – toto2 Jun 06 '11 at 01:03