1

When I implement a collection that uses hashes for optimizing access, should I cache the hash values or assume an efficient implementation of hashCode()?

On the other hand, when I implement a class that overrides hashCode(), should I assume that the collection (i.e. HashSet) caches the hash?

This question is only about performance vs. memory overhead. I know that the hash value of an object should not change.

Clarification: A mutable object would of course have to clear the cached value when it is changed, whereas the collection relies on objects not changing. But this is not relevant for my question.

Cephalopod
  • 14,632
  • 7
  • 51
  • 70
  • 2
    *I know that the hash value of an object should not change.* -- It *should* change if the attributes of the object changes. – aioobe Aug 29 '11 at 11:24
  • For which operation of a HashSet or HashMap would a cache of the hashes be useful? – JB Nizet Aug 29 '11 at 11:24
  • 1
    @JB Nizet I assume he means that `hashCode` would return the cached value. However, if the value might change, you'd need to check for changes every time and this might be even more inefficient than just recalculating the hash. :) – Thomas Aug 29 '11 at 11:27
  • @Thomas: no, he thinks about caching the hashes in the collection implementation. Re-read the question: "When I implement a collection", and "should I assume that the collection (i.e. HashSet) caches the hash" – JB Nizet Aug 29 '11 at 11:29
  • @aioobe: ok, I should be more precise: The hashcode and object should not change once they are items in collections that use the hash for building internal structures. – Cephalopod Aug 29 '11 at 11:29
  • @Arian the hashing collections will calculate the hash once and then put the object into the corresponding buckets, thus you could think of that being "cached". Changing the object afterwards would not cause the collection to rehash. @JB Nizet you're right, I just read the `efficient implementation of hashCode()?` etc. parts :) – Thomas Aug 29 '11 at 11:35
  • 1
    @Arian ok, but an Object should not know whether it is stored in a collection or map, that's not it's responsibility. – Sean Patrick Floyd Aug 29 '11 at 11:37
  • @Sean: that's true. However at some point someone should know that the object must not be modified or else the collection might break. – Cephalopod Aug 29 '11 at 11:54

5 Answers5

8

When designing Guava's ImmutableSet and ImmutableMap classes, we opted not to cache hash codes. This way, you'll get better performance from hash code caching when and only when you care enough to do the caching yourself. If we cached them ourselves, we'd be costing you extra time and memory even in the case that you care deeply about speed and space!

It's true that HashMap does this caching, but it was HashMap's author (Josh Bloch) who strongly suggested we not follow that precedent!

Edit: oh, also, if your hashCode() is slow, the caching by the collection only addresses half of the problem anyway, as hashCode() still must be invoked on the object passed in to get() no matter what.

Kevin Bourrillion
  • 40,336
  • 12
  • 74
  • 87
3

Considering that java.lang.String caches its hash, i guess that hashcode() is supposed to be fast.

So as first approach, I would not cache hashes in my collection. In my objects that I use, I would not cache hash code unless it is oviously slow, and only do it if profiling tell me so.

If my objects will be used by others, i would probubly consider cachnig hash codes sooner (but needs measurements anyway).

Alpedar
  • 1,314
  • 1
  • 8
  • 12
  • Note that string is immutable and thus the value is guaranteed to not change. In that case caching might be feasible but as you said for custom objects it's most often not worth the hassle (+1). – Thomas Aug 29 '11 at 11:31
1

On the other hand, when I implement a class that overrides hashcode(), should I assume that the collection (i.e. HashSet) caches the hash?

No, you should not make any assumptions beyond the scope of the class you are writing.

Of course you should try to make your hashCode cheap. If it isn't, and your class is immutable, create the hashCode on initialization or lazily upon the first request (see java.lang.String). If your class is not immutable, I don't see any other option than to re-calculate the hashCode every time.

Sean Patrick Floyd
  • 292,901
  • 67
  • 465
  • 588
  • hash code should not change, so even mutable classes does not need to compute hash everytime. – Alpedar Aug 29 '11 at 11:30
  • 1
    @user470365 that's nonsense. hashCode() must be consistent with equals(), so if data changes, hashCode must as well. What's true is that mutable types make bad map keys, because of their changing hash code. – Sean Patrick Floyd Aug 29 '11 at 11:36
  • and of course equality should not change too. Classes should not base eq & hash on mutable parts, otherwise its worst than not overriding them. What i find acceptable (but suboptimal) is have mutable state when eq and hashcode fails (throws exception) that turn into stable state where eq & hc works. – Alpedar Aug 29 '11 at 11:45
  • @user470365 "Classes should not base eq & hash on mutable parts" that's true for some business Objects, but false for Collection-like structures and simple value-holder types – Sean Patrick Floyd Aug 29 '11 at 12:01
  • imo they should not implement hashCode & equals. (or should never change when in collection -> should not get out of code from someone who "knows what he is doing") – Alpedar Aug 29 '11 at 12:20
0

I'd say in most cases you can rely on efficient implementations of hashCode(). AFAIK, that method is only invoked on lookup methods (like contains, get etc.) or methods that change the collection (add/put, remove etc.).

Thus, in most cases there shouldn't be any need to cache hashes yourself.

Thomas
  • 87,414
  • 12
  • 119
  • 157
0

Why do you want to cache it? You need to ask objects what their hashcode is while you're working with it to allocate it to a hash bucket (and any objects that are in the same bucket that may have the same hashcode), but then you can forget it.

You could store objects in a wrapper HashNode or something, but I would try implementing it first without caching (just like HashSet et al does) and see if you need the added performance and complexity before going there.

Bohemian
  • 412,405
  • 93
  • 575
  • 722