0

Sometimes it's convenient to create a hash table where the hash function uses the hashed object's address. The hash function is easy to implement and cheap to execute.

But there's a problem if you are working on a system which uses some form of copying garbage collection because after an object has been moved the previously calculated hash value is incorrect. Even rehashing the table during or after garbage collection seems problematic since (in principle) a hash function could consume memory and trigger another garbage collection.

My question is how to work around these issues. I'm not seeing any clean solution without some kind of compromise.

john
  • 85,011
  • 4
  • 57
  • 81
  • 1
    Are you talking about a specific programming language and/or environment? – Holger Dec 18 '20 at 13:47
  • @Holger I'm interested in general terms since I'm implementing a language which will have both hash tables and copying garbage collection. – john Dec 18 '20 at 14:21
  • Since asking the question I've discovered [this paper](http://www.schemeworkshop.org/2007/procPaper3.pdf) which seems promising. – john Dec 18 '20 at 14:23
  • 1
    Environments like Java JVMs don’t use the address as hash code at all; they store the identity hash code within the object’s header. But I don’t agree on that paper’s claim that this approach “makes the construction and collection of every object in the system more expensive.” First, the collection of objects is not affected by this at all. Second, in case of the Hotspot JVM, only a few bits within the already existing object header are used and only when an identity hash code was ever requested, so the construction of objects also is not affected by it. – Holger Dec 18 '20 at 14:32
  • @Holger Could you provide a link which describes how the Hotspot JVM implements this. I'm not really understanding how a hash code can be stored in only a few bits. – john Dec 18 '20 at 14:36
  • 1
    Such a hash code has not the same variance as an entire `int`, but mind that neither would addresses; the lower bits are always zero due to alignment and the upper bits are all the same for objects allocated in the same memory region. Increasing a number for every object for which an identity hash code has been actually requested can lead to fair results. There’s [How does the JVM ensure that System.identityHashCode() will never change?](https://stackoverflow.com/q/1063068/2711488), further https://stackoverflow.com/a/26049632/2711488 and https://stackoverflow.com/a/13778286/2711488 – Holger Dec 18 '20 at 14:51

1 Answers1

0

How to work around this issue depends on the tools which are provided by the specific ecosystem of the specific programming language one is using.

For example, in C#, one can use RuntimeHelpers.GetHashcode() for getting a hash which is guaranteed not to change over the full lifetime of an object, even when a garbage collection takes place and the physical adress of the object changes in memory.

Doc Brown
  • 19,739
  • 7
  • 52
  • 88
  • 1
    I should have made this clear. I'm not working within a particular system, I'm implementing such a system. Of course I'm interested to know how C# implements `GetHashCode`. From a quick look at the above link it seems this is by storing the generated hash code along the object itself. That's not suitable for what I'm working on. – john Dec 18 '20 at 14:27
  • @john: well, ask a better question, get a better answer ;-) – Doc Brown Dec 18 '20 at 16:19