0

I am building a simple versioned key-value store, where I need the ability to access records by specifying a (key,version) pair. More recent versions store a pointer to their previous version (ak, they store the index into the hashmap).

To keep the size of the records small, I hash the key,version pair to a Long, which I then store, and use as an index.

The current implementation I was using was to append the key and the version (keys are restricted to alphabetical letters), and using the native hashcode() function. Before each put, I test for collisions (ak, does an entry already exist, and if yes, does it have the same key,id pair), and I observe those frequently. Is this realistic? My space is of the order of a million entries. My initial assumption is that this lead to collisions being very rare. But I was wrong.

Do you have an alternative solution (one which can keep the hash to 64 bit, 2^64 is a lot greater than 1m). I would like as much as possible to avoid the size overhead of SHA/MD5

Keys are randomly generated strings of length 16 chars Versions are longs and will span the range 0 to 100000

user1018513
  • 1,682
  • 1
  • 20
  • 42
  • Maybe you want to refer to the `hashCode` implementation presented by Joshua Bloch in "Effective Java", as depicted here: http://stackoverflow.com/questions/113511/hash-code-implementation – Smutje Mar 05 '14 at 08:52
  • The collisions may be related to the domain where the keys and versions are taken from. Can you add more details about the possible values? (also, is the pointer to prev version really relevant in this question?) – Eyal Schneider Mar 05 '14 at 08:55
  • Aren't you duplicating the work of a `HashMap` here (save for the fact that for some reason you want 64bit hashes)? – fge Mar 05 '14 at 09:19
  • Eyal - edited to reflect this I need internal pointers to versions, and cannot store each key in a different hashmap.I need to store the pointer value in each record that I store, hence my worry about size. – user1018513 Mar 05 '14 at 10:19

0 Answers0