I am building a simple versioned key-value store, where I need the ability to access records by specifying a (key,version) pair. More recent versions store a pointer to their previous version (ak, they store the index into the hashmap).
To keep the size of the records small, I hash the key,version pair to a Long, which I then store, and use as an index.
The current implementation I was using was to append the key and the version (keys are restricted to alphabetical letters), and using the native hashcode() function. Before each put, I test for collisions (ak, does an entry already exist, and if yes, does it have the same key,id pair), and I observe those frequently. Is this realistic? My space is of the order of a million entries. My initial assumption is that this lead to collisions being very rare. But I was wrong.
Do you have an alternative solution (one which can keep the hash to 64 bit, 2^64 is a lot greater than 1m). I would like as much as possible to avoid the size overhead of SHA/MD5
Keys are randomly generated strings of length 16 chars Versions are longs and will span the range 0 to 100000