
I have a fixed list of strings, all of the same length. I want to store them in a hash table that consumes as little memory as possible without degrading the expected lookup time.

Theoretically, it is possible to find a one-to-one mapping between the strings and integers, so that the lookup time would be constant.

The question is how to find the "best" hash function that would get me close to this goal. One possible way is simply to try many hash functions and pick the one with the fewest collisions. Is there an "analytic" approach?
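The trial-and-error approach mentioned above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the polynomial rolling hash family and the candidate bases are assumptions chosen for the example, not something prescribed by the question.

```python
def count_collisions(keys, hash_fn, table_size):
    """Count keys that land in an already-occupied bucket."""
    seen = set()
    collisions = 0
    for k in keys:
        b = hash_fn(k) % table_size
        if b in seen:
            collisions += 1
        seen.add(b)
    return collisions

def make_poly_hash(base):
    """Polynomial rolling hash with the given base (a common string hash family)."""
    def h(s):
        acc = 0
        for ch in s:
            acc = acc * base + ord(ch)
        return acc
    return h

def best_base(keys, table_size, candidates=(31, 37, 131, 1313)):
    """Pick the candidate base with the fewest collisions on this key set."""
    return min(candidates,
               key=lambda b: count_collisions(keys, make_poly_hash(b), table_size))
```

This only searches within one parametrized family; whether a family member with few collisions exists depends on the data, which is why the comments below suggest looking at the strings themselves.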

Roy
  • You'd need to analyze the properties of the strings (perhaps check for some common prefixes, which you can then take into account in your hash function). I don't think there's a general way of finding such a hash function (in much more detail than "look at the data", that is). – Bernhard Barker Aug 11 '15 at 14:59
  • Picking good hash functions is usually more about optimizing speed rather than memory - the difference between the absolute best and absolute worst hash functions should result in a small, constant number of bytes per object difference, regardless of the size of the object (at least with most hash table implementations I've seen). – Bernhard Barker Aug 11 '15 at 15:20
  • There are of course [generic hash functions for strings](http://stackoverflow.com/questions/2624192/good-hash-function-for-strings) - while generally good and acceptable, they are unlikely to be the "best" for your data. Do you have any reason to believe that such a hash function isn't "good enough" for your purposes? – Bernhard Barker Aug 11 '15 at 15:23
  • You're looking for a [Perfect hash function](https://en.wikipedia.org/wiki/Perfect_hash_function). A Google search will reveal a number of papers about how to generate them. Quite doable with a fixed list of strings. – Jim Mischel Aug 11 '15 at 17:33
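Following the last comment's pointer to perfect hashing: since the list is fixed, one simple (if brute-force) way to obtain a collision-free mapping is to search for a seed under which a seeded hash is injective over the keys. The sketch below is an assumed construction using `hashlib.blake2b` as the seedable hash family; real perfect-hash generators (e.g. CHD or the two-level FKS scheme) are far more efficient for large key sets.

```python
import hashlib

def seeded_hash(seed, key):
    # blake2b over (seed || key); any seedable hash family would do here.
    data = seed.to_bytes(4, "big") + key.encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def find_perfect_seed(keys, table_size, max_seed=100_000):
    """Brute-force a seed under which every key maps to a distinct bucket."""
    for seed in range(max_seed):
        buckets = {seeded_hash(seed, k) % table_size for k in keys}
        if len(buckets) == len(keys):
            return seed
    raise ValueError("no collision-free seed found; grow table_size or max_seed")
```

With `table_size == len(keys)` this yields a *minimal* perfect hash (no wasted slots), at the cost of a longer seed search; allowing a slightly larger table makes a collision-free seed much easier to find.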

0 Answers