2

So ... I'm looking for a hash function that -- assuming no input skew -- will distribute nonempty Strings of (up to) 16 bytes "reasonably uniformly" onto a range [0..n] where n is user input but does not change over time.

And I should be able to argue why the function should provide that "resonably uniform" distribution.

In the end, all I need is a Java implementation of the hash function for use in a server and a reason "why" this hash function is suitable. So I'm looking less for "perfect uniform distribution" (if such a thing even exists) and more for "reasonably fast".

Still, I've never been much of a mathematician, so ...

  • is there some native Java functionality that would provide such a hash?
  • or any suggestion on how to come up with such a hash function?
User1291
  • 7,664
  • 8
  • 51
  • 108
  • 7
    Sounds like `String.hashCode() % n` meets your requirements. That's certainly considered "good enough" for use in `HashMap` etc. – Andy Turner Sep 30 '16 at 12:05
  • @AndyTurner and how would I argue for the uniformity of the distribution? – User1291 Sep 30 '16 at 12:07
  • Honestly, I don't know. But like I say, it gives good enough performance for use in the `java.util.*` hash-based collections, which will only perform well if the distribution of hashes is reasonably uniform. They perform well enough that most of the time you don't need to use anything else. – Andy Turner Sep 30 '16 at 12:09
  • For a discussion on the string hash function (both hash calculation performance and risk of clashes) see the answers to http://stackoverflow.com/questions/299304/why-does-javas-hashcode-in-string-use-31-as-a-multiplier – Klas Lindbäck Sep 30 '16 at 12:29
  • Who do you need to convince? Knuth, or someone who doesn't need a watertight proof? – AakashM Sep 30 '16 at 12:44
  • 1
    @AakashM "not watertight" should be acceptable so long as I can provide any kind of coherent "proof". (Or at least that's what I rationalised.) – User1291 Sep 30 '16 at 12:47

0 Answers0