2

Provided you have a HashMap, and you know the maximum number of elements it will hold (at most; for example because you use an enum as key, HashMap<EnumType, Integer> countPerEnumType;, and you thus know the maximum possible number of elements will be EnumType.values().length), would you specify initial capacity and load factor for that HashMap? Why (not)?

If one would, I assume the load factor could be 1? (Safely? Or with adverse side effects?)

Would the initial capacity be length, length/loadFactor, or (length/loadFactor) + 1 (i.e. will the HashMap be expanded when it is full, or when it is full and the next element is about to be inserted?

How would you have to set up the hash map to prevent resizing and wasting memory for empty buckets?

EDIT: The example uses Enums, and Lew Bloch suggested EnumMap (beforte I rephrased the question) - but suppose the keys weren't enums, but you still knew the number of elements you'll end up with in advance?

Community
  • 1
  • 1
Christian
  • 6,070
  • 11
  • 53
  • 103
  • Are the enum values the keys? – chrylis -cautiouslyoptimistic- Jan 06 '17 at 22:04
  • Yes. Sorry. I thought `HashMap` made that implicit... ;-) – Christian Jan 06 '17 at 22:18
  • 1
  • @lew-bloch [beat you to it](http://stackoverflow.com/a/41514933/2018047) :) But suppose they weren't enums, but you'd still know the number of elements in advance? – Christian Jan 06 '17 at 22:21
  • Then unless it's huge, I wouldn't worry about it and would let it run dynamically. – chrylis -cautiouslyoptimistic- Jan 06 '17 at 22:21
  • 1
    If the keys aren't enums but you know all the keys in advance, write a custom hash function that returns a distinct value in the range `[0..n-1]` for each key (where `n` is the number of keys). Then use a standard `HashMap`. See [Perfect Hash Function](https://en.wikipedia.org/wiki/Perfect_hash_function) – Jim Garrison Jan 06 '17 at 22:26
  • FWIW on one occasion I inititialized a `HashMap` with capacity `(knownMaxSize * 4 + 2) / 3` (which I believe should prevent resizing) and got a review comment saying it was too complicated, so I deleted it again. This seems to agree with @chrylis’s comment. – Ole V.V. Jan 06 '17 at 22:41
  • For a code base as ours with very many `Map`s I still dream of writing a library method that constructs a `HashMap` with a capacity calculated as in my previous comment, with a proper comment to explain the formula. Haven’t come around to it, at least not yet. – Ole V.V. Jan 06 '17 at 22:43
  • 1
    I just found this most interesting question: [What is the optimal capacity and load factor for a fixed-size HashMap?](http://stackoverflow.com/q/7115445/2018047) The users over there haven't necessarily managed to come up with an explanation, but a lot of imperical data to suggest that what everyone is saying about initial capacity and load factor holds true: "Difficult. Don't touch." ;-) One thing to note is that the capacity gets a make-over before it's used, anyway: http://stackoverflow.com/a/17330045/2018047 – Christian Jan 06 '17 at 22:55

2 Answers2

5

Use an EnumMap instead, which is a high-performance Map specifically designed for enum keys:

Map<EnumType, Integer> counts = new EnumMap<>();
chrylis -cautiouslyoptimistic-
  • 75,269
  • 21
  • 115
  • 152
Lew Bloch
  • 3,364
  • 1
  • 16
  • 10
  • Nice find, even if it doesn't exactly fit the question - it does help with the specific problem at hand... :) – Christian Jan 06 '17 at 22:17
  • How does it not fit the question? – Jim Garrison Jan 06 '17 at 22:24
  • Should you define initial capacity and load factor of a HashMap, if the number of elements is known in advance? No mention of enums. :) – Christian Jan 06 '17 at 22:25
  • @Christian The answer provides the correct solution to *the problem presented*, even if the question as phrased asked something else. Frequently the best answer to "how do I do this" is "don't do that". – chrylis -cautiouslyoptimistic- Jan 06 '17 at 23:08
  • "No mention of enums" other than "Provided you have a HashMap – Lew Bloch Jan 06 '17 at 23:23
  • 1
    A `HashMap` that holds a small number of elements, the number being (roughly) known ahead of time, can just be initialized with a capacity four-thirds the number of elements and the default load factor. If the number of elements is small enough (like fewer than a few hundred), and the map is filled once or very rarely, then just use the no-arg constructor; this has negligible if even measurable impact on memory or performance. Forget about it. "Premature optimization is the root of all evil." - Dr. Knuth – Lew Bloch Jan 06 '17 at 23:27
  • Do you have a source for that quote, by any chance? :) I'm really trying to understand those two parameters, and so far haven't seen anyone use them. But I keep telling myself there must be a good reason why they are there, so there must be legitimate reasons when to use them. Which means there must also be a plausible explanation as to when to use them - one that hopefully even someone who didn't study computer science or math can understand... – Christian Jan 06 '17 at 23:46
  • One source for the quote: https://en.wikiquote.org/wiki/Donald_Knuth. – Ole V.V. Jan 08 '17 at 13:20
  • Not all things in Java are necessarily there for "a good reason". Just look at all the deprecated methods in `Object` and `Date` for evidence that the Java API has had mistakes in it. – Lew Bloch Jan 10 '17 at 03:06
0

My understanding is that a HashMap will resize during a put operation if that put will cause the map's size to exceed initial capacity multiplied by load factor.

If you declare the HashMap to have an initial capacity equal to the number of elements that you expect to insert, and a load factor of 1, then all buckets will be filled, and no buckets will be wasted.

  • 3
    "all buckets will be filled", no, there could still be hash collisions. Since an enum uses `Object`'s `hashCode` by default. – Jorn Vernee Jan 06 '17 at 21:55
  • The odds of a collision with any type, enum or otherwise, are very low if the number of entries is reasonably slow, and it is not really harmful if a collision does occur. – Lew Bloch Jan 06 '17 at 23:24