4

Which is the better option to use: a HashMap with a default size, or one without a size?

And why?

Also, what are the loadFactor and modCount properties of a HashMap?

When I debug my code in Eclipse and look at the value of a HashMap, it shows a property named loadFactor with a value of 0.75 and a property named modCount with a value of 3.


Where I use HashMap in my code:

I'm developing a communication application, you could say a chat application, in which I store all sent/received messages in a HashMap. Since I can't predict how many messages a user will send or receive, I declare the HashMap without an initial capacity. What I have written is:

Map<String, Map<String, List<String>>> usersMessagesMap = new HashMap<String, Map<String,List<String>>>();

If I use it with an initial capacity of 100 or higher, will it affect the code?
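By "with initial capacity" I mean the int constructor, i.e. something like this (100 here is just an example value):

```java
Map<String, Map<String, List<String>>> usersMessagesMap =
        new HashMap<String, Map<String, List<String>>>(100);
```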

Harry Joy
  • 58,650
  • 30
  • 162
  • 207
  • What **exactly** do you mean by "with default size" or "without size"? – Joachim Sauer Feb 08 '11 at 11:05
  • I guess he means the initial capacity. – Daniel Rikowski Feb 08 '11 at 11:06
  • It depends greatly. Even if you know the keyset size ahead of time, there's no guarantee that the keys' `hashCode` function will be uniformly distributed (unless you write one by yourself). If you know that the default capacity is much smaller than the keyset size, I'd recommend using a higher initial capacity to reduce rehashing. – PaoloVictor Feb 08 '11 at 11:07
  • @Joachim, @DR: yes, I mean the initial capacity. – Harry Joy Feb 08 '11 at 11:08
  • I suggest reading the Javadocs, they explain this quite well. – Michael Borgwardt Feb 08 '11 at 11:12
  • @Hunter2: what do you mean by reduce rehashing? – Harry Joy Feb 08 '11 at 11:13
  • @Harry Joy: Check my answer. As the map grows and the load factor increases, there will be a need to increase the map's underlying array. As the stored keys depend on the map size, they need to be recalculated (yes, Java's `HashMap` creates _another_ hash based on the object's `hashCode`) – PaoloVictor Feb 08 '11 at 11:19
  • @Harry http://stackoverflow.com/questions/1324064/performance-of-hashmap-with-different-initial-capacity-and-load-factor – razlebe Feb 08 '11 at 11:19
  • BTW: HashMap has to be a power of 2 in bucket size (and it rounds up otherwise), so setting it to 10 is the same as setting it to 16 (just a bit more confusing) – Peter Lawrey Feb 08 '11 at 11:32

4 Answers

8

Have you checked the HashMap API Javadoc?

  • The capacity is the number of buckets in the hash table
  • The initial capacity is simply the capacity at the time the hash table is created
  • The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased

On setting the initial size too high:

Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

Impact of the load factor on the performance:

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

Well, in short: depending on the estimated size and the expected growth ratio, you'll have to choose one approach or the other.

Usually, if you know the initial number of elements for your Map, it's recommended to set it at construction time, avoiding early rehashes during initialization.
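For example, a minimal sketch of applying the quoted formula, assuming you can estimate the number of entries up front (the figure of 1000 is just an example):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {

    public static void main(String[] args) {
        int expectedEntries = 1000;   // assumed estimate for this sketch
        float loadFactor = 0.75f;     // HashMap's default load factor

        // Per the Javadoc quoted above: no rehash will ever occur if the
        // initial capacity is greater than (max entries / load factor).
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Map<String, String> map = new HashMap<String, String>(initialCapacity, loadFactor);
        // Filling the map with up to 1000 entries now never triggers a resize.
    }
}
```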

Tomas Narros
  • 13,390
  • 2
  • 40
  • 56
4

If you know that the keyset size will be much bigger than the initial capacity (which is 16 by default), I'd use a higher initial capacity to reduce rehashing: as the number of keys grows and the `N/C` ratio (where `N` is the number of stored keys and `C` is the map's capacity) reaches the load factor, the map's array is extended and the keys are rehashed. Also, since the map's capacity increases exponentially, you won't see a drastic reduction in the number of rehashes unless you have a significant number of keys.

So, my opinion is: if you have the spare memory and lots of keys, go for a higher initial capacity.
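To make that growth concrete, here is a small simulation of the doubling rule described above (a sketch only; it models the threshold arithmetic, not HashMap's actual internals):

```java
public class RehashSimulation {

    public static void main(String[] args) {
        int capacity = 16;               // HashMap's default initial capacity
        float loadFactor = 0.75f;        // HashMap's default load factor
        int threshold = (int) (capacity * loadFactor);

        // Simulate inserting 1000 entries and report each (simulated) resize.
        for (int n = 1; n <= 1000; n++) {
            if (n > threshold) {
                capacity *= 2;           // the bucket array doubles in size
                threshold = (int) (capacity * loadFactor);
                System.out.println("rehash at entry " + n + ", new capacity: " + capacity);
            }
        }
        // A starting capacity above 1000 / 0.75 (rounded up internally to the
        // next power of two, 2048) would have avoided every rehash above.
    }
}
```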

PaoloVictor
  • 1,296
  • 8
  • 18
  • what does this `loadFactor` mean? – Harry Joy Feb 08 '11 at 11:18
  • @Harry Joy: The load factor is the maximum value allowed for `N/C`, where `N` is the number of keys currently stored in the map and `C` is the map's capacity. When the `N/C` value reaches the load factor, the map is resized. The idea of resizing the map _before_ `N/C` reaches 1 is that a high `N/C` value means a higher probability of collisions (the chance of the hash function mapping two keys to the same array slot). – PaoloVictor Feb 08 '11 at 11:24
  • 4
    strictly speaking the `loadfactor` is the maximum value that the `HashMap` will allow the value `N/C` to approach. Once that threshold is reached, it will resize the internal array. – Joachim Sauer Feb 08 '11 at 11:26
  • @Joachim Sauer Thanks for clearing that up. I will edit the comment and the answer. – PaoloVictor Feb 08 '11 at 11:29
  • Thanks. [+1] for being the first to explain rehashing and loadFactor. – Harry Joy Feb 08 '11 at 11:52
2
  • Better in terms of simplicity: without an initial size.
  • Better in terms of performance: try it out yourself.

Found an SO thread, Performance of Hashmap with Different Initial Capacity And Load Factor

Load Factor

The performance of most collision resolution methods does not depend directly on the number n of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the size s of its bucket array. Sometimes this is referred to as the fill factor, as it represents the portion of the s buckets in the structure that are filled with one of the n stored entries. With a good hash function, the average lookup cost is nearly constant as the load factor increases from 0 up to 0.7 (about 2/3 full) or so. -- Wikipedia on Load Factor

Now, your new question:

If I use it with an initial capacity of 100 or higher, will it affect the code?

It's not a good idea; you are good to go with the default. Don't think too much about this at the start. As Knuth said, "premature optimisation is the root of all evil". It wouldn't give you any real benefit whatsoever.
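If you do want to try it out yourself, as suggested above, here is a deliberately naive timing sketch (no JVM warm-up or proper benchmarking, so treat the numbers as a rough indication only):

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityTiming {

    public static void main(String[] args) {
        int n = 1000000;

        long t0 = System.nanoTime();
        fill(new HashMap<String, Integer>(), n);       // default capacity (16)
        long t1 = System.nanoTime();
        fill(new HashMap<String, Integer>(2 * n), n);  // pre-sized: no rehashing
        long t2 = System.nanoTime();

        System.out.println("default capacity: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("initial capacity: " + (t2 - t1) / 1000000 + " ms");
    }

    private static void fill(Map<String, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put("key" + i, i);
        }
    }
}
```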

Community
  • 1
  • 1
Adeel Ansari
  • 39,541
  • 12
  • 93
  • 133
  • @Adeel: I had tried it out on my side but couldn't figure out the difference. That's why I'm asking here. – Harry Joy Feb 08 '11 at 11:12
  • @Harry: Doesn't that mean it doesn't matter as far as performance is concerned? And of course, when one doesn't gain anything by giving an initial size, why would one even want to give one? Further, you might like to share the code you have written to find the difference. That way, you wouldn't get this kind of dumb answer. – Adeel Ansari Feb 08 '11 at 11:13
  • Most of the time it doesn't matter, but if your `HashMap` is going to be very big and you know the final size (approximately or exactly, doesn't really matter), then providing that can improve the performance of constructing the `HashMap` and filling it with values. – Joachim Sauer Feb 08 '11 at 11:20
  • @Joachim: Yes, very much logical. But I just wanted him to try it out, and therefore I asked him to share the code he tried, so we can comment. – Adeel Ansari Feb 08 '11 at 11:24
  • @Harry, added my answer for your new question. – Adeel Ansari Feb 08 '11 at 11:41
  • @Adeel: [+1] Thanks for your answer on my new question. I'll keep it at the default. – Harry Joy Feb 08 '11 at 11:51
1

Strictly speaking, you should not care about the internal fields of the HashMap (loadFactor and modCount are fields, not properties: properties would have getters/setters).

The modCount is most likely the number of structural modifications applied to the Map since its creation. It's used to detect concurrent modifications and to know when an Iterator becomes "broken" (because the originating Map was structurally modified after the Iterator was created).

The loadFactor is probably a field storing the second argument of the two-argument constructor. It defines how "tightly packed" the internal array may become before it is resized (which results in a re-hashing of all the keys).
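As an illustration, here is a minimal sketch of the failure that modCount exists to detect: structurally modifying the map while iterating over it makes the iterator fail fast:

```java
import java.util.HashMap;
import java.util.Map;

public class ModCountDemo {

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("a", "1");
        map.put("b", "2");

        for (String key : map.keySet()) {
            System.out.println(key);
            // Adding a new key is a structural modification: it increments the
            // map's modCount, so the iterator's next step sees a stale expected
            // value and throws a ConcurrentModificationException.
            map.put("c", "3");
        }
    }
}
```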

Joachim Sauer
  • 302,674
  • 57
  • 556
  • 614