10

First of all, I want to make it clear that I would never use a HashMap to do things that require some kind of order in the data structure and that this question is motivated by my curiosity about the inner details of Java HashMap implementation.

You can read in the java documentation on Object about the Object method hashCode.

I understand from there that hashCode implementation for classes such as String and basic types wrappers (Integer, Long,...) is predictable once the value contained by the object is given. An example of that would be that calls to hashCode for any String object containing the value hello should return always: 99162322

Having an algorithm that always insert into an empty Java HashMap where Strings are used as keys the same values in the same order. Then, the order of its elements at the end should be always the same, am I wrong?

Since the hash code for a concrete value is always the same, if there are not collisions the order should be the same. On the other hand, if there are collisions, I think (I don't know the facts) that the collisions resolutions should result in the same order for exactly the same input elements.

So, isn't it right that two HashMap objects with the same elements, inserted in the same order should be traversed (by an iterator) giving the same elements sequence?

  • are you inserting the elements into the map in the same order? – Nathan Hughes Apr 02 '14 at 13:31
  • 2
    Given the *exact same* level of code, and the *exact same* objects inserted in the *exact same* order, then the apparent order in the HashMap will *probably* be the same. In other words, it would be foolish to count on it. – Hot Licks Apr 02 '14 at 13:33
  • You can use a `LinkedHashSet`. No need of external library. – Simon Arsenault Apr 02 '14 at 13:37
  • @SimonArsenault I don't need the order to be preserved. I was just curious about the current implementation of HashMap. I want to be sure of the elements order I could use TreeMap too. – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 14:04
  • @HotLicks If your comment was an answer I would accepted it. – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 14:15
  • 2
    That's odd, very few people find my comments acceptable. (Especially my wife.) – Hot Licks Apr 02 '14 at 14:27
  • (Keep in mind that if any of the keys is an object which directly or indirectly depends on Object.hashCode, that method returns a value which is based on the address of the object in the heap (at the time the hash is first requested) and is, for all intents and purposes, non-deterministic. Of course, any hash table utilizing such objects wouldn't work very well anyway.) – Hot Licks Apr 02 '14 at 14:49
  • @HotLicks Do you mean key objects whose class do not override Object.hashCode? – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 14:53
  • Or an object which is composed of such objects, and constructs its own hash by hashing together the hashes of those objects. It only takes one. – Hot Licks Apr 02 '14 at 16:22
  • 1
    @HotLicks *"any hash table utilizing such objects wouldn't work very well"* - I disagree. Actually and unfortunately, `enum`s work like [this](http://stackoverflow.com/questions/4885095/what-is-the-reason-behind-enum-hashcode) and they make the `hashCode` of any object containing them essentially random (assuming standard `hashCode` implementation). – maaartinus May 31 '14 at 13:44

3 Answers3

7

As far as I know the order (assuming we call "order" the order of elements as returned by values() iterator) of the elements in HashMap are kept until map rehash is performed. We can influence on probability of that event by providing capacity and/or loadFactor to the constructor.

Nevertheless, we should never rely on this statement because the internal implementation of HashMap is not a part of its public contract and is a subject to change in future.

Alexey Malev
  • 6,408
  • 4
  • 34
  • 52
  • Could you please explain what this "rebalance" consist on? – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 13:39
  • First, I used wrong term, I meant "rehash". Sorry about that :) Will split answer in two comments, here is the HashMap javadoc quotation: – Alexey Malev Apr 02 '14 at 13:40
  • The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets. – Alexey Malev Apr 02 '14 at 13:41
  • +1 I know understand you. Will maps be rehashed without inserting new elements? – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 13:42
  • No, map won't be rehashed without changing its contents. I suppose this is true also for different serialization/deserialization of the map. – Alexey Malev Apr 02 '14 at 13:43
  • @PabloFranciscoPérezHidalgo the implementation of the HashMap could be a [red-black tree](http://en.wikipedia.org/wiki/Red_black_tree), which self balances by rearranging elements sometimes when there are inserts or deletes to the tree. There are other rebalancing algorithms for hash trees as well, as described in chapter three of [_Introduction to Algorithms_](http://www.amazon.com/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844/ref=sr_1_1?ie=UTF8&qid=1396446185&sr=8-1&keywords=introduction+to+algorithms). – Michael Shopsin Apr 02 '14 at 13:43
  • @AlexeyMalev but that should only happen when new elements are inserted and the insertion process is the same for the two maps so this rehash, if it is deterministic, should result in two identical maps, no? – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 13:44
  • @PabloFranciscoPérezHidalgo, in general, yes, but as I wrote above, this is not a part of HashMap public contract and thus unreliable. – Alexey Malev Apr 02 '14 at 13:47
  • @AlexeyMalev I absolutely agree – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 13:48
  • 1
    @MichaelShopsin That is exactly the implementation of TreeMap (http://docs.oracle.com/javase/7/docs/api/java/util/TreeMap.html) not HashMap – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 14:05
  • @PabloFranciscoPérezHidalgo mind if I ask you to mark my answer as correct if you rate it as correct? – Alexey Malev Apr 02 '14 at 14:10
  • @AlexeyMalev I accepted your answer because of your comments. – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 14:12
  • @PabloFranciscoPérezHidalgo You are correct that TreeMap uses red-black trees not HashMap. I didn't read the JavaDocs before I wrote about HashMap, shame on me. That said HashMap could use an algorithm that does not produce identical results for the same inserts, or someone could use the JVM where optimizations produced the same effect. – Michael Shopsin Apr 02 '14 at 15:49
2

I think you are asking "Is HashMap non-deterministic?". The answer is "probably not" (look at the source code of your favourite implementation to find out).

However, bear in mind that because the Java standard does not guarantee a particular order, the implementation is free to alter at any time (e.g. in newer JRE versions), giving a different (yet deterministic) result.

Oliver Charlesworth
  • 267,707
  • 33
  • 569
  • 680
0

Whether or not that is true is entirely dependent upon the implementation. What's more important is that it isn't guaranteed. If you order is important to you there are options. You could create your own implementation of Map that does preserve order, you can use a SortedMap/LinkedHashMap or you can use something like the apache commons-collections OrderedMap: http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/OrderedMap.html.

jgitter
  • 3,396
  • 1
  • 19
  • 26
  • As I stated in my question that isn't nothing I would cont on, just being curious about the reproducibility of the HashMap state. – Pablo Francisco Pérez Hidalgo Apr 02 '14 at 13:37
  • Which I answered. The HashMap API does not guarantee any ordering. Some implementations might move keys between buckets to make reads faster, for instance. If the order is not established via some other mechanism, such as the one used by the LinkedHashMap, there is no guarantee what order you'll iterate through them. – jgitter Apr 02 '14 at 14:27