76

When to use hashmaps or treemaps?

I know that I can use TreeMap to iterate over the elements when I need them to be sorted. But is just that? There is no optimization when I just want to consult the maps, or some optimal specific uses?

BenMorel
  • 34,448
  • 50
  • 182
  • 322
JeanK
  • 1,765
  • 1
  • 15
  • 22
  • 3
    there is a significant difference between the Maps. `TreeMap` is a `NavigableMap` (and sorted). It has some interesting properties, like finding the next/prev value/key based on current key, however requires [natural] order of the keys, while HashMap is... well hash based, so direct comparison between is not very proper. If you don't have defined natural order (think of like java.util.URL) you can't really use TreeMap. Also, object compare is generally shower than hashcode/equals. – bestsss Mar 16 '11 at 18:44
  • Many thanks, good answers that are complementary. Its much more clear to me when to use them. Thanks for the links as well. – JeanK Mar 17 '11 at 02:16
  • You can see the [enter link description here][1] [1]: http://stackoverflow.com/questions/2444359/what-is-the-difference-hashmap-and-treemap – fxl545826 Jul 27 '13 at 09:01

5 Answers5

53

TreeMap provides guaranteed O(log n) lookup time (and insertion etc), whereas HashMap provides O(1) lookup time if the hash code disperses keys appropriately.

Unless you need the entries to be sorted, I'd stick with HashMap. Or there's ConcurrentHashMap of course. I can't remember the details of the differences between all of them, but HashMap is a perfectly reasonable "default" option :)

For completeness, I should point out that there was a discussion on Stack Overflow a month or so ago about the internals of various maps. See the comments in this question, which I will copy into this answer if bestsss is happy for me to do so.

Community
  • 1
  • 1
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 2
    I suppose this applies more to "`TreeSet` vs `HashSet`", but suppose I do frequent set intersections. In that case, having the elements sorted is usually a net win, even if you don't otherwise make explicit use of their sortedness. – phooji Mar 16 '11 at 17:51
  • @phooji: That makes sense, yes, assuming the intersection code is suitably optimized. As you say, it's less likely to be relevant for maps. – Jon Skeet Mar 16 '11 at 17:52
  • @Jon, I guess I was that person (pointing the weakness of HashMap in java) – bestsss Mar 16 '11 at 17:54
  • @bestsss: Could well be. Do you remember where the discussion was? And if we can find it, do you mind if I just post it straight into this answer? – Jon Skeet Mar 16 '11 at 17:58
  • @bestsss: Found it: http://stackoverflow.com/questions/4815869/how-to-get-keys-in-map-with-same-sequence-as-they-were-inserted - same question still applies :) – Jon Skeet Mar 16 '11 at 17:58
  • @Jon, I'd put `LinkedHashMap` as default `Map`. It has better and predictable iteration (as LinkedList) that doesn't change w/ java version or hashmap impl. – bestsss Mar 16 '11 at 18:05
49

Hashtables (usually) perform search operations (look up) bounded within the complexity of O(n)<=T(n)<=O(1), with an average case complexity of O(1 + n/k); however, binary search trees, (BST's), perform search operations (lookup) bounded within the complexity of O(n)<=T(n)<=O(log_2(n)), with an average case complexity of O(log_2(n)). The implementation for each (and every) data structure should be known (by you), to understand the advantages, drawbacks, time complexity of operations, and code complexity.

For example, the number of entries in a hashtable often have some fixed number of entries (some part of which may not be filled at all) with lists of collisions. Trees, on the other hand, usually have two pointers (references) per node, but this can be more if the implementation allows more than two child nodes per node, and this allows the tree to grow as nodes are added, but may not allow duplicates. (The default implementation of a Java TreeMap does not allow for duplicates)

There are special cases to consider as well, for example, what if the number of elements in a particular data structure increases without bound or approaches the limit of an underlying part of the data structure? What about amortized operations that perform some rebalancing or cleanup operation?

For example, in a hashtable, when the number of elements in the table become sufficiently large, and arbitrary number of collisions can occur. On the other hand, trees usually require come re-balancing procedure after an insertion (or deletion).

So, if you have something like a cache (Ex. the number of elements in bounded, or size is known) then a hashtable is probably your best bet; however, if you have something more like a dictionary (Ex. populated once and looked up many times) then I'd use a tree.

This is only in the general case, however, (no information was given). You have to understand process that happen how they happen to make the right choice in deciding which data structure to use.

When I need a multi-map (ranged lookup) or sorted flattening of a collection, then it can't be a hashtable.

Ivan
  • 10,052
  • 12
  • 47
  • 78
ony
  • 12,457
  • 1
  • 33
  • 41
  • 15
    I disagree with your example of using a tree for a structure that will be populated once and referenced many times. In this scenario you can create an appropriately sized Hashtable that has an O(1) lookup performance vs. O(log n) for a tree based structure. Furthermore you'll never use the linear operations of a `Hashtable`. – Chris Kerekes Jul 13 '12 at 15:49
  • @ChrisKerekes, You can create hashtable that will suite perfectly and there will be no collisions that makes it O(m) and get that promised O(1). But you'll need to generate good hash function specific for you data (not always you can spend few empty extra-entries to change hash function behavior). Otherwise general hashtables is more suited for fast peace cache, I think. – ony Jul 14 '12 at 08:40
  • 2
    Is that `O(n)<=T(n)<=O(1)` correct? – f.ardelian Aug 01 '13 at 23:27
  • 2
    @f.ardelian, I don't understand that too. @alvonellos promise that those are technical details. Probably it should be `O(1) ≤ T(n) ≤ O(n)` for hash-table and `T(n) = O(log n)` for balanced tree (as I remember constants can be stripped out of O-notation). – ony Dec 15 '13 at 18:02
  • Here is a performance test with some numbers based upon the collection size: http://www.artima.com/weblogs/viewpost.jsp?thread=122295 – Simone Avogadro Jul 28 '15 at 08:36
11

The largest difference between the two is the underlying structure used in the implementation.

HashMaps use an array and a hashing function to store elements. When you try to insert or delete an item in the array the hashing function converts the key into an index on the array where the object is/should be stored (ignoring conflicts). While hashmaps are generally very fast because they don't need to iterate over large amounts of data, they slow down when they're filled because they need to copy all the key/values into a new array.

TreeMaps store a the data in a sorted tree structure. While this means that they'll never have to allocate more space and copy over to it, operations require that part of the data already stored be iterated over. Sometimes changing large amounts of the structure.

Out of the two Hashmaps will generally have better performance when you don't need sorting.

BenMorel
  • 34,448
  • 50
  • 182
  • 322
Chris Kerekes
  • 1,116
  • 8
  • 27
4

Inserting new elements into a HashMap will, on average, be a good deal faster than inserting elements into a TreeMap. Unless you need your elements sorted, I'd go with the HashMap.

überjesus
  • 824
  • 7
  • 9
4

Don't forget there is also LinkedHashMap which is nearly as fast as HashMap for add/contains/remove operations but also maintains the insertion order.

z7sg Ѫ
  • 3,153
  • 1
  • 23
  • 35
  • 2
    `contains` is exactly as fast, iteration of LinkedHashMap is faster (quite a bit). – bestsss Mar 16 '11 at 18:22
  • Hmmmm yes maybe I should use it more often! – z7sg Ѫ Mar 16 '11 at 18:28
  • `get()` is practically as fast unless it has created w/ `accessOrder=true` and then it is just a few (6) assignments and one indirection (close to nothing). LinkedHashMap is my favorite structure in java.util. – bestsss Mar 16 '11 at 18:39