3

I require a map that supports 3 operations: "insert", "remove" and "iterate in sorted order". This is exactly the interface of TreeMap in Java. That being said it can also be implemented by using a HashMap and sorting it every time before iteration. To analyze the different approaches, lets say I perform n inserts and m removes, 'r' reads and then iterate.

With TreeMap we have the following implementation:

TreeMap<Integer, Integer> tm = Maps.newTreeMap();
for (int i=0;i<n;++i) {tm.put(i, 2*i);} // O(n*log(n))
for (int i=0;i<m;++i) {tm.remove(i);} // O(m*log(m))
for (int i=0;i<r;++i) {tm.get(i);} // O(r*log(n-m))
for (Integer i : tm) {print(i);} // O(n-m)

All told we have a total run time of O(n*log(n) + m*log(m) + r*log(n-m))

With HashMap we have the following implementation:

HashMap<Integer, Integer> hm = Maps.newHashMap();
for (int i=0;i<n;++i) {hm.put(i, 2*i);} // O(n)
for (int i=0;i<m;++i) {hm.remove(i);} // O(m)
for (int i=0;i<r;++i) {hm.get(i);} // O(r)
List<Integer> sortedList = Lists.newArrayList(hm.keySet()); // O(n-m)
Collections.sort(sortedList); // O((n-m)*log(n-m))
for (Integer i : sortedList) {print(i);} // O(n-m)

All told we have a total run time of O((n-m)*log(n-m)).

For all n,m O(n*log(n) + m*log(m) + r*log(n-m)) > O((n-m)*log(n-m)).

My question therefore is, what is the use case where a TreeMap is better than a HashMap? Is it only better if you need to iterate over the map many (let's say k) times (in which case, if k is >> log(n) the run time for TreeMap will be O(k*(n-m)) whereas for HashMap will be O(k*(n-m)*log(n-m)))? Regardless, if you are only performing O(log(n)) iterations (this does sound like such a sane use case), HashMap will outperform TreeMap. Am I missing something?

Benjy Kessler
  • 7,356
  • 6
  • 41
  • 69
  • 2
    http://stackoverflow.com/questions/302371/which-data-structure-would-you-use-treemap-or-hashmap-java – sunysen Feb 03 '15 at 14:10
  • 2
    http://stackoverflow.com/questions/5329358/treemap-or-hashmap – sunysen Feb 03 '15 at 14:10
  • That is for a specific use case. My point is, does there exist a use case where TreeMap is better than HashMap. – Benjy Kessler Feb 03 '15 at 14:11
  • The second question just explains the difference. I know the difference between the two kinds of Map. I want to know why TreeMap exists if it is always less efficient. – Benjy Kessler Feb 03 '15 at 14:13
  • If you need to iterate from one specific entry onwards, then you should use the `TreeMap`. – fps Feb 03 '15 at 14:18
  • @BenjyKessler Have you tried to read accepted answers in the linked questions? [Answer by Jon Skeet](http://stackoverflow.com/a/302378/451518) mentions `hash-then-sort` approach. [Another answer](http://stackoverflow.com/a/5329585/451518) presents a use case for treemap - dictionary that is looked up thousands times after each update. – default locale Feb 03 '15 at 14:20
  • Not true! It is better to copy the map into a list and sort it. – Benjy Kessler Feb 03 '15 at 14:21
  • Jon Skeet's answer claims that the only benefit of TreeMap is code readability. If that is true then you could implement TreeMap as a HashMap whose iterator sorts the collection on construction. – Benjy Kessler Feb 03 '15 at 14:26
  • @BenjyKessler Just be aware that all collection views are *write-through* so if you copy everything into an array and sort it, you'll have a lot of work to do to maintain those semantics. – Marko Topolnik Feb 03 '15 at 14:29
  • Yeah, I don't like the iterator idea very much either. I'm just saying that if the only reason to use TreeMap is that the usage of HashMap is unreadable (if much more efficient). I think the correct solution would be to make HashMap more readable and not accept the performance hit. – Benjy Kessler Feb 03 '15 at 14:31
  • I would certainly *not* prefer `HashMap` just for readability. Proper code design can easily provide readability in any case. – Marko Topolnik Feb 03 '15 at 14:33
  • @BenjyKessler Marko Topolnik's answer already provided nice use case for TreeMap: lookup by range. `HashMap` can only search for exact match. – default locale Feb 03 '15 at 14:35

2 Answers2

7

Of course there exist such use cases. In all read-heavy settings you have the advantage of sorting only once, during insertion. The majority of use cases are read-heavy, contrary to the assumptions of your question.

An even greater advantage is offered by the TreeMap when you need to extract submaps with an upper or lower bound on the key, find the minimum or maximum keys, or find keys closest to a given key. The interface NavigableMap is dedicated to these operations.

Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • I edited my question to take into account reads. Reads in a HashMap are O(1) whereas in TreeMaps they run in O(log(n)). – Benjy Kessler Feb 03 '15 at 14:19
  • We are discussing iteration here, so by "read" I mean "iterate over the map". – Marko Topolnik Feb 03 '15 at 14:20
  • Is that true? What application iterates over a Map more often then inserts, removes and gets? I think get will be the most common operation, not iteration. If iteration was the most common operation then you are starting off with an O(k*n) > O(n^2) algorithm which is already unwieldy. – Benjy Kessler Feb 03 '15 at 14:23
  • I can see why NavigableMap is useful, and if you want to implement it using a tree then that's fine. My question is why is TreeMap useful. – Benjy Kessler Feb 03 '15 at 14:29
  • 1
    `TreeMap implements NavigableMap`. – Marko Topolnik Feb 03 '15 at 14:29
  • In one sentence you refer to the whole *application*, then bait-and-switch to an *algorithm*. An application serves requests and typically it serves many requests by referring to the state built up in a `TreeMap`, which gets updated occasionally. – Marko Topolnik Feb 03 '15 at 14:31
  • OK, I will grant that for ranges it is useful but it should state in the documentation that this class is only useful if you need to manage ranges. – Benjy Kessler Feb 03 '15 at 14:39
2

An obvious use case is when you want to sort the map according to some Comparator definition. It's not always about performance.

  • 1
    OP has already covered that: use `HashMap` and sort before iteration. – Marko Topolnik Feb 03 '15 at 14:23
  • You can sort `map,keySet()` with your custom comparator. – default locale Feb 03 '15 at 14:23
  • 1
    My point is that the OP is focusing just on the performance aspects. In some cases, it's not performance that is the overarching requirement but rather holding the data in some ordered way. –  Feb 03 '15 at 14:25
  • Nobody cares how you *hold* the data; we care about what you can *do* with it. If i gave you an implementation which had `HashMap` internally and used array sorting before returning a `keySet`, you would be as happy with it as with a `TreeMap`. – Marko Topolnik Feb 03 '15 at 14:27
  • @MarkoTopolnik, I don't think you can see the wood from the trees here. –  Feb 03 '15 at 14:30
  • I find your comment unconstructive. You have just expressed an opinion without contributing any further arguments. – Marko Topolnik Feb 03 '15 at 14:38