518

I've always loved trees, that nice O(n*log(n)) and the tidiness of them. However, every software engineer I've ever known has asked me pointedly why I would use a TreeSet. From a CS background, I don't think it matters all that much which you use, and I don't care to mess around with hash functions and buckets (in the case of Java).

In which cases should I use a HashSet over a TreeSet?

Monzurul Shimul
  • 8,132
  • 2
  • 28
  • 42
heymatthew
  • 6,716
  • 5
  • 26
  • 22

14 Answers14

880

HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.

HashSet

  • the class offers constant time performance for the basic operations (add, remove, contains and size).
  • it does not guarantee that the order of elements will remain constant over time
  • iteration performance depends on the initial capacity and the load factor of the HashSet.
    • It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.

TreeSet

  • guarantees log(n) time cost for the basic operations (add, remove and contains)
  • guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via its constructor) (implements SortedSet)
  • doesn't offer any tuning parameters for iteration performance
  • offers a few handy methods to deal with the ordered set like first(), last(), headSet(), and tailSet() etc

Important points:

  • Both guarantee duplicate-free collection of elements
  • It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
  • None of these implementations are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
  • LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.

So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.

  • e.g. SortedSet<String> s = new TreeSet<String>(hashSet);
Yoon5oo
  • 496
  • 5
  • 11
sactiw
  • 21,935
  • 4
  • 41
  • 28
  • 4
    Maybe it'll make more sense if your example assign the new TreeSet to a SortedSet type. – tanyehzheng Jun 15 '11 at 04:10
  • What about memory consumption? – matdumsa Nov 14 '11 at 15:17
  • 2
    @matdumsa TreeSet has pointers in both children and parent directions, it will come at a cost, for sure. – ahmet alp balkan Nov 26 '11 at 11:33
  • 42
    It is only me that finds the affirmation "HashSet is much faster than TreeSet (constant-time versus log-time ...)" plainly wrong? First that this is about time-complexity, not absolute time, and O(1) can be in too many cases slower than O(f(N)). Second that O(logN) is "almost" O(1). I wouldn't be surprised if for many common cases a TreeSet outperformed a HashSet. – lvella Jun 21 '12 at 13:28
  • 25
    I just want to second Ivella's comment. time-complexity is *NOT* the same thing as running time, and O(1) is not always better than O(2^n). A perverse example illustrates the point: consider a hash set using a hash algorithm that took 1 trillion machine instructions to execute (O(1)) vs any common implementation of bubble sort (O(N^2) avg/worst) for 10 elements. Bubble sort will win every time. The point is algorithms classes teach everyone to think about approximations using time-complexity but in the real world the constant factors *MATTER* frequently. – Peter Oehlert Jul 05 '12 at 15:17
  • 21
    Perhaps it's just me, but isn't the advice to first add everything to a hashset, and then covert it to a treeset a horrible one? 1) Insertion in a hashset is only fast if you know the size of your dataset in advance, otherwise you pay a O(n) re-hashing, possibly multiple times. and 2) You pay for the TreeSet insertion anyway when converting the set. (with a vengeance, because iteration through a hashset is not terribly efficient) – TinkerTank Dec 20 '12 at 13:36
  • 5
    This advice is based on the fact that for a set, you must check to see if an item is a duplicate before adding it; therefore you'll save time eliminating the duplicates if you are using an hashset over a treeset. However, considering the price to pay for creating a second set for the non-duplicates, the percentage of duplicates should be really great to overcome this price and make it a time saver. And of course, this is for medium and large sets because for a small set, the treeset is possibly faster than a hashset. – SylvainL Dec 31 '12 at 10:28
  • 5
    @PeterOehlert: please provide a benchmark for that. I understand your point, but the difference between both sets does barely matter with small collection sizes. And as soon as the set grows to a point, where the implementation matters, log(n) is becoming a problem. In general are hash functions (even complex ones) magnitudes of order faster than several cache misses (which you have on huge trees for almost every accessed level) to find/access/add/modify the leaf. At least that is my experience with these two sets in Java. – Bouncner Jan 25 '13 at 19:09
  • 1
    This is probably not all that important but a HashSet, though constant time, is about 120ish byte codes executed for an insert. So arrays are better if list is small. (even linear searching!!!) Second: HashMap.reset() is so terrible that just calling new HashMap is significantly more efficient. (Just saying, Thanks MIT Battlecode) – ThePrimeagen Nov 13 '13 at 19:56
  • Complexity does NOT specify the relative performance of two different data structures. It specifies how a given data structure's performance changes as a function of n. Obvious, right there in the definition, but very often gotten wrong. The hash function for a large, variable length string is so CPU intensive that you can do a LOT of simple compares in the same amount of time. Kudos to the OP for pointing out that you MUST know the size of the final hashtable up front. Trees, by contrast, are entirely self-maintaining. They grow and shrink on demand, auto-magically. –  May 05 '14 at 23:47
  • I tend to think of hashes as lossy compression, because that's what they do - range reduction with collisions. Since open hashing turns into a collisions catastrophe when the load factor goes over 0.50, you have to add the cost of a rehash to the primary hash's cost in CPU cycles. If you know enough about your data to avoid hot spots, can live with non-uniform retrieval times, and a lot of wasted table space, hashes are generally faster for small keys. Best close to the bare metal, and poorer close to the user. They tend to quietly become maintenance problems as data slowly changes. –  May 05 '14 at 23:56
  • I was once advised to use a hashtable to hash a tuple of 28 variables - about 20 of which were double floats. Even with 64-bit ints, that's probably impracticable or outright impossible. Using tuples in the C++ STL you can do that with Maps (R-B trees) all day long with zero issues. –  May 06 '14 at 00:06
  • what about calculating an intersection - is any of these two preferable? – Tad Sep 15 '14 at 14:34
  • 1
    For HashSet -> TreeSet, JDK 8 code has a faster linear version of addAll if the input is a SortedSet, nevertheless if the input is a HashSet, then it just calls add() for each element. This is the same as doing an individual add() for every element directly over TreeSet. Hence I don't think that making a HashSet and then creating the TreeSet using the HashSet is actually faster than just using a TreeSet. I haven't done any time evaluations for the same, but it looks like it's just better to directly create a TreeSet and use it if you ultimately need the final form to be as a TreeSet. – Quin Aug 26 '15 at 08:47
  • Re your final paragraph. Would it be fair to say a Tree set is what you want if you want a **continuously** sorted collection – Richard Tingle Dec 08 '15 at 22:48
  • Set set= new HashSet<>(); set.add("A"); set.add("B"); set.add("D"); set.add("C"); System.out.println(set); – Gundamaiah Mar 26 '18 at 11:06
  • The above prints [A, B, C, D]. As Hashset doesn't maintain order but how this one will be printed in order. – Gundamaiah Mar 26 '18 at 11:07
  • @Gundamaiah change your set elements to: `Set set= new HashSet<>() {{add("AZ"); add("DV"); add("BY"); add("EU"); add("CX");}};` Then System.out.println(set); would print: `[EU, DV, CX, BY, AZ]` So you can see neither alphabetical order is maintained nor insertion order is maintained. – sactiw Jun 05 '20 at 06:50
  • Given the large amount of speculation done here, I've made a small test that you can see as [paiza.io project](https://paiza.io/projects/0OLLqZd3B9wwl8sGXELW1g?language=kotlin). – julien.giband Apr 15 '21 at 14:20
  • Edit: Given the large amount of speculation done here, I've made a small test that you can see as [paiza.io project](https://paiza.io/projects/0OLLqZd3B9wwl8sGXELW1g?language=kotlin). It builds some large sets of random strings, and then does a big number of searches in them, comparing `HashSet` and `TreeSet`. Then it uses various methods to sort the hashset. As it turns out, using a TreeSet is generally much slower, as expected. As for sorts, wrapping with a TreeSet works very well for medium-sized HashSets, but becomes significantly slower than just using `.stream().sort()` for big sets. – julien.giband Apr 15 '21 at 14:29
40

One advantage not yet mentioned of a TreeSet is that its has greater "locality", which is shorthand for saying (1) if two entries are nearby in the order, a TreeSet places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.

This is in contrast to a HashSet, which spreads the entries all over memory, no matter what their keys are.

When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the TreeSet can be a much better choice.

Sufian
  • 6,405
  • 16
  • 66
  • 120
Carl Andersen
  • 489
  • 4
  • 3
  • 3
    Can you demonstrate that _if two entries are nearby in the order, a TreeSet places them near each other in the data structure, and hence in memory_ ? – David Soroko May 10 '15 at 18:20
  • 8
    Quite irrelevant for Java. Elements of the set are Objects anyway and point somewhere else, so you're not saving much of anything. – Andrew G Jul 08 '15 at 01:17
  • 2
    Besides the other comments made about the lack of locality in Java generally, OpenJDK's implementation of `TreeSet`/`TreeMap` is not locality optimized. While it is possible to use a b-tree of order 4 to represent a red-black tree and thus improve locality and cache performance, that is not how the implementation works. Instead, each node stores a pointer to its own key, its own value, its parent, and its left and right child nodes, evident in the [JDK 8 source code for TreeMap.Entry](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/TreeMap.java#l2048). – kbolino Apr 02 '20 at 21:52
28

Basing on lovely visual answer on Maps by @shevchyk here is my take:

╔══════════════╦═════════════════════╦═══════════════════╦═════════════════════╗
║   Property   ║       HashSet       ║      TreeSet      ║     LinkedHashSet   ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║  no guarantee order ║ sorted according  ║                     ║
║   Order      ║ will remain constant║ to the natural    ║    insertion-order  ║
║              ║      over time      ║    ordering       ║                     ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║ Add/remove   ║        O(1)         ║     O(log(n))     ║        O(1)         ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║                     ║   NavigableSet    ║                     ║
║  Interfaces  ║         Set         ║       Set         ║         Set         ║
║              ║                     ║    SortedSet      ║                     ║
╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣
║              ║                     ║    not allowed    ║                     ║
║  Null values ║       allowed       ║ 1st element only  ║      allowed        ║
║              ║                     ║     in Java 7     ║                     ║
╠══════════════╬═════════════════════╩═══════════════════╩═════════════════════╣
║              ║   Fail-fast behavior of an iterator cannot be guaranteed      ║
║   Fail-fast  ║ impossible to make any hard guarantees in the presence of     ║
║   behavior   ║           unsynchronized concurrent modification              ║
╠══════════════╬═══════════════════════════════════════════════════════════════╣
║      Is      ║                                                               ║
║ synchronized ║              implementation is not synchronized               ║
╚══════════════╩═══════════════════════════════════════════════════════════════╝
Community
  • 1
  • 1
kiedysktos
  • 3,910
  • 7
  • 31
  • 40
27

HashSet is O(1) to access elements, so it certainly does matter. But maintaining order of the objects in the set isn't possible.

TreeSet is useful if maintaining an order(In terms of values and not the insertion order) matters to you. But, as you've noted, you're trading order for slower time to access an element: O(log n) for basic operations.

From the javadocs for TreeSet:

This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

bozo user
  • 241
  • 1
  • 6
  • 15
duffymo
  • 305,152
  • 44
  • 369
  • 561
23

1.HashSet allows null object.

2.TreeSet will not allow null object. If you try to add null value it will throw a NullPointerException.

3.HashSet is much faster than TreeSet.

e.g.

 TreeSet<String> ts = new TreeSet<String>();
 ts.add(null); // throws NullPointerException

 HashSet<String> hs = new HashSet<String>();
 hs.add(null); // runs fine
Valentin
  • 10,769
  • 2
  • 17
  • 27
SuReN
  • 341
  • 2
  • 7
  • 3
    ts.add(null) it will work fine in case of TreeSet if null is added as first Object in TreeSet. And any object added after that will give NullPointerException in the compareTo method of Comparator. – Shoaib Chikate Jan 29 '15 at 12:20
  • 2
    You really really should not be adding `null` to your set either way. – fluffy Mar 26 '16 at 20:03
  • `TreeSet badassTreeSet = new TreeSet(new Comparator() { public int compare(String string1, String string2) { if (string1 == null) { return (string2 == null) ? 0 : -1; } else if (string2 == null) { return 1; } else { return string1.compareTo(string2); } } }); badassTreeSet.add("tree"); badassTreeSet.add("asdf"); badassTreeSet.add(null); badassTreeSet.add(null); badassTreeSet.add("set"); badassTreeSet.add("tree"); System.out.println(badassTreeSet);` – Dávid Horváth Dec 09 '16 at 21:33
  • @ShoaibChikate Your statement is not accurate in my version of Java (Oracle Corporation 11.0.4+10-LTS). The first element inserted is always compared to itself, so a `NullPointerException` is thrown if the first element is `null`. – M. Justin Oct 06 '20 at 18:21
  • This is not strictly true. If the `TreeSet` were created with a comparator that allows nulls, then nulls can be added. Per [`TreeSet.add(E e)`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/TreeSet.html#add(E)): "Throws: NullPointerException - if the specified element is null and this set uses natural ordering, or its comparator does not permit null elements". This successfully adds `null` to a `TreeSet`: `new TreeSet<>(Comparator.nullsLast(Comparator.naturalOrder())).add(null);`. – M. Justin Oct 06 '20 at 18:24
13

The reason why most use HashSet is that the operations are (on average) O(1) instead of O(log n). If the set contains standard items you will not be "messing around with hash functions" as that has been done for you. If the set contains custom classes, you have to implement hashCode to use HashSet (although Effective Java shows how), but if you use a TreeSet you have to make it Comparable or supply a Comparator. This can be a problem if the class does not have a particular order.

I have sometimes used TreeSet (or actually TreeMap) for very small sets/maps (< 10 items) although I have not checked to see if there is any real gain in doing so. For large sets the difference can be considerable.

Now if you need the sorted, then TreeSet is appropriate, although even then if updates are frequent and the need for a sorted result is infrequent, sometimes copying the contents to a list or an array and sorting them can be faster.

Kathy Van Stone
  • 25,531
  • 3
  • 32
  • 40
11

If you aren't inserting enough elements to result in frequent rehashings (or collisions, if your HashSet can't resize), a HashSet certainly gives you the benefit of constant time access. But on sets with lots of growth or shrinkage, you may actually get better performance with Treesets, depending on the implementation.

Amortized time can be close to O(1) with a functional red-black tree, if memory serves me. Okasaki's book would have a better explanation than I can pull off. (Or see his publication list)

JasonTrue
  • 19,244
  • 4
  • 34
  • 61
7

HashSet implementations are, of course, much much faster -- less overhead because there's no ordering. A good analysis of the various Set implementations in Java is provided at http://java.sun.com/docs/books/tutorial/collections/implementations/set.html.

The discussion there also points out an interesting 'middle ground' approach to the Tree vs Hash question. Java provides a LinkedHashSet, which is a HashSet with an "insertion-oriented" linked list running through it, that is, the last element in the linked list is also the most recently inserted into the Hash. This allows you to avoid the unruliness of an unordered hash without incurring the increased cost of a TreeSet.

Joseph Weissman
  • 5,697
  • 5
  • 46
  • 75
4

Why have apples when you can have oranges?

Seriously guys and gals - if your collection is large, read and written to gazillions of times, and you're paying for CPU cycles, then the choice of the collection is relevant ONLY if you NEED it to perform better. However, in most cases, this doesn't really matter - a few milliseconds here and there go unnoticed in human terms. If it really mattered that much, why aren't you writing code in assembler or C? [cue another discussion]. So the point is if you're happy using whatever collection you chose, and it solves your problem [even if it's not specifically the best type of collection for the task] knock yourself out. The software is malleable. Optimise your code where necessary. Uncle Bob says Premature Optimisation is the root of all evil. Uncle Bob says so

Yoon5oo
  • 496
  • 5
  • 11
KRK Owner
  • 762
  • 7
  • 16
  • Even if you use your set through a `Set` reference, you immediately have to select a concrete class to instanciate it, and provide the required methods (equals, compare, hashcode). Ain't no optimization, only trying to make the appropriate choice so you don't have to change it later. – Michel Billaud Mar 25 '21 at 17:56
4

The TreeSet is one of two sorted collections (the other being TreeMap). It uses a Red-Black tree structure (but you knew that), and guarantees that the elements will be in ascending order, according to natural order. Optionally, you can construct a TreeSet with a constructor that lets you give the collection your own rules for what the order should be (rather than relying on the ordering defined by the elements' class) by using a Comparable or Comparator

and A LinkedHashSet is an ordered version of HashSet that maintains a doubly-linked List across all elements. Use this class instead of HashSet when you care about the iteration order. When you iterate through a HashSet the order is unpredictable, while a LinkedHashSet lets you iterate through the elements in the order in which they were inserted

3

Even after 11 years, nobody thought of mentioning a very important difference.

Do you think that if HashSet equals TreeSet then the opposite is true as well? Take a look at this code:

TreeSet<String> treeSet = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
HashSet<String> hashSet = new HashSet<>();
treeSet.add("a");
hashSet.add("A");
System.out.println(hashSet.equals(treeSet));
System.out.println(treeSet.equals(hashSet));

Try to guess the output and then hover below snippet for seeing what the real output is. Ready? Here you go:

false
true

That's right, they don't hold equivalence relation for a comparator that is inconsistent with equals. The reason for this is that a TreeSet uses a comparator to determine the equivalence while HashSet uses equals. Internally they use HashMap and TreeMap so you should expect this behavior with the mentioned Maps as well.

Originally answered

Aniket Sahrawat
  • 12,410
  • 3
  • 41
  • 67
1

Message Edit ( complete rewrite ) When order does not matter, that's when. Both should give Log(n) - it would be of utility to see if either is over five percent faster than the other. HashSet can give O(1) testing in a loop should reveal whether it is.

Nicholas Jordan
  • 638
  • 3
  • 6
1

A lot of answers have been given, based on technical considerations, especially around performance. According to me, choice between TreeSet and HashSet matters.

But I would rather say the choice should be driven by conceptual considerations first.

If, for the objects your need to manipulate, a natural ordering does not make sense, then do not use TreeSet.
It is a sorted set, since it implements SortedSet. So it means you need to override function compareTo, which should be consistent with what returns function equals. For example if you have a set of objects of a class called Student, then I do not think a TreeSet would make sense, since there is no natural ordering between students. You can order them by their average grade, okay, but this is not a "natural ordering". Function compareTo would return 0 not only when two objects represent the same student, but also when two different students have the same grade. For the second case, equals would return false (unless you decide to make the latter return true when two different students have the same grade, which would make equals function have a misleading meaning, not to say a wrong meaning.)
Please note this consistency between equals and compareTo is optional, but strongly recommended. Otherwise the contract of interface Set is broken, making your code misleading to other people, thus also possibly leading to unexpected behavior.

This link might be a good source of information regarding this question.

Marek Stanley
  • 1,175
  • 10
  • 6
-3
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class HashTreeSetCompare {

    //It is generally faster to add elements to the HashSet and then
    //convert the collection to a TreeSet for a duplicate-free sorted
    //Traversal.

    //really? 
    O(Hash + tree set) > O(tree set) ??
    Really???? Why?



    public static void main(String args[]) {

        int size = 80000;
        useHashThenTreeSet(size);
        useTreeSetOnly(size);

    }

    private static void useTreeSetOnly(int size) {

        System.out.println("useTreeSetOnly: ");
        long start = System.currentTimeMillis();
        Set<String> sortedSet = new TreeSet<String>();

        for (int i = 0; i < size; i++) {
            sortedSet.add(i + "");
        }

        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useTreeSetOnly: " + (end - start));
    }

    private static void useHashThenTreeSet(int size) {

        System.out.println("useHashThenTreeSet: ");
        long start = System.currentTimeMillis();
        Set<String> set = new HashSet<String>();

        for (int i = 0; i < size; i++) {
            set.add(i + "");
        }

        Set<String> sortedSet = new TreeSet<String>(set);
        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useHashThenTreeSet: " + (end - start));
    }
}
gli00001
  • 5
  • 1
  • 1
    The post said It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal. Set s = new TreeSet(hashSet); I am wondering why not Set s = new TreeSet() directly if we know it will be used for sorted iteration, so I made this comparison and the result showed which is quicker. – gli00001 Sep 26 '12 at 04:15
  • "In which cases would I want to use a HashSet over a TreeSet?" – Austin Henley Sep 26 '12 at 05:33
  • 1
    my point is, if you need ordering, use TreeSet alone is better than putting everything into HashSet then creating a TreeSet based on that HashSet. I do not see the value of HashSet + TreeSet at all from the original post. – gli00001 Oct 07 '12 at 21:55
  • @gli00001: you missed the point. If you don't *always* need your set of elements to be sorted, but are going to manipulate it rather often, then it's going to be worth it for you to use a hashset to benefit from the faster operations most of the time. For the *occasional* times where you need to process the elements in order, then just wrap with a treeset. It depends on your use case, but that's not that much of an incommon use case (and that probably assumes a set that doesn't contain too many elements and with complex ordering rules). – haylem Nov 15 '12 at 12:51