0

I have large number of strings, I need to print unique strings in sorted order. TreeSet stores them in sorted order but insertion time is O(Logn) for each insertion. HashSet takes O(1) time to add but then I will have to get list of the set and then sort using Collections.sort() which takes O(nLogn) (I assumes there is no memory overhead here since only the references of Strings will be copied in the new collection i.e. List). Is it fair to say overall any choice is same since at the end total time will be same?

noobCoder
  • 292
  • 2
  • 17
  • 2
    Important question: What fraction of the strings are duplicates? – Bohemian Apr 20 '16 at 23:06
  • @Nevado I could ask the very same: why the *upvote*? As per the tooltip on the downvote, downvote goes for the question either being unclear, useless to other people, badly stated (insufficient information etc.) or for the lack of OP's research effort. As a side note: OP asked an optimization-related question but hasn't a) done the profiling & testing himself, b) provided enough context. In 99% of the situations about which newbies ask, the difference between `TreeSet` and `HashSet` is neglibible. Also, the answer to the question is covered in almost *any* book on algorithms. –  Apr 21 '16 at 18:51
  • Possible duplicate of [Hashset vs Treeset](http://stackoverflow.com/questions/1463284/hashset-vs-treeset) –  Apr 21 '16 at 18:53
  • @Nevade also, it's a dupe of http://stackoverflow.com/questions/1463284/hashset-vs-treeset , http://stackoverflow.com/questions/31800701/should-i-use-a-hashset-or-a-treeset-for-a-very-large-dataset , http://stackoverflow.com/questions/20217414/what-is-the-main-difference-between-hashset-treeset-and-linkedhashset-hashmap , http://stackoverflow.com/questions/23168490/hashset-and-treeset-performance-test - and perhaps about 10 other directly related questions. –  Apr 21 '16 at 18:53

4 Answers4

1

That depends on how close you look. Yes, the asymptotic time complexity is O(n log n) in either case, but the constant factors differ. So it's not like one method can get a 100 times faster than the other, but it's certainly possible that one method is twice a fast as the other.

For most parts of a program, a factor of 2 is totally irrelevant, but if your program actually spends a significant part of its running time in this algorithm, it would be a good idea to implement both approaches, and measure their performance.

meriton
  • 68,356
  • 14
  • 108
  • 175
0

Measuring is the way to go, but if you're talking purely theoretically and ignoring read from after sorting, then consider for number of strings = x:

HashSet: x * O(1) add operations + 1 O(n log n) (where n is x) sort operation = approximately O(n + n log n) (ok, that's a gross oversimplification, but..)

TreeSet: x * O(log n) (where n increases from 1 to x) + O(0) sort operation = approximately O(n log (n/2)) (also a gross oversimplification, but..)

And continuing in the oversimplification vein, O(n + n log n) > O(n log (n/2)). Maybe TreeSet is the way to go?

Paul Hicks
  • 13,289
  • 5
  • 51
  • 78
  • 2
    The insertion into the tree set is `x * O(log n)`, not `x * O(n log n)`. It really boils down to implementation details, like how fast the hash function of the hash set is and how fast the sorting algorithm of `Collection.sort()` is (which is also dependent on the distribution of the data). – Philipp Apr 20 '16 at 22:49
  • Ah, that's not what the OP said. That skews my inductions :) – Paul Hicks Apr 20 '16 at 22:53
0

If you distinguish the total number of strings (n) and number of unique strings (m), you get more detailed results for both approaches:

Hash set + sort: O(n) + O(m log m)

TreeSet: O(n log m)

So if n is much bigger than m, using a hash set and sorting the result should be slightly better.

Stefan Haustein
  • 18,427
  • 3
  • 36
  • 51
-1

You should take into account which methods will be executed more frequently and base your decision on that.

Apart from HashSet and TreeSet you can use LinkedHashSet which provides better performance for sorted sets. If you want to learn more about their differences in performance I suggest your read 6 Differences between TreeSet HashSet and LinkedHashSet in Java

Nevado
  • 162
  • 7
  • 1
    LinkedHashSet does not provide better performance for sorted sets. It just keeps track of the insertion order (at a memory cost). – Stefan Haustein Apr 21 '16 at 00:06
  • Yes it does, in term of acces time to its elements than the `TreeSet`. Of course its performance its not better than the `HashSet` but you cannot assure any order in this last one – Nevado Apr 21 '16 at 10:37