31

When should I choose one over the other? Are there any pointers that you would recommend for using the right STL containers?

gsamaras
  • 71,951
  • 46
  • 188
  • 305
kal
  • 28,545
  • 49
  • 129
  • 149
  • 1
    There is no `hashset` in STL. There will be `unordered_set` in C++ (hopefully) next year. – avakar Mar 25 '10 at 18:28
  • Talking about this http://www.sgi.com/tech/stl/hash_set.html – kal Mar 25 '10 at 18:31
  • @mkal: That's not part of the standard STL. `hash_set` is an extension. – kennytm Mar 25 '10 at 18:38
  • 1
    There is a discussion related to this here: [http://stackoverflow.com/questions/222658/multiset-map-and-hash-map-complexity](http://stackoverflow.com/questions/222658/multiset-map-and-hash-map-complexity) Edit: Adam Rosenfield's answer on that question lays out the big-O for each. – itsmatt Mar 25 '10 at 18:33
  • What is the "standard STL"? – Lightness Races in Orbit Apr 23 '16 at 17:22

5 Answers5

50

hash_set is an extension that is not part of the C++ standard. Lookups should be O(1) rather than O(log n) for set, so it will be faster in most circumstances.

Another difference will be seen when you iterate through the containers. set will deliver the contents in sorted order, while hash_set will be essentially random (Thanks Lou Franco).

Edit: The C++11 update to the C++ standard introduced unordered_set which should be preferred instead of hash_set. The performance will be similar and is guaranteed by the standard. The "unordered" in the name stresses that iterating it will produce results in no particular order.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
23

stl::set is implemented as a binary search tree. hashset is implemented as a hash table.

The main issue here is that many people use stl::set thinking it is a hash table with look-up of O(1), which it isn't, and doesn't have. It really has O(log(n)) for look-ups. Other than that, read about binary trees vs hash tables to get a better idea of the data structures.

Gabriel Staples
  • 36,492
  • 15
  • 194
  • 265
Alex
  • 6,843
  • 10
  • 52
  • 71
  • Why did this get downvoted? -- a red-black tree is a kind of binary search tree, and the spec doesn't say it has to be red-black -- it just set big-O parameters for the operations. – Lou Franco Mar 25 '10 at 18:54
  • @LouFranco: Probably because it discusses [possible (likely!)] implementations rather than standard-mandated behaviour/semantics. – Lightness Races in Orbit Apr 23 '16 at 17:24
4

Another thing to keep in mind is that with hash_set you have to provide the hash function, whereas a set only requires a comparison function ('<') which is easier to define (and predefined for native types).

ronys
  • 508
  • 1
  • 6
  • 9
3

I don't think anyone has answered the other part of the question yet.

The reason to use hash_set or unordered_set is the usually O(1) lookup time. I say usually because every so often, depending on implementation, a hash may have to be copied to a larger hash array, or a hash bucket may end up containing thousands of entries.

The reason to use a set is if you often need the largest or smallest member of a set. A hash has no order so there is no quick way to find the smallest item. A tree has order, so largest or smallest is very quick. O(log n) for a simple tree, O(1) if it holds pointers to the ends.

Zan Lynx
  • 53,022
  • 10
  • 79
  • 131
  • an unordered_set lookup or insertion will be from O(1) to O(n). There are no guarantees it will be closer to O(1). A (ordered) set lookup will be up to O(log n). It does not mean it will be closer to O(n) most of the time (like you hinted with your comment about the first and last possibly taking O(1)). Your comment about rehashing could use some clarification. Implementations are not forced to rehash when a lookup is done. A lookup would take O(n) when a node contains all entries and thus a linear lookup needs to be performed. – edwinc Sep 26 '18 at 18:20
2

A hash_set would be implemented by a hash table, which has mostly O(1) operations, whereas a set is implemented by a tree of some sort (AVL, red black, etc.) which have O(log n) operations, but are in sorted order.

Edit: I had written that trees are O(n). That's completely wrong.

Alex Gaynor
  • 14,353
  • 9
  • 63
  • 113