101

Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and allow you to access them in an ordered fashion, and have same insertion complexity O(log n), what are the advantages of using one over the other (or, what kind of situations call for the one or the other?)?

While I know that the underlying structures are different, I am not as much interested in the difference in their implementation as I am in the comparison their performance and suitability for various uses.

Note: I know about the no-duplicates in a set. That's why I also mentioned std::multiset since it has the exactly same behavior as the std::set but can be used where the data stored is allowed to compare as equal elements. So please, don't comment on single/multiple keys issue.

penelope
  • 8,251
  • 8
  • 45
  • 87
  • 21
    The priority queue only offers access to the *largest* element, while the set gives you a complete ordering of *all* elements. This weaker interface means that implementations may be more efficient (e.g. you can store the actual queue data in a `vector`, which may have better performance on account of its memory locality). – Kerrek SB Apr 13 '12 at 13:39
  • 1
    @KerrekSB The most elaborated answer is actually a comment :D Nobody has commented on performance at all. Could you put that in to an answer, maybe expand a little? – penelope Apr 13 '12 at 13:41
  • The key standard library point is that `priority_queue` is implemented in terms of the `heap*`-algorithms from ``, applied to an underlying random-access container. – Kerrek SB Apr 13 '12 at 13:58
  • @KerrekSB would be great if you put all of that in an answer – penelope Apr 13 '12 at 14:24

4 Answers4

81

A priority queue only gives you access to one element in sorted order -- i.e., you can get the highest priority item, and when you remove that, you can get the next highest priority, and so on. A priority queue also allows duplicate elements, so it's more like a multiset than a set. [Edit: As @Tadeusz Kopec pointed out, building a heap is also linear on the number of items in the heap, where building a set is O(N log N) unless it's being built from a sequence that's already ordered (in which case it is also linear).]

A set allows you full access in sorted order, so you can, for example, find two elements somewhere in the middle of the set, then traverse in order from one to the other.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • 13
    Another difference is that building priority queue from given set of values has only linear complexity. – Tadeusz Kopec for Ukraine Apr 13 '12 at 13:43
  • Performance wise, I found multiset performing better than a priority queue when simulating the behaviour of a use case we have. In our real world application either will perform fine, but the richer features of a set are important so overall that makes it a winner. YMMV, but I suspect a multiset is the better choice in most cases. – Nick Jun 20 '15 at 21:40
  • @TadeuszKopec Using `emplace_hint` and `insert` with hint iterator one can achieve linear complexity for sorted input too. – Tomilov Anatoliy May 16 '16 at 19:01
75

std::priority_queue allows to do the following:

  1. Insert an element O(log n)
  2. Get the smallest element O(1)
  3. Erase the smallest element O(log n)

while std::set has more possibilities:

  1. Insert any element O(log n) and the constant is greater than in std::priority_queue
  2. Find any element O(log n)
  3. Find an element, >= than the one your are looking for O(log n) (lower_bound)
  4. Erase any element O(log n)
  5. Erase any element by its iterator O(1)
  6. Move to previous/next element in sorted order O(1)
  7. Get the smallest element O(1)
  8. Get the largest element O(1)
mrazimi
  • 337
  • 1
  • 3
  • 12
Ixanezis
  • 1,631
  • 13
  • 20
  • 1
    Or maybe moving to the previous/next element in sorted order works in `O(log n)` - don't know myself :( – Ixanezis Apr 13 '12 at 14:08
  • For set, get the smallest and largest element, should it be O(1) or O(log n). This answer contradicts with Andrew Tomazos's answer. Which is correct? – Robert Wang Sep 05 '16 at 20:29
  • 4
    For set, picking the smallest element is effectively an `*s.begin()`, and the largest element is an `*s.rbegin()`, so since both functions have constant complexity, I believe `O(1)` is correct. http://en.cppreference.com/w/cpp/container/set/begin – Ixanezis Sep 09 '16 at 11:19
  • 2
    If set is constructed as a BST, how could we find the begin() in O(1) time? similar question goes to rbegin(). Unless we have two extra O(1) space to track the max and min values (I am not sure if this is the case in STL). – Robert Wang Sep 10 '16 at 00:38
  • 7
    @RobertWang [This is the case in STL](https://github.com/gcc-mirror/gcc/blob/8e8f6434760cfe2a1c6c9644181189fdb4d987bb/libstdc%2B%2B-v3/include/bits/stl_tree.h#L759). So `O(1)` is right answer. – Tomilov Anatoliy Feb 21 '18 at 19:18
  • 1
    Note that the priority queue returns the `top()` element, which, by default, happens to be the largest (it uses `std::less<>`, so smallest to largest and the last one is considered to be the `top()`). You can flip that using the `std::greater<>` or implement your `operator<` to work in the order you'd like. – Alexis Wilke Sep 27 '22 at 01:23
40

set/multiset are generally backed by a binary tree. http://en.wikipedia.org/wiki/Binary_tree

priority_queue is generally backed by a heap. http://en.wikipedia.org/wiki/Heap_(data_structure)

So the question is really when should you use a binary tree instead of a heap?

Both structures are laid out in a tree, however the rules about the relationship between anscestors are different.

We will call the positions P for parent, L for left child, and R for right child.

In a binary tree L < P < R.

In a heap P < L and P < R

So binary trees sort "sideways" and heaps sort "upwards".

So if we look at this as a triangle than in the binary tree L,P,R are completely sorted, whereas in the heap the relationship between L and R is unknown (only their relationship to P).

This has the following effects:

  • If you have an unsorted array and want to turn it into a binary tree it takes O(nlogn) time. If you want to turn it into a heap it only takes O(n) time, (as it just compares to find the extreme element)

  • Heaps are more efficient if you only need the extreme element (lowest or highest by some comparison function). Heaps only do the comparisons (lazily) necessary to determine the extreme element.

  • Binary trees perform the comparisons necessary to order the entire collection, and keep the entire collection sorted all-the-time.

  • Heaps have constant-time lookup (peek) of lowest element, binary trees have logarithmic time lookup of lowest element.

Andrew Tomazos
  • 66,139
  • 40
  • 186
  • 319
  • This is not elaborated at all. What I asked for were different situations in which you would prefer to use one over the other. – penelope Apr 13 '12 at 13:37
  • 1
    Just posting a half written answer to be the first one to answer a Q is not really nice in my opinion. – penelope Apr 13 '12 at 13:45
  • @penelope: I thought an immediate short answer would be more useful to you in the interim than waiting for a long answer. – Andrew Tomazos Apr 13 '12 at 13:46
  • Can you elaborate on turning an unsorted array into a heap in `O(n)` when using `priority_queue`. According to this: http://www.cplusplus.com/reference/algorithm/push_heap/ `push_heap` is `O(log(n))`, so pushing elements to a priority queue one-by-one would seem to take `O(n log(n))`. Perhaps if you build the heap all at once it's faster. Is it possible to construct a `priority_queue` from an unsorted set in `O(n)` time? – cheshirekow Jun 23 '13 at 15:10
  • Scratch my last comment: the constructor taking iterators is evidently `O(n)` – cheshirekow Jun 23 '13 at 15:12
  • @cheshirekow: The function you are interested in is called `std::make_heap` and runs in linear time. – Andrew Tomazos Jun 23 '13 at 15:31
  • 5
    Peek at lowest element is O(1) for multiset (i.e. *begin() ) – TemplateRex Oct 17 '16 at 11:14
16

Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and allow you to access them in an ordered fashion, and have same insertion complexity O(log n), what are the advantages of using one over the other (or, what kind of situations call for the one or the other?)?

Even though insert and erase operations for both containers have the same complexity O(log n), these operations for std::set are slower than for std::priority_queue. That's because std::set makes many memory allocations. Every element of std::set is stored at its own allocation. std::priority_queue (with underlying std::vector container by default) uses single allocation to store all elements. On other hand std::priority_queue uses many swap operations on its elements whereas std::set uses just pointers swapping. So if swapping is very slow operation for element type, using std::set may be more efficient. Moreover element may be non-swappable at all.

Memory overhead for std::set is much bigger also because it has to store many pointers between its nodes.

Dev Null
  • 4,731
  • 1
  • 30
  • 46
anton_rh
  • 8,226
  • 7
  • 45
  • 73