2

I was reading the documentation for priority_queue (basically a wrapper over make_heap), and it found out that you can customize it with a compare function.

From the documentation(http://en.cppreference.com/w/cpp/container/priority_queue):

A user-provided Compare can be supplied to change the ordering, e.g. using std::greater would cause the smallest element to appear as the top().

In Wikipedia and other CS texts a heap is defined as such:

In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: If A is a parent node of B then the key (the value) of node A is ordered with respect to the key of node B with the same ordering applying across the heap. A heap can be classified further as either a "max heap" or a "min heap". In a max heap, the keys of parent nodes are always greater than or equal to those of the children and the highest key is in the root node.

But in the std::priority_queue implementation, supplying std::greater would cause a MinHeap to be created (smallest element at the top). I would have expected a max heap since a heap ordering is defined (in all the literature that I've read) if (parent comparator children) is true.

I find this kind of API design confusing.

Was there a reason to define it this way?

Alex
  • 600
  • 1
  • 5
  • 16
  • Duplicate of http://stackoverflow.com/questions/32748069/the-reason-of-using-stdgreater-for-creating-min-heap-via-priority-queue ? – Arunmu Jun 14 '16 at 20:56

2 Answers2

3

This is something to get your head around - supplying std::greater will actually produce min heap.

The reason for this is following: from the heap perspective, comparator defines lesser than relation on the heap, even if it is actualy doing something else. When std::priority_queue needs to insert elements, it calls the comparator and gives it two elements. If comparator returns true, it considers the first element to be smaller than the second, and put's the second element first (since std::priority_queue is a max heap implementation). As a result, you end up with min heap.

SergeyA
  • 61,605
  • 5
  • 78
  • 137
  • I understand how it's implemented, but why was it defined this way? Why doesn't it just use the comparator to verify the parent/children relationship like it is defined almost everywhere? – Alex Jun 14 '16 at 18:12
  • Why does a max_heap need std::less<> instead of std::greater<>? It just needed to switch the order of the 2 elements in the if, and all would work - generate a maxHeap with std::greater. My first guess was that it's some kind of an optimization, but I'm not sure that is the actual reasoning behind it. For example the std::sort APIs work the other (intuitive) way around, if the comparator is true, things remain in place... – Alex Jun 14 '16 at 18:25
  • 2
    @AlexandruEne std::greater is the reverse-order less. I assume that C++ implemented a max-heap by default because it makes heap-sort slightly easier (move the greatest element to the end of the array, which is the newly opened slot). – David Eisenstat Jun 14 '16 at 18:28
  • @AlexandruEne: "*I understand how it's implemented*" And yet, your description of the behavior was wrong, as Sergey pointed out. – Nicol Bolas Jun 14 '16 at 18:48
  • In the question I described my expectations based on literature, not the implementation, and I was trying to understand "why this implementation was chosen" since it leads in my opinion to a confusing interface. What was the reason behind it? Why doesn't the comparator method define the parent-children relationship? I can see what the code is doing, but I didn't understand why this route and design was chosen. – Alex Jun 15 '16 at 07:42
  • @AlexandruEne, comparator is to return true when first element is less than another because this is how every other comparator works in the library. Why `priority_queue` is a max-heap implementation is a different question, and there is a guess about the reasoning in another answer. – SergeyA Jun 15 '16 at 12:58
3

I find it confusing too, but from a somewhat different perspective. Suppose that you want to sort a sequence in ascending order. There are several ways of doing it:

  1. Put your data in a std::vector or std::deque and run std::sort() using std::less.
  2. Put your data in a std::list and run std::list::sort() using std::less.
  3. Insert your data in a std::set that is configured with std::less and in the end it is automatically sorted.
  4. Put your data in a std::vector or std::deque and run std::make_heap() followed by std::pop_heap()-s using std::less.
  5. Push your data through a std::priority_queue using std::greater (!!!).

As we can see, std::priority_queue is a definite outlier from such a point of view.

Actually, the reason behind the confusing behavior of std::priority_queue in this regard is hidden in item (4) since that is what a std::priority_queue does underneath. (4) is also against my intuition (though to a lesser extent), since in the intermediate state (while not all std::pop_heap have been performed) the sorted part of the sequence is in its upper range, rather than the lower range.

But it also explains why a max heap was selected for the standard library - std::pop_heap puts the popped element in the location, from where it can be removed in constant time, regardless of the container type used.

Leon
  • 31,443
  • 4
  • 72
  • 97