
I came across a question like this: Implement a queue with push(x), pop() and a pop_max() method.

The pop_max() function should remove and return the largest element; ties are broken in FIFO order, so the earliest occurrence of the maximum is the one removed.

E.g., before pop_max(): front -> 2, 3, 4, 5, 1, 5

After pop_max(): front -> 2, 3, 4, 1, 5

Below are some of my attempts.

  1. Implement it with a basic queue; find the max element with an O(n) scan using a support queue.

    push()/pop() are O(1); pop_max() is O(n).

  2. Implement it with a doubly linked list and a monotonic stack.

    pop()/pop_max() are O(1); push() is O(n) in the worst case.
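To make attempt 1 concrete, here is a rough C++ sketch (the class name MaxQueue and its internals are just placeholders of mine):

```cpp
#include <cassert>
#include <queue>

// Rough sketch of attempt 1: a plain std::queue plus a support queue.
// push()/pop() are O(1); pop_max() does two O(n) passes.
template <typename T> class MaxQueue {
public:
  void push(T x) { q_.push(x); }

  T pop() {
    T x = q_.front();
    q_.pop();
    return x;
  }

  T pop_max() {
    // First pass: find the maximum while moving everything to the support queue.
    std::queue<T> support;
    T best = q_.front();
    while (!q_.empty()) {
      if (q_.front() > best) best = q_.front();
      support.push(q_.front());
      q_.pop();
    }
    // Second pass: move everything back, skipping the first (leftmost)
    // occurrence of the maximum, which is the one FIFO order removes.
    bool removed = false;
    while (!support.empty()) {
      if (!removed && support.front() == best) {
        removed = true;
      } else {
        q_.push(support.front());
      }
      support.pop();
    }
    return best;
  }

private:
  std::queue<T> q_;
};
```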

Does anybody know what the best way to do this with minimum time complexity would be? I've read Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations, but the method it provides does not seem to suit this scenario.

sugarfree
  • For your (1.), the "basic queue" still needs to be implemented; you can implement it with a singly-linked list, and the pop_max operation can "cheat" in the sense that it doesn't respect the queue constraints. – Stef Oct 19 '21 at 13:34
  • You can implement a queue with a double linked list along with both a max heap that stores nodes in the linked list and a counter that keeps track of the frequency of values in the current queue. push of distinct elements would be logarithmic and constant for repeated ones, while pop would be constant. pop_max would be amortized logarithmic but there could be cases where many pop_maxes (linear to the number of pushes) are necessary to sync up the heap and the counter. – wLui155 Oct 19 '21 at 13:45
  • @wLui155 I get what you mean, but how could pop be constant, given that you would need to update the max heap as well? – sugarfree Oct 19 '21 at 14:10
  • @sugarfree it can't. You can use the abstract data type you're trying to implement to sort by pushing the whole list and then repeatedly pop-maxing, so O(1) with generic comparable elements is impossible. – David Eisenstat Oct 19 '21 at 14:22
  • pop would remove the earliest element in the linked list and decrement the removed value in the counter. Because it doesn't make any modifications to the heap and is composed of two constant-time operations, it's also constant. At the same time, the tradeoff is that catching up the heap to the current state of the queue can be a bit slow (when pop_max is called after many normal push/pop operations), as a result. – wLui155 Oct 19 '21 at 14:24

2 Answers


By request, here’s a somewhat detailed answer on why I believe that there’s a solution with worst-case O(1) pushes and pops and O(log n) pop-maxes. It’s freaking complicated, and you don’t need to understand it for interviews. Really. I’m writing this answer mostly to entertain the [algorithm] tag regulars.

Variables

n is the number of elements currently in the structure, and p is the number of pushes in the lifetime of the structure. Clearly n ≤ p, and in general, log p is not O(log n).

Tournament trees

The main building block is the tournament tree. A tournament tree is a full binary tree (every node has zero or two children) with labeled nodes such that each node with two children labeled x and y is labeled max(x, y). Semantically, the contents of this data structure are the labels of nodes with zero children (leaves). If you’re confused, look at a complete bracket for a single-elimination tournament.

The useful thing about tournament trees is that we can order the leaves any which way we want. For this problem, we want queue order. The root element gives the overall max label. To find the leftmost leaf with that label, repeatedly descend to the left child if it's labeled the same as the current node, else to the right child. To soft-delete a leaf, set its value to −∞ and update its ancestors from parent to root.
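To make the mechanics concrete, here is a minimal array-based sketch (my own illustrative code; it assumes int labels and a power-of-two leaf count, neither of which the real construction requires):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Soft-deleted slots hold "minus infinity" so they can never win a comparison.
const int kNegInf = std::numeric_limits<int>::min();

// Array-based tournament tree over a fixed, power-of-two number of leaves.
// Internal node i has children 2i+1 and 2i+2; the n leaves occupy the last
// n slots of the array.
class TournamentTree {
public:
  explicit TournamentTree(const std::vector<int>& leaves) : n_(leaves.size()) {
    tree_.assign(2 * n_ - 1, kNegInf);
    for (std::size_t i = 0; i < n_; ++i) tree_[n_ - 1 + i] = leaves[i];
    for (std::size_t i = n_ - 1; i-- > 0;)
      tree_[i] = std::max(tree_[2 * i + 1], tree_[2 * i + 2]);
  }

  int Max() const { return tree_[0]; }

  // Find the leftmost leaf holding the overall max, soft-delete it, and
  // repair its ancestors from parent to root.
  int PopMax() {
    int result = tree_[0];
    std::size_t i = 0;
    while (i < n_ - 1)  // descend while i is an internal node
      i = (tree_[2 * i + 1] == tree_[i]) ? 2 * i + 1 : 2 * i + 2;
    tree_[i] = kNegInf;  // soft delete
    while (i > 0) {
      i = (i - 1) / 2;
      tree_[i] = std::max(tree_[2 * i + 1], tree_[2 * i + 2]);
    }
    return result;
  }

private:
  std::size_t n_;
  std::vector<int> tree_;
};
```

With the question's example padded to eight leaves, PopMax() removes the leftmost 5 first, then the second 5, after which the max is 4.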

Amortized O(1) pushes and worst-case O(log p) pop-maxes

There are better ways to accomplish this in practice, but our goal here is to demonstrate ideas.

We keep a linked list of O(log p) tournament trees. Concatenated, their leaves represent the queue. Each tree is a complete binary tree with 2k leaves (soft-deleted elements are included in the count) for some integer k ≥ 0.

The push operation resembles adding one to a number in binary representation. We put the new element in a tournament tree by itself and append that tree to the list. While the last two trees in the list have the same size, combine them into a single tree by making the second to last the left child and the last the right child of a new tree.
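The size bookkeeping alone can be sketched as follows (illustrative code of mine; only the sizes are tracked here, but the merge loop is exactly where the real trees would be combined):

```cpp
#include <cassert>
#include <vector>

// Sketch of the binary-counter analogy: each push appends a size-1 tree,
// then equal-sized neighbors at the back of the list are merged, exactly
// like propagating carries when adding one to a binary number.
std::vector<int> tree_sizes;  // sizes of the tournament trees, front to back

void PushOne() {
  tree_sizes.push_back(1);
  while (tree_sizes.size() >= 2 &&
         tree_sizes[tree_sizes.size() - 1] == tree_sizes[tree_sizes.size() - 2]) {
    tree_sizes.pop_back();
    tree_sizes.back() *= 2;  // merge: two trees become one of twice the size
  }
}
```

After six pushes the list of sizes is {4, 2}, matching the binary representation of 6; a seventh push appends a 1.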

The pop-max operation scans the tree roots to find the overall max, then soft-deletes the leftmost occurrence.

Worst-case O(1) pushes

We can be lazier about merging trees. Instead of finishing the merge loop immediately, we keep a queue of continuations. Each continuation can be represented as a mutable pointer to a tree in the list. To step it, we compare the size of the tree to the size of its left neighbor; if they’re the same, then merge the trees and update the pointer to the merged tree. Otherwise, the continuation is done.

The push operation appends a singleton tree, appends a continuation pointing to that tree to the back of the queue, and then continues the work at the front for a couple of steps. At any given time, there will be O(log p) merges to be continued, so pop-max still runs fast enough. (This follows from the amortized analysis.)

Regular pops

We can implement the pop operation in worst-case time O(log p) by adding a doubly linked list structure to the tournament tree leaves not yet deleted. The tournaments use soft deletion; this list uses hard deletion.

Obviously we want pops to run in constant time. We can get constant amortized time by splitting the leftmost tournament tree until it has one element before soft-deleting (with some sort of barrier to ensure that the merging continuations from before leave this prefix alone).

Worst-case constant time should be possible with more scheduling like we did for push.

Worst-case O(log n) pop-maxes

Never mind hand-waving, at this point it’s basically my whole arms. Our strategy is to limit the effective value of p to O(n) by periodically rebuilding the whole structure in the background. This means issuing pop operations to the rebuild and remembering how far we are in the rebuild so that we can issue pop-maxes if needed. Assuming that we do multiple pushes on the rebuild with every operation, we’ll finish before pops and pop-maxes can decrease the element count by more than a constant fraction.

Open questions

I’m sure that there’s a cleaner way to accomplish all this. What is it?

David Eisenstat
  • Thank you very much for the full write-up; the only part that's unclear is whether regular pops can be done in even amortized constant time. I assumed that the trees decrease in size from left to right. Wouldn't repeatedly splitting the leftmost tournament tree mess up that ordering? Also, this splitting can create up to (log p) new trees, if done on the largest tree. If the amortized cost's potential is based on number of trees, doesn't that cause a problem? – kcsquared Oct 20 '21 at 02:25
  • @kcsquared We have to amortize on pushes, not trees. The tree sizes become bitonic (increasing then decreasing). Really we should keep two lists so that the continuations know where to stop. – David Eisenstat Oct 20 '21 at 02:28
  • @kcsquared If you want to drink from the firehose, Kaplan--Tarjan ("Purely Functional, Real-Time Deques with Catenation") is what these arguments look like in published detail. – David Eisenstat Oct 20 '21 at 02:29
  • @kcsquared https://en.wikipedia.org/wiki/Skew_binary_number_system is also relevant. – David Eisenstat Oct 20 '21 at 02:30
  • This is a lot more fun than I expected from this question. You don't need tournament trees if you're going to do the incremental rebuild. Any of the O(1) functional queues based on recursive slowdown (IIRC that's the right term) will let you do the pop_max with soft delete in O(log n) time if you annotate each node with its max child. – Matt Timmermans Oct 20 '21 at 03:02
  • @MattTimmermans you got me thinking. I wonder if we can implement a restricted data structure where all of the pushes precede all of the pops (but not pop-maxes, for which we have time to soft delete), then plug it into the standard real-time functional queue. It's past my bedtime though. – David Eisenstat Oct 20 '21 at 03:20
  • This is excellent-- I'll have to think more about the tree-splitting behavior on the left end after pop(), since that seems crucial. The most relevant paper I've found was this on [amortized cost of find-min](https://epubs.siam.org/doi/abs/10.1137/1.9781611973730.51), which shows that if insert and *arbitrary* deletion can be done in constant time then even find-min (or find-max) can't be done in less than linear time. Tarjan's paper you linked mentions extending their ideas for find-min only. I believe that this post is plausibly the first publication anywhere of this data structure. – kcsquared Oct 20 '21 at 03:37
  • There's a challenge with the soft-deleted elements left over from the pop-max() operations-- what if you have a large amount (say, log log n) of consecutive soft-deleted elements when the pop()'s reach them? It might be enough to do something like 'detect any completely empty trees and delete them', but it feels like there's a clever idea still needed to prevent the soft-deletions from messing up the pop() complexity or the rebuilding process. – kcsquared Oct 20 '21 at 05:34
  • @kcsquared To tie up one loose end: O(log n) push, O(1) pop and pop-max can be handled by cross-referencing a queue and a [Levcopoulos and Overmars tree](https://link.springer.com/article/10.1007/BF00299635). – David Eisenstat Oct 20 '21 at 12:58

First let's argue for a target running time. We can use this abstract data type to sort an n-element list with n pushes followed by n pop-maxes. Assuming generic comparable elements, since the fastest possible comparison sort is Θ(n log n), the worst-case push/pop-max pair must be Ω(log n).

One way to get an O(log n) worst case for all three operations is implemented in C++ below. With amortized accounting, we can make pushes O(log n) and pops and pop-maxes free.

This does leave the question of whether we can get worst-case O(1) pushes, O(1) pops, and O(log n) pop-maxes. I'm confident that the answer is yes, but the solution that I have in mind is rather complicated, involving scheduled maintenance of O(log n) tournament trees on segments of the queue whose sizes decrease geometrically.

#include <list>
#include <map>

template <typename T> class QueueWithPopMax {
public:
  void Push(T element) {
    typename std::list<ListElement>::iterator back =
        list_.insert(list_.end(), ListElement{});
    back->iterator = multimap_.insert({element, back});
  }

  T Pop() {
    T element = list_.front().iterator->first;
    multimap_.erase(list_.front().iterator);
    list_.pop_front();
    return element;
  }

  T PopMax() {
    T element = multimap_.begin()->first;
    list_.erase(multimap_.begin()->second);
    multimap_.erase(multimap_.begin());
    return element;
  }

private:
  struct ListElement {
    typename std::multimap<T, typename std::list<ListElement>::iterator,
                           std::greater<T>>::iterator iterator;
  };
  std::multimap<T, typename std::list<ListElement>::iterator, std::greater<T>>
      multimap_;
  std::list<ListElement> list_;
};

#include <iostream>

int main() {
  QueueWithPopMax<int> queue;
  queue.Push(2);
  queue.Push(3);
  queue.Push(4);
  queue.Push(5);
  queue.Push(1);
  queue.Push(5);
  std::cout << queue.PopMax() << "\n";
  std::cout << queue.Pop() << "\n";
  std::cout << queue.Pop() << "\n";
  std::cout << queue.Pop() << "\n";
  std::cout << queue.Pop() << "\n";
  std::cout << queue.Pop() << "\n";
}
David Eisenstat
  • Are you confident there's a solution for 'worst-case O(1) pushes, O(1) pops, and O(log n) pop-maxes'? It seems like a very, very difficult problem, and interesting enough to deserve its own post (although there appear to be several old, unanswered questions here about exactly that). But perhaps cs.stackexchange would be a better site for the discussion – kcsquared Oct 19 '21 at 23:41
  • @kcsquared Yes, I posted another answer with my reasoning. – David Eisenstat Oct 20 '21 at 02:01