
I ran into the following very difficult interview question:

Consider a queue data structure with three operations:

- Add to the front of the list (note: the front of the list)

- Delete from the tail of the list (the end of the list)

- Extract-Min (find and remove the minimum)

The best implementation of this data structure has amortized time:

A) all three operations in O(1)

B) all three operations in O(log n)

C) add and delete in O(1) and Extract-Min in O(log n)

D) add and delete in O(log n) and Extract-Min in O(n)

After the interview I saw that (C) is the correct answer. Why is this the case?

The first challenge is comparing the options: which option dominates the others, and how can we work out which option is actually achievable and therefore correct?

  • @TomerShetah how can you be sure there exists a data structure achieving option (C), by simple logic? –  Jan 20 '21 at 17:59
  • Well, you would have a pointer to the head and another one to the tail of the list. When you want to add or delete you would have complexity O(1), because you already have the pointer to the head/tail, so you would not have to do any operations (at least none that would have complexity higher than O(1)). When extracting you would make a copy of the list and order it using a quick sort, which has the complexity O(log n), thus giving you the correct answer, letter C) – Pedro Zanutto Jan 25 '21 at 11:25
  • Since this is about amortized time then I *think* that a data structure that consists of a tree for ordered elements (needs some pointers to keep the initial order) and an array for newly added/removed elements, and the elements from the array are moved to the tree on each call to extract-min (in other words update the tree), should satisfy the answer C. – Sopel Jan 31 '21 at 10:28

5 Answers


Of the given running times, A is faster than C is faster than B is faster than D.

A is impossible in a comparison-based data structure (the unstated norm here) because it would violate the known Ω(n log n)-time lower bound for comparison sorts by allowing a linear-time sorting algorithm that inserts n elements and then extracts the min n times.

C can be accomplished using an augmented finger tree. Finger trees support queue-like push and pop in amortized constant time, and it's possible to augment each node with the minimum in its sub-tree. To extract the min, we use the augmentations to find the minimum value in the tree, which will be at depth O(log n). Then we extract this minimum by issuing two splits and an append, all of which run in amortized time O(log n).

Another possibility is to represent the sequence as a splay tree whose nodes are augmented by subtree min. Push and pop are O(1) amortized by the dynamic finger theorem.

Fibonacci heaps do not accomplish the same time bound without further examination because deletes cost Θ(log n) amortized regardless of whether the deleted element is the min.

Since C is feasible, there is no need to consider B or D.


Given the limitations on the data structure, we actually don't need the full power of finger trees. The C++ below works by maintaining a list of winner trees, each of size a power of two (ignoring deletion, which we can implement as soft delete without blowing up the amortized running time). The tree sizes increase and then decrease along the list, and there are O(log n) of them. This gives the flavor of finger trees with far less of an implementation headache.

To push on the left, we make a size-1 tree and then merge it until the invariant is restored. The time required is O(1) amortized by the same logic as increasing a binary number by one.

To pop on the right, we split the rightmost winner tree until we find a single element. This may take a while, but we can charge it all to the corresponding push operations.

To extract the max (changed from min for convenience because nullopt is minus infinity, not plus infinity), find the winner tree containing the max (O(log n) since there are O(log n) trees) and then soft delete the max from that winner tree (O(log n) because that's the height of that tree).

#include <stdio.h>
#include <stdlib.h>

#include <algorithm>  // std::max
#include <list>
#include <optional>

// A winner-tree node: either a Leaf holding one (possibly soft-deleted)
// value or a Branch combining two trees of equal rank.
class Node {
public:
  using List = std::list<Node *>;
  virtual ~Node() = default;
  virtual int Rank() const = 0;
  virtual std::optional<int> Max() const = 0;
  virtual void RemoveMax() = 0;
  virtual std::optional<int> PopRight(List &nodes, List::iterator position) = 0;
};

class Leaf : public Node {
public:
  explicit Leaf(int value) : value_(value) {}
  int Rank() const override { return 0; }
  std::optional<int> Max() const override { return value_; }
  void RemoveMax() override { value_ = std::nullopt; }
  std::optional<int> PopRight(List &nodes, List::iterator position) override {
    nodes.erase(position);
    return value_;
  }

private:
  std::optional<int> value_;
};

class Branch : public Node {
public:
  Branch(Node *left, Node *right)
      : left_(left), right_(right),
        rank_(std::max(left->Rank(), right->Rank()) + 1) {
    UpdateMax();
  }

  int Rank() const override { return rank_; }

  std::optional<int> Max() const override { return max_; }

  void RemoveMax() override {
    if (left_->Max() == max_) {
      left_->RemoveMax();
    } else {
      right_->RemoveMax();
    }
    UpdateMax();
  }

  std::optional<int> PopRight(List &nodes, List::iterator position) override {
    nodes.insert(position, left_);
    auto right_position = nodes.insert(position, right_);
    nodes.erase(position);
    return right_->PopRight(nodes, right_position);
  }

private:
  void UpdateMax() { max_ = std::max(left_->Max(), right_->Max()); }

  Node *left_;
  Node *right_;
  int rank_;
  std::optional<int> max_;
};

// The queue itself: a list of O(log n) winner trees whose sizes mimic the
// bits of a binary counter. (Nodes are leaked, for brevity.)
class Queue {
public:
  void PushLeft(int value) {
    Node *first = new Leaf(value);
    while (!nodes_.empty() && first->Rank() == nodes_.front()->Rank()) {
      first = new Branch(first, nodes_.front());
      nodes_.pop_front();
    }
    nodes_.insert(nodes_.begin(), first);
  }

  std::optional<int> PopRight() {
    while (!nodes_.empty()) {
      auto last = --nodes_.end();
      if (auto value = (*last)->PopRight(nodes_, last)) {
        return value;
      }
    }
    return std::nullopt;
  }

  std::optional<int> ExtractMax() {
    std::optional<int> max = std::nullopt;
    for (Node *node : nodes_) {
      max = std::max(max, node->Max());
    }
    for (Node *node : nodes_) {
      if (node->Max() == max) {
        node->RemoveMax();
        break;
      }
    }
    return max;
  }

private:
  std::list<Node *> nodes_;
};

int main() {
  Queue queue;
  int choice;
  while (scanf("%d", &choice) == 1) {
    switch (choice) {
    case 1: {
      int value;
      if (scanf("%d", &value) != 1) {
        return EXIT_FAILURE;
      }
      queue.PushLeft(value);
      break;
    }
    case 2: {
      if (auto value = queue.PopRight()) {
        printf("%d\n", *value);
      } else {
        puts("null");
      }
      break;
    }
    case 3: {
      if (auto value = queue.ExtractMax()) {
        printf("%d\n", *value);
      } else {
        puts("null");
      }
      break;
    }
    }
  }
}
David Eisenstat
  • One question arises for me: if you ran into this question and didn't know about finger trees, how would you choose among B to D? –  Jan 08 '21 at 14:01
  • 2
    @EmmaNic. B beats D and is achievable at least two ways using data structures and techniques that appear in CLRS. You could keep a doubly linked list and a heap with pointers between corresponding nodes in each and then cascade the deletion (list determines which element is the tail, cascading delete from heap; heap determines which element is the min, cascading delete from list). You could also augment a regular balanced binary tree (*not* a search tree). Honestly I think this question is way too hard for an interview, and if it were forced on me, I wouldn't downgrade anyone for picking B. – David Eisenstat Jan 08 '21 at 14:12
  • Thanks. Also, the question asked to choose the best option. Could you point me to the page or topic in CLRS for the method you mentioned in your comment? –  Jan 08 '21 at 14:25
  • @EmmaNic. Looks like CLRS Chapter 14 for augmented search trees. Technically to make the solution that I suggested into a search tree you'd have to store the index and use it as a key, but this isn't actually required under the hood. I don't have a copy of CLRS to hand so I'm not sure where it talks about multiple data structures in tandem. – David Eisenstat Jan 08 '21 at 14:32
  • @DavidEisenstat How do you cascade the delete from the list to the heap in the alternative solution? – Sebastian Redl Jan 08 '21 at 21:24
  • @SebastianRedl You have to keep track of which element of the list is where in the heap array, which makes the heap a bit more complicated. – David Eisenstat Jan 08 '21 at 23:04
  • Nice details @SebastianRedl, but I'm still not sure which data structure can help us reach this amortized bound. –  Jan 09 '21 at 00:15
  • I have one question. Shouldn't amortized cost be an upper bound on the real cost? A FIFO queue has O(1) insert and remove as its actual cost; can the amortized time here be the same as the actual cost? –  Jan 18 '21 at 20:02
  • Combine a queue implemented as a doubly-linked list with a balanced heap structure for priorities. Adding or deleting an element in the queue is constant time, with amortized migration in the heap that can be charged to Extract-Min, which is done in O(log n) time (plus constant time to delete from the doubly-linked queue list). Do you agree, @DavidEisenstat? – user3661613 Jan 19 '21 at 09:51
  • Just one point is not clear to me; if you can resolve it, this bounty is settled. How can you be sure there exists a data structure we can construct to handle case (C)? For (A) we are sure there is none, for (B) we are sure one exists, but is there any reason to guarantee one exists for (C)? –  Jan 19 '21 at 12:42
  • Any ideas on my last comments, @DavidEisenstat? –  Jan 19 '21 at 18:19
  • @DavidEisenstat my solution works too, doesn't it? – user3661613 Jan 19 '21 at 19:14
  • @user3661613 You can't charge insertion to extract-min if there is no extract-min (because the operation sequence ended or because we popped it off without extracting it). – David Eisenstat Jan 19 '21 at 19:17
  • Very nice. Would you please link me to the theorem? Is it possible to add a bit about the implementation structure? Nothing technical, just the intuitive idea: how are insert and remove done? A little example or whatever details you prefer; it's too abstract as it stands. –  Jan 19 '21 at 19:44
  • @EmmaNic. On second thought, dynamic finger theorem doesn't do quite enough. 2-3 trees are still good though. CLRS Chapter 14 will give you a flavor of the tree augmentations. The rest has the flavor of balanced binary tree implementations; while the details are obviously different, the details really aren't that interesting. – David Eisenstat Jan 19 '21 at 20:59
  • One question in the comments is interesting: how were you sure from the start that such a data structure exists? Why didn't you think there might be nothing here? –  Jan 19 '21 at 21:15
  • @EmmaNic. I added a simpler implementation idea and some C++. – David Eisenstat Jan 19 '21 at 23:11

It sounds like they were probing you for knowledge of priority queues implemented by way of Fibonacci heaps.

Such an implementation has the running times described in answer C.


Add and delete run in O(1) time because only a single operation is needed to locate the target: with direct access to both ends of the list, we can add at the front and delete from the end in one step.

Extract-Min runs in O(log n) using a divide-and-conquer structure, as in binary search or a heap: rather than scanning every element in O(n) and inflating the time complexity, we locate the minimum in O(log n).

jedimasterbot
  • The answer points in the correct direction: removing the min fast is possible with heaps or binary trees, or linked lists + binary search. But to add and remove from the top and bottom of a list fast, you need these plus the order of the elements. No single structure can do all of this, but you can add extra pointers to a heap or tree node, or keep a doubly-linked list and track a sorted structure in parallel. In any case, such a hybrid will not have the optimal minimized complexity. – schnedan Jan 13 '21 at 21:30

You will start by thinking of a min-heap for the Extract-Min operation. That takes O(log n) time, but so do the add and delete operations. The next question is: can we do either of those two operations in constant time? Is there any data structure that can?

The closest answer is a Fibonacci heap, used for implementing priority queues (popularly, in Dijkstra's algorithm). It has amortized complexities of O(1) for insert and O(log n) for both delete and delete-min (though since the operation here is always delete-from-tail, we may be able to achieve O(1) for it by maintaining a pointer to the last node and doing a decrease-key operation in O(1) time).

Internally, a Fibonacci heap is a collection of trees, all satisfying the standard min-heap property (the value of a parent is never greater than those of its children), where the roots of all the trees are linked in a circular doubly linked list. This section explains the implementation of each operation alongside its run-time complexity in further detail.

Have a look at this great answer which explains the intuition behind fibonacci-heaps.

Edit: As per your query regarding choosing between B to D, let's discuss them one by one.

(B) would be your answer at first glance, since the problem clearly suggests a min-heap. That also eliminates (D), which claims Extract-Min in O(n) when we clearly know we can do better. This leaves (C), with better bounds for the add/delete operations, namely O(1). If you can think of combining multiple min-heaps (roots and children) with a circular doubly linked list, keeping a pointer to the root containing the minimum key, i.e. a Fibonacci heap, you know that option (C) is possible, and since it's better than option (B), you have your answer.

Jarvis
  • How can we find that (C) is the answer? –  Jan 08 '21 at 12:55
  • I added the link for explanation, have a look. I believe the article explains everything. – Jarvis Jan 08 '21 at 12:57
  • None of these answers addresses my challenge carefully. How can you choose option (C)? Why is it the best? –  Jan 08 '21 at 12:58
  • Have you checked the link I added for explanation? It clearly explains the log(N) time complexity for extract-min, merge, insert and delete operations in a fibonacci-heap. Which option do you think is best? Add that in the question as well. @EmmaNic. – Jarvis Jan 08 '21 at 12:59
  • Please see my updated question. I think you misunderstand; I don't need a proof of the amortized operation times for a Fibonacci heap. –  Jan 08 '21 at 13:00
  • I really don’t understand what you are asking here, which option do you think should be the answer? @EmmaNic. – Jarvis Jan 08 '21 at 13:03
  • A is clearly wrong. How can we choose among (B) to (D)? –  Jan 08 '21 at 13:05
  • I added the explanation why to not choose B and D to the best of my ability, if you still don’t understand, not sure what you exactly want to know. @EmmaNic. – Jarvis Jan 08 '21 at 13:11
  • If I were choosing among (B) to (D): obviously (B) is worse than both (C) and (D). Now between (C) and (D), I would expect the add and delete operations to be more common than Extract-Min. By that logic, it makes sense that the "best" implementation should be (C). But I agree that the question is vague. – Manish Dash Jan 08 '21 at 13:15
  • B is better than D, B has delete in O(log N) but D has delete in O(N). – Jarvis Jan 08 '21 at 13:16
  • @ManishDash exactly. Why couldn't we do (A)? Because of the lower bound on comparison sorting? –  Jan 08 '21 at 13:16
  • (A) is just not possible. It's like asking why every algorithm in the world doesn't run in O(1): because it's just not possible. You cannot have all the operations in constant time without paying a cost somewhere else. – Jarvis Jan 08 '21 at 13:17
  • @Jarvis Oh yes. I am sorry. I missed that. To update my pref: D – Manish Dash Jan 08 '21 at 13:18
  • @Jarvis I think your answer is good, but how can we be sure that (B) is better than (D)? –  Jan 08 '21 at 13:19
  • Fibonacci heaps can't delete in O(1). – David Eisenstat Jan 08 '21 at 13:19
  • @DavidEisenstat so again we come back to the main question: the times are given as amortized. –  Jan 08 '21 at 13:22
  • @EmmaNic. The standard proof gives you Theta(log n) amortized (actual worst case can be linear IIRC). *Maybe* it's possible to prove O(1) when deleting from the tail, but this requires more proof. – David Eisenstat Jan 08 '21 at 13:24
  • @DavidEisenstat using two stacks A and B to implement a FIFO queue, we get O(1) amortized cost for insert and delete. But in this question, add is at the front and remove is at the end of the list, and I'm also not sure about Extract-Min in that setup. –  Jan 08 '21 at 13:25
  • @EmmaNic. You can actually get O(1) worst case for FindMin without ExtractMin, but ExtractMin messes up the logic for all of these data structures. – David Eisenstat Jan 08 '21 at 13:26
  • @DavidEisenstat exactly. This question is very challenging and hard, at least I think so. –  Jan 08 '21 at 13:26
  • @DavidEisenstat maybe we can compare the options and choose the best one without thinking about how the data structure would be designed. Please also see related problem 1 at this link, though I think it's about a DEQUE: http://webcache.googleusercontent.com/search?q=cache:rv6F9cI2TLcJ:web.stanford.edu/class/cs166/assignments/ps4_template.tex+&cd=2&hl=en&ct=clnk&gl=us&client=firefox-b-d –  Jan 08 '21 at 13:28
  • @EmmaNic. A is impossible in a comparison-based data structure because we could sort in linear time with it. B is straightforwardly possible with augmented binary trees, and B is better than D, so the question is B or C. I believe that the right answer is C, but if so I believe it's going to take something more exotic than a Fibonacci heap. – David Eisenstat Jan 08 '21 at 13:32

Let's explore all the answers.

A is impossible because you can't extract the min in O(1). Obviously you have to find it before removing it, and finding it takes more than constant work.

B is also suboptimal because we know that adding can be done in O(1), and delete can be O(1) as well, since we can directly access the first and the last elements.

By the same argument D is also wrong.

So we're left with C.

SX10