20

I'm using a priority queue as a scheduler with one extra requirement. I need to be able to cancel scheduled items. This equates to removing an item from the middle of the priority queue.

I can't use std::priority_queue as access to any element other than top is protected.

I'm trying to use the algorithm's heap functions. But I'm still missing the piece I need. When I remove an element I from the middle of the heap I want it to rebuild itself efficiently. C++ provides these heap functions:

  • std::make_heap O(3n)
  • std::push_heap O(lg(n))
  • std::pop_heap O(2 lg(n))

I want a new function like std::repair_heap with a big-O < 3n. I'd provide it with location of the hole where the canceled item used to reside and it would properly adjust the heap.

It seems to be a huge oversight to not to provide a std::repair_heap function. Am I missing something obvious?

Is there library that provides an stl-compliant std::repair_heap?

Is there a better data structure for modeling a scheduler?

NOTE:
I'm not using an std::map for a few reasons.

  • A heap has constant memory overhead.
  • A heap has awesome cache locality.
deft_code
  • 57,255
  • 29
  • 141
  • 224
  • Please correct me if I'm wrong, but I think you have to call `std::make_heap` for this, since you'll have to move elements around anyways. – Alexandre C. Jan 19 '11 at 17:28
  • 1
    I can just use `std::make_heap`. But it feels like there should be a faster alternative. I suspect `repair_heap` could be written to be _O(lg(n))_, like push and pop. My reasoning is `repair_heap` is just popping from the middle of the heap instead of the head. – deft_code Jan 19 '11 at 17:43
  • I wrote something like `repair_heap` for a similar scheduling problem (not in C++) and it can be done. I later hit the same problem as you with `std::priority_queue` and ultimately I decided that removal was so infrequent compared to insertion (possibly not true for you) that I just copied the heap while removing the element. My goal in that case was to create as little new/untested code as possible. – Ben Jackson Jan 19 '11 at 18:06
  • 4
    This is really nitpicky, but while it's technically correct to say O(3n) or O(2 lg n), this is usually not done because it misses the point of big-O. Big-O categorizes relative growth rates without considering constants. Instead of writing O(3n), just write O(n). Similarly, don't write O(2 lg n), write O(lg n). Now, if you really do want to say that your code runs with at most 2 lg n comparisons, then that's perfectly fine - just say that directly and don't use big-O notation. – templatetypedef Feb 03 '11 at 02:14
  • 3
    @templatetypedef, OP is actually just adopting [cplusplus.com's](http://www.cplusplus.com/reference/algorithm/make_heap/) way of explaining the time complexity of their algorithms in this instance, so we shouldn't fault OP for it. – Richard Jan 17 '14 at 19:50
  • 1
    @Richard Looking at the page you linked, it says "Up to linear in three times the distance between first and last." That's not the same as saying O(3n); the first of these is a very specific bound that's not specified asymptotically, while the second is technically correct but misleading. – templatetypedef Jan 17 '14 at 20:10
  • I don't consider @templatetypedef s correction of the use of Big-O to be in the least bit nit-picky. In fact, if you pay attention to it you might notice your overall problem could be O(n) in any case..... How would you find the element you want to remove in the first place? (Without special effort it would be O(n)) What is the cost of deleting? (Usually O(1)) Even with O(log(n)) repair: Find O(n), Delete O(1), Repair O(log(n)) ***is still O(n)*** _This is the same as Find, Delete, Rebuild O(n)._ – Disillusioned May 14 '16 at 14:14
  • Posting on old question, but still: Here is how to do it, in Java, but still useful: http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/9-BinTree/heap-delete.html – Erik Alapää May 15 '19 at 11:42

6 Answers6

13

I guess you know which element in the heap container (index n) you want to delete.

  1. Set the value v[n] = BIG; the value BIG is really bigger than any other values in the heap.
  2. Call std::push_heap( v.begin(), v.begin()+n+1 );
  3. Call std::pop_heap( v.begin(), v.end() );
  4. Call v.pop_back();
  5. Done

Operation is O(ln(n))

RE: request for proof

First, a qualifier: This method assumes something about the algorithm used by std push_heap.
Specifically, it assumes that std push_heap( v.begin(), v.begin()+n+1 ) will only alter the range [0, n]
for those elements which are ascendants of n, i.e., indices in the following set:

A(n)={n,(n-1)/2,((n-1)/2-1)/2....0}.  

Here is a typical spec for std push_heap:

http://www.cplusplus.com/reference/algorithm/push_heap/ "Given a heap range [first,last-1), this function extends the range considered a heap to [first,last) by placing the value in (last-1) into its corresponding location in it."

Does it guarantee to use the "normal heap algorithm" that you read about in textbooks? You tell me.

Anyway, here is the code which you can run and see, empirically, that it works. I am using VC 2005.

#include <algorithm>
#include <vector>
#include <iostream>

bool is_heap_valid(const std::vector<int> &vin)
{
    std::vector<int> v = vin;
    std::make_heap(v.begin(), v.end());
    return std::equal(vin.begin(), vin.end(), v.begin());
}


int _tmain(int argc, _TCHAR* argv[])
{
    srand(0);
    std::vector<int> v;
    for (int i=0; i<100; i++)
    {
        v.push_back( rand() % 0x7fff );
    }
    std::make_heap(v.begin(), v.end());

    bool bfail = false;
    while( v.size() >= 2)
    {
        int n = v.size()/2;
        v[n] = 0x7fffffff;
        std::push_heap(v.begin(), v.begin()+n+1);
        std::pop_heap(v.begin(), v.end());
        v.resize(v.size()-1);
        if (!is_heap_valid(v))
        {
            std::cout << "heap is not valid" << std::endl;
            bfail = true;
            break;
        }
    }
    if (!bfail)
        std::cout << "success" << std::endl;

    return 0;

}

But I have another problem, which is how to know the index "n" which needs to be deleted. I cannot see how to keep track of that (know the place in the heap) while using std push_heap and std pop_heap. I think you need to write your own heap code and write the index in the heap to an object every time the object is moved in the heap. Sigh.

Craig Hicks
  • 131
  • 1
  • 3
  • I'm not convinced this will work. Convince me and I'll change the accepted answer to this one. – deft_code Aug 09 '11 at 03:08
  • I think you actually should be able to use std::push_heap and std::pop_heap if you overload std::swap for the type you're keeping in the heap. Make a member on your object for storing the index into the heap, and before inserting set it to the current size of the heap. This will be the right value at the beginning because the first heap insert step is to make the new element be the last one on the last level, so its index is the size of the heap before the insert. Then every time the heap functions call std::swap, have your override swap the index stored on each element as well. – Joseph Garvin Feb 15 '13 at 16:49
  • Unfortunately empirical evidence is insufficient. In fact, even a complete mathematical proof for a particular implementation is no good because another compiler or different version of the same compiler might use a different algorithm. The problem is: the only thing that makes a heap "a heap" is the properties maintained by the tree structure (_it doesn't even have to be a **binary** heap_). A heap implementation transforms a random access iterator into a _virtual tree_. So when you rearrange the tree over `begin..begin()+n+1` are heap properties still valid over `begin..end`? – Disillusioned May 14 '16 at 13:24
7

Unfortunately, the standard is missing this (fairly important) function. With g++, you can use the non-standard function std::__adjust_heap to do this, but there's no easy portable way of doing it -- and __adjust_heap is slightly different in different versions of g++, so you can't even do it portably over g++ versions.

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
  • `std::__adjust_heap` looks like the right thing but there isn't any documentation. In particular I don't know what the `__value` parameter is for. – deft_code Jan 19 '11 at 18:31
  • http://stackoverflow.com/questions/228783/what-are-the-rules-about-using-an-underscore-in-a-c-identifier suggests that a function starting with "__" like `__adjust_heap` is for implementation only. – gregg Jan 19 '11 at 21:46
  • @gregg: `__adjust_heap` _is_ for implementation only. That makes it a bit risky to use as the function is undocumented, may change signature, or go away entirely. Worse case scenario is if the `__adjust_heap` changes semantics is a way that doesn't change its signature. eg it stops adjusting the heap and now randomizes it. So I'll have to be very careful while using it and write some awesome unit tests. – deft_code Jan 19 '11 at 21:58
  • I don't think the function is that important. Finding an item in the middle of the heap (so it can be deleted) is already O(n). So if this is something that needs to be done a lot: a "repair/adjust heap" function is going to be far less useful than a more appropriate algorithm/data structure. – Disillusioned May 15 '16 at 04:11
3

How does your repair_heap() work? Here's my guess:

If your heap is defined by some iterator range, say (heapBegin, heapEnd). The element you want to remove is the root of some subtree of the heap, which is defined by some subrange (subHeapBegin, subHeapEnd). Use std::pop_heap(subHeapBegin, subHeapEnd), then if subHeapEnd != heapEnd, swap the values at *(subHeapEnd-1) and *(heapEnd-1), which should put your deleted item at the end of the heap container. Now you have to percolate the element at *(subHeapEnd-1) up in your subheap. If I haven't missed something, which is possible, then all that remains is to chop the end element off of the heap container.

Before going to the trouble of trying to code that correctly (I've skipped some details like calculating subHeapBegin and subHeapEnd), I'd run some tests to determine if make_heap() really slows you down. Big-O is useful, but it's not the same thing as actual execution time.

gregg
  • 1,027
  • 5
  • 6
-1

It seems to me that removing from the middle of a heap might mean the entire heap has to be rebuilt: The reason there's no repair_heap is because it would have to do the same (big-oh) work as make_heap.

Are you able to do something like put std::pair<bool, Item> in the heap and just invalidate items instead of removing them? Then when they finally get to the top just ignore the item and move along.

Mark B
  • 95,107
  • 10
  • 109
  • 188
  • Ignoring scheduled items was my original strategy. However, one user of the scheduler overwhelmed it with canceled items. Less than a 100 actively schedule items but 1000s of canceled items. To solve this I'm trying to add true cancellation. – deft_code Jan 19 '11 at 17:39
  • 2
    @deft_code, how about keeping a count of cancelled items and only rebuild the heap when the count reaches a threshold? – Mark Ransom Jan 19 '11 at 17:48
  • 3
    removing an element from the middle of a heap is never more expensive that removing the top element (O(lg _n_)), so this should be possible, its just missing from the STL – Chris Dodd Jan 19 '11 at 18:08
  • 1
    @ChrisDodd While the act of removing an element from the middle should be O(log(n)), the catch is you cannot remove something without ***first finding*** the element to remove. Since the properties of a heap make no guarantees about which subtree will contain a particular element that is not the highest value, finding an arbitrary item is effectively O(n). Therefore find+ delete is also O(n). **Conclusion:** Mark B is right about the overall operation still being O(n), but for the wrong reason. _And 'repair heap' is probably not that useful; a different approach/structure may be better._ – Disillusioned May 15 '16 at 04:04
-1

You can try ‘std::multiset’ which is implemented as the heap structure and support ‘std::erase’ operation, so you could ‘std::find’ the element then erase it.

Hu Xixi
  • 1,799
  • 2
  • 21
  • 29
  • This answer is incorrect because std::multiset is a self-balancing binary tree, not a binary heap (the inorder traversal of the tree is sorted, as opposed to the tree having the heap invariant) – Andrew Wonnacott Aug 24 '21 at 03:43
-2

Here's a bit of delphi code i used to remove items from a heap. I don't know this C++ of which you speak and don't have a repair function, but hey..

first the pop, so you get an idea of how the thing works:

function THeap.Pop: HeapItem;
begin
  if fNextIndex > 1 then begin
    Dec(fNextIndex);
    Result:= fBuckets[1];   //no zero element
    fBuckets[1] := fBuckets[fNextIndex];
    fBuckets[fNextIndex] := nil;
    FixHeapDown;            //this has a param defaulting to 
    end
  else
    Result:= nil;
end;

now to contrast, the deletion:

procedure THeap.Delete(Item: HeapItem);
var
  i:integer;
begin
  for i:=1 to pred(fNextIndex) do
    if Item=fBuckets[i] then begin
      dec(fNextIndex);
      fBuckets[i] := fBuckets[fNextIndex];
      fBuckets[fNextIndex] := nil;
      FixHeapDown(i);
      break;
      end;
end;

its of course a no-no to even think about doing what we're doing here, but hey, costs do change sometimes and jobs do get canceled.

enjoy. i hope this helps.

deft_code
  • 57,255
  • 29
  • 141
  • 224