Is there an efficient, time saving method of maintaining a heap in which elements are removed in the middle?

Question

I'm working on a path planning program where I have a priority queue 'U':

using HeapKey = pair<float, float>;
vector<pair<HeapKey, unsigned int>> U;

I order and maintain my priority queue as a binary min-heap (aka. the cheapest node first in the queue) using greater as my comparison function to get a min-heap (maybe not important). While the program is executing and planning a path it is adding nodes to 'U' with push_back() followed by push_heap() to get that node into the correct order and everything is working fine there...

However, the algorithm I'm using calls for sometimes updating a node already present in 'U' with new values. It does this by removing it from 'U' (I find it with find_if() and remove it with erase(), if that's important) and then call the function to re-insert (again push_back() followed by push_heap()) so the node have its updated values.

This have proved a bit of an unexpected problem for me. I'm no expert at this, but as far as I've been able to think, since the node is removed some place INSIDE 'U' then it messes up the order of the heap. I've been able to get the program to work by using make_heap() after the node is removed. However, this solution brought another issue since the program now takes a lot more time to complete, longer the larger my map/nodes in the heap, presumably because make_heap() is re-organizing/iterating through the entire heap every time I update a node, thus slowing down the overall planning.

Sadly I don't have time to change my program and get new results, I can only make use of simple, easy solutions I can implement fast. I'm mostly here to learn and perhaps see if there are some suggestions I can pass on about how to solve this issue of efficiently maintaining a heap/priority queue when you aren't just removing the first or last elements but elements maybe in the middle. Reducing the time taken to plan is the only thing I am missing.

Attempt at minimal reproducible example without going into the actual algorithm and such:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

using Cost = float;
using HeapKey = pair<Cost, Cost>;
pair<Cost, Cost> PAIR1;
vector<pair<HeapKey, unsigned int>> U;
using KeyCompare = std::greater<std::pair<HeapKey, unsigned int>>;
int in_U[20];

ostream& operator<<(ostream& os, pair<Cost, Cost> const& p) {
  return os << "<" << p.first << ", " << p.second << ">";
}

bool has_neightbor(unsigned int id) {
    if ( (in_U[id+1]) && (in_U[id-1])) {
        return true;
    }
    return false;
}

void insert(unsigned int id, HeapKey k) {
    U.push_back({ k, id });
    push_heap(U.begin(), U.end(), KeyCompare());
    in_U[id]++;
}

void update(unsigned int id) {
    Cost x;
    Cost y;
    if (id != 21) { //lets say 21 is the goal
        x = U[id].first.first;
        y = U[id].first.second;
    }
    if (in_U[id]) {
        auto it = find_if(U.begin(), U.end(), [=](auto p) { return p.second == id; });
        U.erase(it);
        make_heap(U.begin(), U.end(), KeyCompare());
        in_U[id]--;
    }
    int r1 = rand() % 10 + 1;
    int r2 = rand() % 10 + 1;
    if (x != y) {
      insert(id, {x + r1, y + r2});
    }
}
 
int main() {

    U.push_back({ {8, 2}, 1 });
    in_U[1]++;
    U.push_back({ {5, 1}, 2 });
    in_U[2]++;
    U.push_back({ {6, 1}, 3 });
    in_U[3]++;
    U.push_back({ {6, 5}, 4 });
    in_U[4]++;
    U.push_back({ {2, 3}, 5 });
    in_U[5]++;
    U.push_back({ {2, 9}, 6 });
    in_U[6]++;
    U.push_back({ {9, 2}, 7 });
    in_U[7]++;
    U.push_back({ {4, 7}, 8 });
    in_U[8]++;
    U.push_back({ {11, 4}, 9 });
    in_U[9]++;
    U.push_back({ {2, 2}, 10 });
    in_U[10]++;
    U.push_back({ {1, 2}, 11 });
    in_U[11]++;
    U.push_back({ {7, 2}, 12 });
    in_U[12]++;
    make_heap(U.begin(), U.end(), KeyCompare());
    PAIR1.first = 14;
    PAIR1.second = 6;

    while (U.front().first < PAIR1) {

        cout << "Is_heap?: " << is_heap(U.begin(), U.end(), KeyCompare()) << endl;

        cout << "U: ";
        for (auto p : U) {
            cout << p.second << p.first << " - ";
        }
        cout << endl;

        auto uid = U.front().second;
        pop_heap(U.begin(), U.end(), KeyCompare());
        U.pop_back();

        if (has_neightbor(uid)) {
            update(uid - 1);
            update(uid + 1);
        }
    }
    //getchar();
}

It's hard to get a good overview by just reading about what the code looks like. A [mre] would do a lot to increase the understanding for your current solution — Ted Lyngmo, Feb 13 '23 at 15:49
Unless the node is at the top of the heap, just mark it as "unusable". Then after some certain period of time or threshold, clean up the heap of all the unusable items. — PaulMcKenzie, Feb 13 '23 at 16:00
Something like `if (numNodesInHeap > threshold) { remove_unusable_nodes_from_heap(); }`. Of course there is additional logic required, and maybe changing the `HeapKey` type to add an additional "in-use" boolean flag. But further analysis would require an [mcve]. — PaulMcKenzie, Feb 13 '23 at 16:15
Wikipedia outlines functions for [deleting or updating](https://en.wikipedia.org/wiki/Binary_heap#Delete) elements within a heap. However, if you have to do a linear search to find the node first, you could switch to a [`boost::bimap`](https://www.boost.org/doc/libs/1_81_0/libs/bimap/doc/html/index.html) instead. That way you can look up individual nodes on O(log n) — Homer512, Feb 13 '23 at 17:06
Please realise that `find_if` will be your second bottleneck: it is an operation that has an average time complexity of O(n), while removal is possible with O(logn) time complexity if you maintain a hashtable to know where each value is located in the heap. — trincot, Feb 13 '23 at 19:56

Mooing Duck · Answer 1 · 2023-02-16T15:23:03.267

Yes, the algorithm is relatively simple. Note that when considering an item at index i, it's "parent" in a heap is at index (i-1)/2, and it's children are at indecies i*2+1 and i*2+2.

Swap item_to_pop for the last item in the range. This moves that item to the desired (last) position, but inserts a "small" item in the middle of the heap. This needs to be fixed.
If the "small" item at item_to_pop position is larger than it's current parent, then swap with it's parent. Repeat until that item is either no longer larger than it's current parent or is the new root. Then we're done. Notably, this is the same algorithm as push_heap, except with the shortcut that we start in the middle instead of at the end.
If the "small" item at item_to_pop position is smaller than either current child, then swap with the larger child. Repeat until that item is larger than any of its current children (note that near the end it might only have one or no children). Then we're done. Notably, this is the same algorithm as pop_heap, except with the shortcut that we start in the middle instead of at the top.

This algorithm will do at most log2(n)+1 swaps, and log2(n)*2+1 comparisons, making it almost as fast as pop_heap and push_heap. Which isn't really surprising since it's the same algorithm.


template< class RandomIt, class Compare >
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop, Compare comp) {
    assert(std::is_heap(first, last)); //this is compiled out of release builds
    assert(first<=item_to_pop);
    assert(item_to_pop<last);
    using std::swap;
    std::size_t new_size = last - first - 1;
    if (new_size == 0) return;
    //swap the end of the range and item_to_pop, so that item_to_pop is at the end
    swap(*item_to_pop, *--last);
    if (new_size == 1) return;
    //If item_to_pop is bigger than it's parent, then swap with the parent
    bool moved_up = false;
    RandomIt swap_itr;
    while (true) {
        std::size_t offset = item_to_pop - first;
        if (offset == 0) break; //item_to_pop is at root: exit loop 
        swap_itr = first + (offset-1) / 2;
        if (comp(*item_to_pop, *swap_itr)) 
            break; //item_to_pop smaller than it's parent: exit loop
        swap(*item_to_pop, *swap_itr); //swap with parent and repeat
        item_to_pop = swap_itr;
        moved_up = true;        
    }
    if (moved_up) return; //if we moved the item up, then heap is complete: exit
    //If biggest child is bigger than item_to_pop, then swap with that child
    while (true) {
        std::size_t offset = item_to_pop - first;
        std::size_t swap_idx = offset * 2 + 1;
        if (swap_idx >= new_size) break; //no children: exit loop
        swap_itr = first + swap_idx;
        if (swap_idx+1 < new_size && comp(*swap_itr, *(swap_itr+1))) //if right child exists and is bigger, swap that instead
            ++swap_itr;
        if (!comp(item_to_pop, swap_itr)) break; //item_to_pop bigger than biggest child: exit loop
        swap(*item_to_pop, *swap_itr); //swap with bigger child and repeat
        item_to_pop = swap_itr;
    }
}
template< class RandomIt >
constexpr void pop_mid_heap(RandomIt first, RandomIt last, RandomIt item_to_pop) {
    pop_mid_heap(first, last, item_to_pop, std::less<>{});
}

https://ideone.com/zNW7h7

Theoretically one can optimize out the "or is the new root" check in the push_heap part, but the checks to detect that case adds complexity that doesn't seem worth it.

IMO, this is useful and should be part of the C++ standard library.

Ohhh, looks promising. Will probably need to go over it a few times to fully understand, but wanted to say thank you in a timely manner. So thank you ^^ — OverDemon, Feb 13 '23 at 17:59
If the children are at `2*i+1` and `2*i+2`, then the parent is at `(i-1)/2`, not `i/2`. — Jim Mischel, Feb 16 '23 at 05:32

score 0 · Answer 2 · answered Feb 16 '23 at 18:50

In general it's expensive to update a node in the middle of a binary heap not because the update operation is expensive but because finding the node is an O(n) operation. If you know where the node is in the heap, updating its priority is very easy. My answer at https://stackoverflow.com/a/8706363/56778 shows how to delete a node. Updating a node's priority is similar: rather than replacing the node with the last one in the heap, you just sift the node up or down as required.

If you want the ability to find a node quickly, then you have to build an indexed heap. Basically, you have a dictionary entry for each node. The dictionary key is the node's ID (or whatever you use to identify it), and the value is the node's index in the binary heap. You modify the heap code so that it updates the dictionary entry whenever the node is moved around in the heap. It makes the heap a little bit slower (by a constant factor), but makes finding an arbitrary node an O(1) operation.

Or, you can replace the binary heap with a Pairing Heap, skip list, or any of the other "heap" types that work with node pointers. My experience has been that although the theoretical performance of those two isn't as good as the theoretical performance of Fibonacci heap, the real-world performance is much better.

With either of those it's a whole lot easier to maintain an index: you just add a node reference to it when you add a node to the heap, and remove a reference when the node is removed from the heap. Both of those heap types are easy to build and performance will be about the same as for a binary heap although they will use somewhat more memory. From experience I'll say that Pairing heap is easier to build than skip list, but skip list is a more generally useful data structure.

Is there an efficient, time saving method of maintaining a heap in which elements are removed in the middle?

2 Answers2