4

Following the answer found here, https://stackoverflow.com/a/10931091/1311773, I am attempting to implement two heaps so I can calculate a running median.

I'm unfamiliar with heaps, and am not sure where to begin implementing this function described here. http://programmingpraxis.com/2012/05/29/streaming-median/

My goal is to create a small test program that calculates running medians efficiently, so as the list grows the median doesn't need to be recalculated from scratch. Using two heaps, I should be able to do it, I'm just shaky on how to start implementing it.

Any advice on this would be appreciated.

Community
  • 1
  • 1
Edge
  • 2,456
  • 6
  • 32
  • 57

2 Answers2

4

The std::priority_queue template provides all the properties of a heap. Constant time maximum or minimum extraction (depending on how the items are compared), and logarithmic time insertion. It can be found in the <queue> header file.

By default, the priority_queue is a max-heap.

int numbers[11] = { 0, 9, 3, 4, 8, 12, 2, 11, 10, 1, 5 };
std::priority_queue<int> myheap(numbers, numbers + 11);
std::cout << "biggest " << myheap.top() << std::endl; // 12
myheap.pop();
std::cout << "biggest " << myheap.top() << std::endl; // 11
myheap.push(6);
std::cout << "biggest " << myheap.top() << std::endl; // 11
myheap.push(13);
std::cout << "biggest " << myheap.top() << std::endl; // 13

Here is an example of how to create a min-heap:

std::priority_queue<
    int,
    std::priority_queue<int>::container_type,
    std::greater<int>
>;

To implement the streaming median algorithm, the approach is similar to this:

  • create a max-heap for items that are smaller than the median
  • create a min-heap for items that are greater than the median
  • when pushing new items into the heap
    • decide which heap it should be pushed into, and push it there
    • if one of the heaps' size is more than 1 greater than the other heap, then pop the bigger one, and put that element into the smaller one

Then, the median is the either the top of the larger heap, or the average of the tops of both heaps.

If you feel you need to manage the heap manually, C++ provides algorithms that allow you to do so on your own random access data structure.

  • std::make_heap -- heapify a region specified by iterator endpoints
  • std::push_heap -- assumes the first N-1 elements are already a heap, and incoporates the N'th element into the heap
  • std::pop_heap -- places the first element in the region into the last position, and reheapifies the region, but leaving the last element alone
jxh
  • 69,070
  • 8
  • 110
  • 193
  • I can't find any resources on code to implement it. I don't have a great understanding of templates, or where to go with the priority_queue. I understand it comes with a lot of inbuilt functions. – Edge Jun 10 '12 at 04:42
  • @Andrew: I provided some examples, I hope it helps. – jxh Jun 10 '12 at 04:56
  • What are the speeds and costs of this algorithm, as well as memory space? Is using the heap method for calculating running medians faster then just using something like Quickselect over and over? – Edge Jun 10 '12 at 05:04
  • @Andrew: Wikipedia has a good article about the heap algorithm. – jxh Jun 10 '12 at 05:35
0

I guess this will help. Thanks.

    #include<cstdio>
    #include<iostream>
    #include<queue>
    #include <vector>         
    #include <functional>

    typedef priority_queue<unsigned int> type_H_low;
    typedef priority_queue<unsigned int, std::vector<unsigned int>, std::greater<unsigned int> > type_H_high;

    size_t signum(int left, int right) {
        if (left == right){
            return 0;
        }
        return (left < right)?-1:1;
    }

    void get_median( unsigned int x_i, unsigned int &m, type_H_low *l, type_H_high *r) {

        switch (signum( l->size(), r->size() )) {
        case 1: // There are more elements in left (max) heap
            if (x_i < m) {
                r->push(l->top());
                l->pop();
                l->push(x_i);
            } else {
                r->push(x_i);
            }
            break;

        case 0: // The left and right heaps contain same number of elements
            if (x_i < m){
                l->push(x_i);
            } else {
                r->push(x_i);
            }
            break;

        case -1: // There are more elements in right (min) heap
            if (x_i < m){
                l->push(x_i);
            } else {
                l->push(r->top());
                r->pop();
                r->push(x_i);
            }
            break;
        }

        if (l->size() == r->size()){
            m = l->top();
        } else if (l->size() > r->size()){
            m = l->top();
        } else {
            m = r->top();
        }

        return;
    }

    void print_median(vector<unsigned int> v) {
        unsigned int median = 0;
        long int sum = 0;
        type_H_low  H_low;
        type_H_high H_high;

        for (vector<unsigned int>::iterator x_i = v.begin(); x_i != v.end(); x_i++) {
            get_median(*x_i, median, &H_low, &H_high);
            std::cout << median << std::endl;
        }
    }
  • You've already posted this [exact same answer](https://stackoverflow.com/a/44813794/3982001) to another question. I'd recommend to keep only one of them (probably the other one, as it is a much more popular question), but if you decide to keep this one please [edit] it to fix the indentation and add some commentary, as I wrote there. Thank you! – Fabio says Reinstate Monica Jun 28 '17 at 23:29