You can use an adaptation of the streaming median algorithm to track the k
smallest terms of a set of terms. You can use std::priority_queue
for your min and max heaps.
The algorithm would work like this:
- The max heap is used to hold the
k
smallest terms
- The min heap is used to hold all the other terms
- For every term to be tracked, decide which heap it should be added to, and add it there
- If the size of the max heap is smaller than
k
add it there, else
- if the term is smaller than the top of the max heap, add it there, else
- add the term to the min heap
- If the top of the max heap has more than
k
terms, pop off the top term, and push it into the min heap
If you need your terms sorted, you can pop them off the max heap in descending order, placing them in a array in reverse order, leaving you with a sorted array. If you passed in the container to the max heap's constructor, you can copy the container, and sort it.
The std::priority_queue
is a max heap by default. To make it a min heap, you modify some of the template parameters.
typedef std::priority_queue<int> MaxHeap;
typedef std::priority_queue
<
int,
std::priority_queue<int>::container_type,
std::greater<int>
> MinHeap;