1

Can anyone suggest any methods or link to implementations of fast median finding for dynamic ranges in c++? For example, suppose that for iterations in my program the range grows, and I want to find the median at each run.

Range
4
3,4
8,3,4
2,8,3,4
7,2,8,3,4

So the above code would ultimately produce 5 median values for each line.

Bob John
  • 3,688
  • 14
  • 43
  • 57
  • 2
    Can you show your loop for producing the above so we can add to it? – CR41G14 Mar 26 '13 at 16:53
  • 1
    Is linear time update cost for an insertion / deletion "fast"? Without keeping track of a sorted list this is the fastest possible solution I can come up with... – leemes Mar 26 '13 at 16:55
  • Perhaps manage a balanced tree of values, rebalancing as you insert values, and take the root node to get the median. I think insertion is O(log n) and lookup of the root node pointer is constant. – Alex Reynolds Mar 26 '13 at 16:59
  • Potentially useful previous thread on finding kth smallest element in BST http://stackoverflow.com/questions/2329171/find-kth-smallest-element-in-a-binary-search-tree-in-optimum-way/2329236#2329236 – Shafik Yaghmour Mar 26 '13 at 17:00

3 Answers3

4

The best you can get without also keeping track of a sorted copy of your array is re-using the old median and updating this with a linear-time search of the next-biggest value. This might sound simple, however, there is a problem we have to solve.

Consider the following list (sorted for easier understanding, but you keep them in an arbitrary order):

    1, 2, 3, 3, 3, 4, 5
//           *

So here, the median is 3 (the middle element since the list is sorted). Now if you add a number which is greater than the median, this potentially "moves" the median to the right by one half index. I see two problems: How can we advance by one half index? (Per definition, the median is the mean value of the next two values.) And how do we know at which 3 the median was, when we only know the median was 3?

This can be solved by storing not only the current median but also the position of the median within the numbers of same value, here it has an "index offset" of 1, since it's the second 3. Adding a number greater than or equal to 3 to the list changes the index offset to 1.5. Adding a number less than 3 changes it to 0.5.

When this number becomes less than zero, the median changes. It also have to change if it goes beyond the count of equal numbers (minus 1), in this case 2, meaning the new median is more than the last equal number. In both cases, you have to search for the next smaller / next greater number and update the median value. To always know what the upper limit for the index offset is (in this case 2), you also have to keep track of the count of equal numbers.

This should give you a rough idea of how to implement median updating in linear time.

leemes
  • 44,967
  • 21
  • 135
  • 183
  • Why the downvote? It's your good right, but please comment why. – leemes Mar 27 '13 at 14:44
  • If you are OK with linear time, you might as well just keep the array sorted. Inserting a single value into a sorted array also takes linear time, and would be much simpler than the algorithm above. (But the downvote - if any - is not from me.) – Igor ostrovsky Mar 28 '13 at 00:17
  • @Igorostrovsky You're right. I was so fixed on finding a solution without sorting, so I didn't see that this is a bad idea. Inserting in a sorted array is possible in logarithmic time, by the way (binary search). – leemes Mar 28 '13 at 20:13
  • Unfortunately, you can't insert in logarithmic time because you need to shift all elements. The full answer IMHO should say two things: 1. if you need linear time, just do one insertion sort step on each new value. 2. if you need logarithmic time, you'll need a data structure such as red-black tree, AVL tree, skip list, etc (any one of those would do). – Igor ostrovsky Mar 28 '13 at 23:37
0

I think you can use a min-max-median heap. Each time when the array is updated, you just need log(n) time to find the new median value. For a min-max-median heap, the root is the median value, the left tree is a min-max heap, while the right side is a max-min heap. Please refer the paper "Min-Max Heaps and Generailized Priority Queues" for the details.

-1

Fins some code below, I have reworked this stack to give your necessary output

    private void button1_Click(object sender, EventArgs e)
    {
        string range = "7,2,8,3,4";
        decimal median = FindMedian(range);
        MessageBox.Show(median.ToString());

    }

    public decimal FindMedian(string source)
    {
        // Create a copy of the input, and sort the copy

        int[] temp = source.Split(',').Select(m=> Convert.ToInt32(m)).ToArray();
        Array.Sort(temp);

        int count = temp.Length;
        if (count == 0) {
            throw new InvalidOperationException("Empty collection");
        }
        else if (count % 2 == 0) {
            // count is even, average two middle elements
            int a = temp[count / 2 - 1];
            int b = temp[count / 2];
            return (a + b) / 2m;
        }
        else {
            // count is odd, return the middle element
            return temp[count / 2];
        }
    }
Community
  • 1
  • 1
CR41G14
  • 5,464
  • 5
  • 43
  • 64
  • Seems like a hater is underway. My answer as well as the question got downvoted too, without leaving a comment. – leemes Mar 27 '13 at 14:45