4

Given an array as input find the output array that has median of each sub array whose index starts from 0 to i(i = 1,2...array.length-1).

So basically given array A[], output array B[]. B[i] is the median of A[0] ... A[i].

I am thinking about using dynamic programming to store the two numbers before and after the median of each sub array. But it somehow gets complicated. Is there any easier solution?

user3692521
  • 2,563
  • 5
  • 27
  • 33
  • Do you have any time/space constraints? Otherwise, you are over-thinking the problem. Just sort the list first. – Nuclearman Mar 21 '15 at 20:25
  • 2
    You could store the stuff less than the median in a max heap and the stuff bigger than the median in a min heap. Move stuff from one heap to the other if the two aren't about the same size. Offhand, I can't think of an asymptotically faster way to do this than O(n log(n)). – tmyklebu Mar 21 '15 at 20:39
  • could you elaborate about the two-heap solution? – user3692521 Mar 21 '15 at 21:43

3 Answers3

4

Basically what we need here is a data structure that supports two operations: adding an arbitrary element and finding the median of all elements added to it so far.

  1. The most conceptually simple solution is a balanced binary search tree that stores sizes of all subtrees(add operation just adds an element to the tree and finding median is just one traversal from the root of the tree(we can choose where to go in each node because we know the sizes of all subtrees). But this one might be a little bit tedious to implement(binary search trees from standard library usually don't support the "get k-th element" operation efficiently).

  2. Here is another solution(it is O(N log N), too) which uses two heaps. It is easier implementation-wise because a priority queue from a standard library works fine.

    • Let's maintain two heaps: low(a max-heap) and high(a min-heap). The invariants are: any element of low are less than or equal to any element of high and their size differs by at most one.

    • Initially, they are empty.

    • When we add a new number, we do the following: if its less than or equal to the largest element in low, we add to low. Otherwise, we add to high. It is easy to see that the first invariant holds true. How to maintain the second invariant? If their sizes differ by 2 after insertion, we can just pop the top element from the larger heap and insert into the other one. Now their size differs by at most one. Thus, we can restore both invariants in O(log N) time when we add a new element.

    • These two invariants imply the following property: if size(low) != size(high), the median is the top element of the larger heap. If their sizes are equal, the median is the top of one them(which exactly? It depends on the definition of a median of an array with even number of elements).

kraskevich
  • 18,368
  • 4
  • 33
  • 45
  • 1
    Maybe I am missing something, but how does this give an output array B with a given property "B[i] is the median of A[0] ... A[i]"? – Arty Sep 30 '15 at 22:47
3

Did I misunderstand the question or what? Why use heaps and queues?

The median of a set of numbers is the value in the middle of the sorted set.

e.g.,

{1, 2, 3} median is 2

{1, 2, 3, 4} median is (2+3) / 2 = 2

Assume the array is sorted (if not, just sort the array, which is O(n lg n))

Time: O(n)

Space: O(1)

    int[] output = new int[input.length];

    for(int i = 0 ; i < input.length ; i++) {
        if(i % 2 == 1){
            int midPoint = i / 2;
            output[i] = (input[midPoint] + input[midPoint+1]) / 2;
        } else {
            output[i] = input[(i+1)/2];
        }
    }
    return output;

Test

input  {24, 29, 33, 40, 40, 42, 45, 47, 48, 49}
output {24, 26, 29, 31, 33, 36, 40, 40, 40, 41}

input  {12, 14, 22, 30, 33, 38, 39, 41, 43, 45}
output {12, 13, 14, 18, 22, 26, 30, 31, 33, 35}
AhDah
  • 31
  • 2
  • 1
    If the array isn't sorted your solution doesn't hold true, since you won't be able to reconstruct the output array for the unsorted set. – Vincent Nov 30 '15 at 10:03
  • 1
    This absolutely works. The heap solution is if you have a stream of integers (the question didn't specify) – lorenzocastillo Jul 05 '16 at 13:18
0

You can solve the problem in O(n log n) by using a binary search tree and augmenting it to be able to find the k-th element in O(log n) time, like described in my answer here.

For each element at index i in your array do:

B[i] = find_k_in_bst(i / 2)
insert_into_bst(A[i])

Make sure to use a balanced search tree.

If you have access to a library heap, then the heap solution described above would be the easiest. The easiest to implement yourself for this particular problem (in my opinion) is actually a segment tree: each node tells you how many elements you have inserted in its associated interval. You can use these update and query methods:

  1. Update: when inserting a value x, go down to the leaf node associated with x. Increment all counts down to it.

  2. Query: use a similar algorithm to the one I linked to for k-th element: when at a node, if count(left_child) == k - 1, then you have your answer: it must be the first element in the interval associated with the right node, and so on.

Note that this solution is O(n log V) where V is the max value in your array. To get O(n log n) you must scale the array to [1, n]: 1 100 1000 => 1 2 3. You can use a sort to do that.

Community
  • 1
  • 1
IVlad
  • 43,099
  • 13
  • 111
  • 179