0

I am struggling with my homework and need a little push- the question is to design an algorithm that will in O(nlogm) time find multiple smallest elements 1<k1<k2<...<kn and you have m *k. I know that a simple selection algorithm takes o(n) time to find the kth element, but how do you reduce the m in your recurrence? I though to do both k1 and kn in each run, but that will only take out 2, not m/2.

Would appreciate some directions. Thanks

user2067051
  • 99
  • 2
  • 15
  • possible duplicate of [Write a program to find 100 largest numbers out of an array of 1 billion numbers](http://stackoverflow.com/questions/19227698/write-a-program-to-find-100-largest-numbers-out-of-an-array-of-1-billion-numbers) – Bernhard Barker Oct 09 '13 at 11:38
  • 1
    I find your problem statement really confusing. What is `m`? Why are you using little-oh in `o(n)`? – NPE Oct 09 '13 at 11:39
  • Thanks, that 100 largest number problem looks very similar. But I don't see how they come up with nlogm tim. @NPE M is basically the number of ki that we need, lest say if we need 1st,2nd and 3rd smallest numbers, m = 3. – user2067051 Oct 09 '13 at 11:50
  • The accepted answer to that question is O(n log m). – Bernhard Barker Oct 09 '13 at 11:54

1 Answers1

3

If I understand the question correctly, you have a vector K containing m indices, and you want to find the k'th ranked element of A for each k in K. If K contains the smallest m indices (i.e. k=1,2,...,m) then this can be done easily in linear time T=O(n) by using quickselect to find the element k_m (since all the smaller elements will be on the left at the end of quickselect). So I'm assuming that K can contain any set of m indices.

One way to accomplish this is by running quickselect on all of K at the same time. Here is the algorithm

QuickselectMulti(A,K)
If K is empty, then return an empty result set
Pick a pivot p from A at random
Partition A into sets A0<p and A1>p.
i = A0.size + 1
if K contains i, then remove i from K and add (i=>p) to the result set.
Partition K into sets K0<i and K1>i
add QuickselectMulti(A0,K0) to the result set
subtract i from each k in K1  
call QuickselectMulti(A1,K1), add i to each index of the output, and add this to the result set
return the result set

If K contains just one element, this is the same as randomized quickselect. To see why the running time is O(n log m) on average, first consider what happens when each pivot exactly splits both A and K in half. In this case, you get two recursive calls, so you have

T = n + 2T(n/2,m/2)
  = n + n + 4T(n/4,m/4)
  = n + n + n + 8T(n/8,m/8)

Since m drops in half each time, then n will show up log m times in this summation. To actually derive the expected running time requires a little more work, because you can't assume that the pivot will split both arrays in half, but if you work through the calculations, you will see that the running time is in fact O(n log m) on average.

On edit: The analysis of this algorithm can make this simpler by choosing the pivot by running p=Quickselect(A,k_i) where k_i is the middle element of K, rather than choosing p at random. This will guarantee that K gets split in half each time, and so the number of recursive calls will be exactly log m, and since quickselect runs in linear time, the result will still be O(n log m).

mrip
  • 14,913
  • 4
  • 40
  • 58
  • Thanks man, exactly what I needed! sorry for the bad explanation. Just one question though- why is m cut in half each time? – user2067051 Oct 09 '13 at 12:12
  • Each time, m is cut into two parts m1 and m2 so that m1+m2=m (or m1+m2=m-1 if the index i belongs to K). These correspond to the two subsets K1 and K2 of K, partitioned by the index i. Like I said, you can't assume that K will be exactly cut in half each time, so you have some more work to do, but if you run through the calculations, you'll find that the expected running time is still going to be O(n log m). – mrip Oct 09 '13 at 12:15
  • Come to think of it, you can make this simpler by choosing p by running p=Quickselect(A,k_i) where k_i is the middle element of K. This will guarantee that K gets split in half each time, and so the number of recursive calls will be exactly log m. – mrip Oct 09 '13 at 12:19
  • Thanks. One more question- add (i=>p) means to add a[i] to result right? – user2067051 Oct 09 '13 at 12:42
  • Basically, yes. I am representing the result as a map from indices to values, so that means add the pair "i maps to p" to the result map. Note that a[i]=p here. So, yes, you could just as easily forget about the indices, and just return the set of values, in which case you could just say "add p to the result set". – mrip Oct 09 '13 at 12:44