45

Problem: the input is a (not necessarily sorted) sequence S = k1, k2, ..., kn of n arbitrary numbers. Consider the collection C of n² numbers of the form min{ki, kj}, for 1 <= i, j <= n. Present an O(n) time and O(n) space algorithm to find the median of C.

So far, by examining C for different sets S, I've found that the smallest number in S appears (2n-1) times in C, the next smallest (2n-3) times, and so on, until the largest number appears exactly once.
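
The counting pattern above can be checked by brute force on a small input. This is only a sketch: the helper name `pairwise_min_counts` and the sample list are made up for illustration, and the sample assumes distinct values so that each value has a well-defined rank.

```python
from collections import Counter

def pairwise_min_counts(s):
    """Count how often each value occurs among min(s[i], s[j]) over all ordered pairs (i, j)."""
    return Counter(min(a, b) for a in s for b in s)

s = [7, 3, 9, 1]  # hypothetical sample with distinct values
n = len(s)
counts = pairwise_min_counts(s)

for rank, value in enumerate(sorted(s)):
    # rank 0 is the smallest value; the observed pattern predicts 2 * (n - rank) - 1 copies
    print(value, counts[value], 2 * (n - rank) - 1)
```

The brute-force enumeration takes O(n²) time, so it is useful only for confirming the pattern, not as a solution.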

Is there a way to use this information to find the median of C?

no comment
ejf071189
  • http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-design-and-analysis-of-algorithms-spring-2012/lecture-notes/MIT6_046JS12_lec01.pdf – quintin Jun 12 '16 at 13:00
  • Similar answer: https://cs.stackexchange.com/questions/1914/to-find-the-median-of-an-unsorted-array – roottraveller Jul 29 '17 at 08:32
  • Related post here - [Calculate the median of a billion numbers](https://stackoverflow.com/q/2571358/465053) – RBT Mar 02 '18 at 14:49
  • If there was an efficient way to do this, it would make Quicksort a lot better since median of data is the ideal pivot for Quicksort. – Abhishek Choudhary Mar 18 '22 at 07:17
  • @AbhishekChoudhary How does quicksort relate to this problem with n^2 implicit numbers? And what do you mean with "efficient"? – Kelly Bundy Mar 27 '22 at 22:12
  • @KellyBundy You do realise that Quicksort's only limitation is finding a good pivot, and the ideal pivot for an array is its median. So if there is an O(n) way of finding the median (which there is), we can modify Quicksort to use the median every time, which makes it O(n log n) even in the worst case, though it doesn't work that great in practice. (It doesn't relate to your n^2 numbers, I know; I'm just pointing out that Quicksort can be improved with a good median algorithm.) – Abhishek Choudhary Mar 28 '22 at 03:12
  • @AbhishekChoudhary Ok, then it's just not clear why you commented that here, as the question is not about "in practice" but about O(n), and the O(n) solution to this problem (see one of the answers) doesn't seem applicable to quicksort. – Kelly Bundy Mar 28 '22 at 03:44
  • @KellyBundy I had searched linear time median algorithm on google and this appeared, that's why I thought this was same problem, and most answers below are also about linear time median algorithm, and O(n) solution to find median is applicable to quicksort, we use the median as pivot, thus ensuring O(n log n) worst case complexity. – Abhishek Choudhary Mar 28 '22 at 07:47
  • @AbhishekChoudhary Yeah, those answers are wrong. Their authors didn't understand the question. – Kelly Bundy Mar 28 '22 at 08:56

3 Answers

20

There are a number of possibilities. One I like is Hoare's Select algorithm. The basic idea is similar to a Quicksort, except that when you recurse, you only recurse into the partition that will hold the number(s) you're looking for.

For example, if you want the median of 100 numbers, you'd start by partitioning the array, just like in Quicksort. You'd get two partitions -- one of which contains the 50th element. Recursively carry out your selection in that partition. Continue until your partition contains only one element, which will be the median (and note that you can do the same for another element of your choice).
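
The selection idea described above can be sketched as follows. This is one possible version, not a canonical one: the function name `quickselect` is chosen for illustration, it uses a random pivot with a Lomuto-style partition (a swapped-in choice, not Hoare's original partition scheme), and it loops into one side instead of recursing. It runs in expected O(n) time but can degrade to O(n²), for example on inputs with many equal values.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-indexed) of items; expected O(n) time."""
    a = list(items)  # work on a copy so the caller's list is untouched
    if not 0 <= k < len(a):
        raise IndexError("k out of range")
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        # Pick a random pivot and park it at the end of the active range.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Lomuto partition: everything smaller than the pivot moves to the front.
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final sorted index
        if k == store:
            return a[store]
        if k < store:
            hi = store - 1  # continue only in the left partition
        else:
            lo = store + 1  # continue only in the right partition
```

With 100 numbers in a list `data`, the 50th smallest element from the example would be `quickselect(data, 49)`.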

Jerry Coffin
  • But if the size of C is n^2 based on the original sequence S having n numbers, then wouldn't the run time of select performed on C be O(n^2)? – ejf071189 Nov 17 '10 at 04:15
  • Sorry -- I didn't read the question carefully enough. You're right -- this is linear on the number of items being searched in, not the number of unique items in that set. – Jerry Coffin Nov 17 '10 at 04:23
  • So can the fact that we know the number of instances of given repeated elements somehow be used in conjunction with the select algorithm? – ejf071189 Nov 17 '10 at 04:28
  • I don't think so -- the `select` algorithm starts by partitioning the elements, which means looking at all N^2 elements. – Jerry Coffin Nov 17 '10 at 04:33
  • Hmm, well it seems we need to find i such that the ith element in S corresponds to the median of C. How does this sound: we know that the smallest element of S appears in C (2n-1) times, the next smallest (2n-3) times, and so on. We can characterize the n^2 elements in C as (2n-1)+(2n-3)+(2n-5)...+1. This summation will have n total terms corresponding to the n elements of S. We know that we need to find the ith element in C where i=n^2/2. We calculate this value, compare this to (2n-1), (2n-1)+(2n-3), (2n-1)+(2n-3)+(2n-5) and so forth until n^2/2 is less than this total... – ejf071189 Nov 17 '10 at 05:00
  • It is clear that the number of terms to achieve this is less than n/2, so we're only doing n/2 comparisons. The number of terms needed to achieve this corresponds to i, so that the ith element in S is equivalent to the median of C. We can then run select on S to find the ith element in O(n) time, so total run time is O(n). – ejf071189 Nov 17 '10 at 05:02
  • +1, it seems that at the end the algorithm reduces to the selection of the i-th element on the original unsorted sequence (which is indeed O(n) - on average). – Unreason Nov 17 '10 at 14:12
  • Hoare's algorithm is O(n^2) worst case. – domen Dec 22 '16 at 14:28
  • @domen: That's true. If you really need O(n), you probably want to use the median of medians algorithm instead (but keep in mind that it's slower on average to ensure against a worst case that rarely arises in practice). – Jerry Coffin Dec 22 '16 at 15:20
  • I'm confused. This looks like it's completely missing the question. Why is it upvoted so much? Do I just not see how it applies? – no comment Oct 13 '21 at 23:53
  • @don'ttalkjustcode: It doesn't answer what's asked in the body of the question, but does answer what's in the headline (so to speak)--the "O(n) algorithm to find the median of a collection of numbers" part. Based on the up-votes, it appears that quite a few people who find this just want to quickly find the median of some numbers, and don't really care about the extra "stuff" in the question body. In other words, a lot more people care about the simple case in the headline than the rather obscure one in the question body. – Jerry Coffin Oct 14 '21 at 08:12
  • Ah, yes, that seems likely. People who might not even read the question at all, and misinterpret the title as the n referring to the size of that collection, possibly further convinced of that by reading your answer. In fact, I found this question because [another](https://stackoverflow.com/q/33964676/16759116) was closed by four people as duplicate of this one, even though it's about median of an ordinary array. Anyway, I have to disagree with its usefulness as an answer here, sorry. – no comment Oct 14 '21 at 20:01
12

Yes, good puzzle. We can find the median by building on the observation you made.

In C we have 1 occurrence of max(k), 3 occurrences of the next highest, 5 of the next highest, and so on.

  1. If we order the elements of C from highest to lowest, the m highest numbers of S occupy the first m^2 positions (the sum of the first m odd numbers is m^2).

  2. The positions we are interested in (to calculate the median): a. If n is odd, the single middle position alpha = (n^2+1)/2. b. If n is even, the two middle positions alpha1 = n^2/2 and alpha2 = n^2/2+1. Runs of equal values end only at square positions m^2, and n^2/2 is never a perfect square, so alpha1 is never the last position of a run. The elements at alpha1 and alpha2 are therefore equal, and the median is that common value.

  3. So it boils down to determining the smallest m such that m^2 (the sum of the first m odd numbers) is greater than n^2/2.

  4. That is, m = ceiling(n/sqrt(2)), and the median of C is the mth highest number in the original sequence. (Whether to find the mth highest or, equivalently, the (n-m+1)th lowest is an optimization.)

  5. We can find the mth highest number by keeping track of the m largest numbers seen while scanning from the left (simple, but not linear in general, e.g. O(n log m) with a heap), or use the median of medians algorithm to do it in linear time.
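
The steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the names `kth_smallest` and `median_of_pairwise_mins` are made up for illustration, m is computed with integer arithmetic (`isqrt(n * n // 2) + 1` is the smallest m with m^2 > n^2/2, which matches ceiling(n/sqrt(2)) without floating-point rounding), and the selection step is a randomized quickselect with expected O(n) time rather than median of medians, so the worst case is not linear here.

```python
import random
from math import isqrt

def kth_smallest(values, k):
    """Return the k-th smallest value (0-indexed) via randomized quickselect."""
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if k < len(lows):
            items = lows  # answer lies among the smaller values
        elif k < len(lows) + equal_count:
            return pivot  # answer is the pivot value itself
        else:
            k -= len(lows) + equal_count
            items = [x for x in items if x > pivot]

def median_of_pairwise_mins(s):
    """Median of all min(s[i], s[j]) without materializing the n^2 values."""
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence has no median")
    m = isqrt(n * n // 2) + 1  # smallest m with m*m > n*n/2
    # The m-th highest value sits at 0-indexed position n - m in ascending order.
    return kth_smallest(s, n - m)

print(median_of_pairwise_mins([1, 2, 3, 4]))
```

For S = 1, 2, 3, 4 the collection C holds seven 1s, five 2s, three 3s and one 4, so the two middle positions (8 and 9) both hold 2, which agrees with m = 3 and the 3rd highest value.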

Om Deshmane
7

Wikipedia has a good article on selection algorithms. If you are using C++, the standard library includes the nth_element() algorithm, which runs in linear time on average.

Blastfurnace