By definition, a median is a value that divides your samples into 2 halves.
The median of a finite list of numbers can be found by arranging all
the observations from lowest value to highest value and picking the
middle one (e.g., the median of {3, 3, 5, 9, 11} is 5). If there is an
even number of observations, then there is no single middle value; the
median is then usually defined to be the mean of the two middle values
(the median of {3, 5, 7, 9} is (5 + 7) / 2 = 6).
So, you need to
- somehow determine which of your samples are the "lower half" and which are the "upper half", then
- select the max & min ones from the subsets, accordingly.
Possible approaches include:
- sort the entire list (probably, in-place for efficiency), then select the corresponding elements. (O(N*log(N)))
- traverse the list, sort the elements into "lower" and "upper" parts as you go (effectively, you'll need to calculate the median at each step to classify the next element) and keep track of your "boundary" values (you'll need them to calculate the median anyway) (O(N))
So, basically, what you need is (altered code from the linked source for your 1-D case):
if sz % 2 == 0:
part = partition(a, ((sz // 2) - 1, sz // 2))
else:
part = partition(a, (sz - 1) // 2)
then retrieve the corresponding elements.
Note, however, if you strive for efficiency, that there's quite an overhead converting data into ndarray
.