3

Given:

x = [1.23, 2.0, 3.45, 4.1]

then:

middle = numpy.median(x)

Were the size of list x odd, I could use x[x.index(middle)-1] and x[x.index(middle)+1] to get the two numbers either side of the middle. This won't work in the above case, since the median is not present in x. Is there is standard approach that can handle both even-sized and odd-sized lists?

ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152
Pyderman
  • 14,809
  • 13
  • 61
  • 106
  • Just realized that the `.index()` approach itself would break down anyway when there are identical entries in an odd-sized list, as it will incorrectly choose the first occurrence. – Pyderman Oct 07 '15 at 21:42
  • First element greater than the median and the element immediately before it should do it, not? – vmg Oct 07 '15 at 21:46
  • "First element greater than the median and the element immediately before it" is the goal itself :) – Pyderman Oct 07 '15 at 21:51
  • Uh, sorry. I didn't see that you need the case for both even and uneven cases. – vmg Oct 07 '15 at 21:53
  • Perhaps could you specify that your array is sorted? It simplifies the problem a lot. – Régis B. Oct 08 '15 at 06:33

6 Answers6

3

Assuming that the length of the sorted input list is N then it seems to me that you want access elements N/2-1 and (N+1)/2 (assuming integer division), i.e.,

[1.23, 2.0, 3.45, 4.1] => N = 4 thus N/2-1 = 1 and (N+1)/2 = 2

[1.23, 2.0, 3.45, 4.1, 5.6] => N = 5 thus N/2-1 = 1 and (N+1)/2 = 3
ewcz
  • 12,819
  • 1
  • 25
  • 47
3

These are the numbers you are looking for:

x[(len(x)-1)/2 - len(x)%2], x[len(x)/2 + len(x)%2]
Régis B.
  • 10,092
  • 6
  • 54
  • 90
2

To get the median you have to have a sorted list so this is a simple math problem, if the list length is uneven you want the halfway point - 1 and the halfway point + 1, if the list length is even the median is the average of the two middle numbers so you want those two middle numbers.

def get_two(l):
    ln = len(l)
    half = ln // 2
    return x[half-1], x[half + ln % 2]
Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
1

If the input list is unsorted (say x = [1.23, 3.45, 4.1, 2.0]) then you want to iterate through and find the two quantities of interest (this would also work for sorted inputs of course). Something like this:

largestSmallerThanMedian = x[0]
smallestLargerThanMedian = x[len(x)-1] 
for n in x:
    if (n < middle) and (n >= largestSmallerThanMedian):
        largestSmallerThanMedian = n
    if (n > middle) and (n <= smallestLargerThanMedian):
        smallestLargerThanMedian = n

And then largestSmallerThanMedian and smallestLargerThanMedian would have the two quantities of interest.

sgrg
  • 1,210
  • 9
  • 15
1

By definition, a median is a value that divides your samples into 2 halves.

The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one (e.g., the median of {3, 3, 5, 9, 11} is 5). If there is an even number of observations, then there is no single middle value; the median is then usually defined to be the mean of the two middle values (the median of {3, 5, 7, 9} is (5 + 7) / 2 = 6).

So, you need to

  • somehow determine which of your samples are the "lower half" and which are the "upper half", then
  • select the max & min ones from the subsets, accordingly.

Possible approaches include:

  • sort the entire list (probably, in-place for efficiency), then select the corresponding elements. (O(N*log(N)))
  • traverse the list, sort the elements into "lower" and "upper" parts as you go (effectively, you'll need to calculate the median at each step to classify the next element) and keep track of your "boundary" values (you'll need them to calculate the median anyway) (O(N))

So, basically, what you need is (altered code from the linked source for your 1-D case):

if sz % 2 == 0:
    part = partition(a, ((sz // 2) - 1, sz // 2))
else:
    part = partition(a, (sz - 1) // 2)

then retrieve the corresponding elements.

Note, however, if you strive for efficiency, that there's quite an overhead converting data into ndarray.

Community
  • 1
  • 1
ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152
0

You can use the bisect module:

x = [1.23, 2.0, 3.45, 4.1]  

 def select_near_median(list_):
    # gets two numbers either side of median
    # in a sorted list
    # bisect finds the index where median number
    # can be inserted
    import bisect
    import statistics
    list_ = sorted(list_)
    med = statistics.median(list_)
    ind = bisect.bisect_left(list_, med)
    if med == list_[ind]:
        left = ind
        right = left + 1
    else:
        left = ind - 1
        right = ind
    return list_[left], list_[right]


        print(select_near_median([1,2,3,4]))
        (2, 3)
        print(select_near_median([1,2,3,4,5]))
        (3, 4)
        print(select_near_med(x))
        (2.0, 3.45)
LetzerWille
  • 5,355
  • 4
  • 23
  • 26
  • 2
    but the statement `med(list_)` calculates arithmetic average, not the median, doesn't it? – ewcz Oct 07 '15 at 22:06
  • @LetzerWille Thanks. For a list such as `[1,2,3,4]` though, it returns `(1,2)`. – Pyderman Oct 07 '15 at 23:13
  • @LetzerWille We're nearly there ... it's now doing something similar for an odd-sized list e.g. `select_near_median([1,2,3,4,5])` gives `(2,3)`. Note: since I'm on 2.7, I had to replace `statistics.median` with` `numpy.median`, but I don't think that makes a difference. – Pyderman Oct 08 '15 at 00:00
  • @Pyderman should work now. bisect function is counterintuitive, but with persistence :) – LetzerWille Oct 08 '15 at 00:27
  • @Pyderman bisect works on sorted input. So i have done this llPrices = sorted ([..... And got out put 783.765. Actually I have written about sorted input. Better yet I will put sorting into the function. – LetzerWille Oct 09 '15 at 03:44
  • @Pyderman x, y = select_near_median(allPrices) print(x,y) 766.92 800.61 – LetzerWille Oct 09 '15 at 03:50