
I have data like this.

Ram,500
Sam,400
Test,100
Ram,800
Sam,700
Test,300
Ram,900
Sam,800
Test,400

What is the shortest way to find the median for each name in the data above? My result should be something like this:

The median is the value at position (n+1)/2 in the sorted data, where n is the number of data values in the sample.

Test 300
Sam 700
Ram 800
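A quick way to check the (n+1)/2 position formula against the sample data is a minimal Python sketch like the one below. The helper name `median_by_position` is hypothetical. When n is odd the position is a whole number. When n is even it falls halfway between two values, and those two values are averaged.

```python
def median_by_position(values):
    """Median via the 1-based position (n + 1) / 2 in the sorted data (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of empty data is undefined")
    lo = (n + 1) // 2  # floor of the 1-based position (n + 1) / 2
    hi = (n + 2) // 2  # ceiling of that position; equals lo when n is odd
    # Average the values at both positions (the same value twice when n is odd)
    return (ordered[lo - 1] + ordered[hi - 1]) / 2


print(median_by_position([100, 300, 400]))  # the "Test" rows: 300.0
```

For the "Test" rows, n = 3, so the position is (3+1)/2 = 2. The second sorted value is 300.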
bmu
user1335606
  • If you're just looking for median algorithms try [this one](http://stackoverflow.com/questions/7578689/median-code-explanation) – John Mee Dec 07 '12 at 04:20

6 Answers


Python 3.4 and later include the statistics module in the standard library, so you can use the function statistics.median:

>>> from statistics import median
>>> median([1, 3, 5])
3
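Applied to the question's data, this might look like the sketch below. It assumes the rows are already available as (name, value) pairs. If they come from a file, csv.reader can produce them after converting each value with int(). The values are grouped per name, and then statistics.median is called on each group. The final loop sorts by median only to mirror the layout the question asks for.

```python
from collections import defaultdict
from statistics import median

# The question's rows as (name, value) pairs (assumed representation)
rows = [("Ram", 500), ("Sam", 400), ("Test", 100),
        ("Ram", 800), ("Sam", 700), ("Test", 300),
        ("Ram", 900), ("Sam", 800), ("Test", 400)]

# Collect every value under its name
groups = defaultdict(list)
for name, value in rows:
    groups[name].append(value)

# One median per name
medians = {name: median(values) for name, values in groups.items()}

# Print in ascending order of median, as in the expected output
for name, m in sorted(medians.items(), key=lambda kv: kv[1]):
    print(name, m)
```

This prints Test 300, Sam 700, Ram 800, one per line.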
jabaldonedo

Use numpy's median function.
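A minimal sketch with numpy.median, assuming NumPy is installed and the values have already been grouped by name. np.median accepts any array-like input and averages the two middle values for even-length input. It returns a NumPy float, so the sketch converts each result with float() to keep the printed dict plain.

```python
import numpy as np

# Values per name, taken from the question's data
values = {"Ram": [500, 800, 900], "Sam": [400, 700, 800], "Test": [100, 300, 400]}

# np.median returns a NumPy floating-point scalar; float() makes it a plain Python float
medians = {name: float(np.median(v)) for name, v in values.items()}
print(medians)
```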

KurzedMetal

It's a little unclear how your data is actually represented, so I've assumed it is a list of tuples:

data = [('Ram',500), ('Sam',400), ('Test',100), ('Ram',800), ('Sam',700), 
        ('Test',300), ('Ram',900), ('Sam',800), ('Test',400)]

from collections import defaultdict

def median(mylist):
    sorts = sorted(mylist)
    length = len(sorts)
    if not length % 2:
        # even length: mean of the two middle values (// keeps the index an int)
        return (sorts[length // 2] + sorts[length // 2 - 1]) / 2.0
    # odd length: the single middle value
    return sorts[length // 2]

data_dict = defaultdict(list)
for name, value in data:
    data_dict[name].append(value)

print([(key, median(val)) for key, val in data_dict.items()])
print(median([5, 2, 4, 3, 1]))
print(median([5, 2, 4, 3, 1, 6]))
#output:
[('Ram', 800), ('Sam', 700), ('Test', 300)]
3
3.5

The function median returns the median of a list. If there is an even number of entries, it returns the mean of the two middle entries (this is the standard definition).

I've used defaultdict to build a dict that maps each name to the list of its values, which is a more useful representation of your data.

fraxel
  • Maybe the function would be a little clearer if you factor out `n = len(sorts)` – John La Rooy May 07 '12 at 22:31
  • median() crashes on empty lists; you might want to add `if not mylist: return 0` at the beginning. – OlivierBlanvillain Oct 17 '13 at 09:52
  • @OlivierBlanvillain it doesn't crash, but it raises an exception, which you can catch. This is correct behaviour, as the median of an empty list is *undefined* and definitely not "0" (which is the median of something like `[2,-1,0]`) – umläute Oct 17 '13 at 11:49
  • I guess it depends on how you look at it. Anyway, having to catch an "IndexError" for an undefined value does not seem very idiomatic to me. Maybe raising a ValueError, or returning None... – OlivierBlanvillain Oct 17 '13 at 13:09
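Combining the suggestions in these comments might look like the sketch below. It factors out `n`, and it raises ValueError on empty input instead of letting an IndexError escape. This is only one possible variant. For comparison, the standard library's statistics.median raises statistics.StatisticsError, a subclass of ValueError, for empty data.

```python
def median(values):
    """Median of a non-empty iterable of numbers; raises ValueError when it is empty."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2:
        return ordered[mid]                        # odd: the single middle element
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: mean of the two middle elements


try:
    median([])
except ValueError as exc:
    print("no median:", exc)
```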

Check this out:

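def median(lst):
    even = (0 if len(lst) % 2 else 1) + 1  # number of middle values: 1 if len is odd, 2 if even
    half = (len(lst) - 1) // 2             # index of the (lower) middle value; // keeps it an int
    return sum(sorted(lst)[half:half + even]) / float(even)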

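Note:

sorted(lst) produces a sorted copy of lst, so the input list is left unchanged;

sum([x]) == x, so for an odd-length list the one-element slice sums to the middle value itself.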
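The following sketch isolates the slice so you can see what it picks out for odd and even lengths. The helper name `middle_slice` is hypothetical. It uses the same index arithmetic as the answer, with `count` playing the role of `even`.

```python
def middle_slice(lst):
    """Return the one or two middle elements of sorted(lst) (hypothetical helper)."""
    count = 1 if len(lst) % 2 else 2   # same value as `even` in the answer
    half = (len(lst) - 1) // 2         # index of the (lower) middle element
    return sorted(lst)[half:half + count]


for data in ([5, 2, 4, 3, 1], [5, 2, 4, 3, 1, 6]):
    mid = middle_slice(data)
    print(data, "->", mid, "-> median", sum(mid) / len(mid))
```

For the odd-length list the slice is [3], and for the even-length list it is [3, 4], whose mean is 3.5.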

WhyWhat

Easiest way to get the median of a list of integers (this gives the true median only when the list has an odd number of items):

x = [1,3,2]
print("The median of x is:", sorted(x)[len(x) // 2])
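With an even number of items, this shortcut returns the upper of the two middle values. That matches statistics.median_high (Python 3.4+) rather than statistics.median. The following sketch shows where the two diverge.

```python
from statistics import median, median_high

x = [4, 5, 6, 7]                     # even number of items
shortcut = sorted(x)[len(x) // 2]    # index 2 of [4, 5, 6, 7]: the upper middle value
print(shortcut, median_high(x), median(x))
```

Here the shortcut and median_high both give 6, while median gives 5.5. For odd-length input such as [1, 3, 2], all three agree.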
user3100512

I started with user3100512's answer and quickly realized it doesn't work for an even number of items. I added some conditionals to it to compute the median.

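def median(x):
    s = sorted(x)          # sort once instead of three times
    mid = len(s) // 2      # // keeps the index an int
    if len(s) % 2 != 0:
        return s[mid]      # odd: the single middle item
    else:
        return (s[mid] + s[mid - 1]) / 2.0  # even: mean of the two middle items

median([4, 5, 6, 7])

should return 5.5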

Ben