Python 3.4 introduced a new statistics module. Among other things, it has a function to calculate the median.

Currently the function first sorts the data and then picks out the median. The source code contains this remark:

# FIXME: investigate ways to calculate medians without sorting? Quickselect?
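
For reference, the sort-then-index approach described above can be sketched roughly as follows. This is an illustration of the idea, not the module's source copied verbatim, and the name `median_by_sorting` is made up for this example. The cost is dominated by the sort, so it is O(n log n).

```python
def median_by_sorting(data):
    """Hypothetical helper: median via a full sort, O(n log n)."""
    items = sorted(data)
    n = len(items)
    if n == 0:
        raise ValueError("no median for empty data")
    if n % 2 == 1:
        # Odd length: the single middle element
        return items[n // 2]
    # Even length: average of the two middle elements
    i = n // 2
    return (items[i - 1] + items[i]) / 2
```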

Is there a faster way to calculate the median than the function is currently using? Which algorithm should Python implement for determining the median?

hlt
Christian

2 Answers

You could use a min-max-median heap, which finds the min, max and median in constant time once the heap is built (building it takes linear time). If you want to know more about using heaps to get the median, read about min-max heaps.

Here is some example code in Python.
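
As a rough illustration of the heap idea, here is a minimal sketch. Note that it is not a min-max-median heap and not the linked example: it uses the simpler two-heap running-median technique built on the standard `heapq` module. The class name `RunningMedian` is made up for this sketch, and it assumes numeric values, because the max-heap is simulated by negating them. Each insertion costs O(log n) and reading the median is O(1), so it suits streaming data rather than a one-off median of a fixed list.

```python
import heapq


class RunningMedian:
    """Hypothetical helper: median of a stream using two heaps."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half (values stored negated)
        self._high = []  # min-heap of the larger half

    def add(self, value):
        # Push onto the lower half, then move its largest element up
        heapq.heappush(self._low, -value)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the lower half is never the smaller one
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no median for empty data")
        if len(self._low) > len(self._high):
            # Odd count: the top of the lower half is the median
            return -self._low[0]
        # Even count: average the two middle values
        return (-self._low[0] + self._high[0]) / 2
```

Feeding it 5, 1, 3 and then calling `median()` gives 3; after adding 10 the two middle values are 3 and 5, so the median becomes 4.0.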

levi
The best median-finding algorithms run in linear time. One implementation is described in: Python implementation of "median of medians" algorithm

For small inputs, sorting and then indexing could be faster because it has less overhead, but the linear-time method should work best for large datasets.

The algorithm is described here: http://en.wikipedia.org/wiki/Selection_algorithm
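
A minimal sketch of the "median of medians" selection idea, written for this answer rather than taken from the linked implementation. The function names `select` and `median_linear` are made up for the example. It copies the input and builds new lists at each step, so it favours clarity over speed; in practice the constant factors may well make it slower than sorting for everyday input sizes.

```python
def select(data, k):
    """Return the k-th smallest item (0-based) using median-of-medians pivots."""
    items = list(data)
    while True:
        if len(items) <= 5:
            return sorted(items)[k]
        # Take the median of each group of (up to) five items
        groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
        medians = [g[len(g) // 2] for g in groups]
        # The pivot is the median of those medians, found recursively
        pivot = select(medians, len(medians) // 2)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows
        elif k < len(lows) + n_equal:
            return pivot
        else:
            # Skip everything at or below the pivot
            k -= len(lows) + n_equal
            items = highs


def median_linear(data):
    """Hypothetical helper: median without a full sort."""
    items = list(data)
    n = len(items)
    if n == 0:
        raise ValueError("no median for empty data")
    if n % 2 == 1:
        return select(items, n // 2)
    # Even length: average of the two middle elements
    return (select(items, n // 2 - 1) + select(items, n // 2)) / 2
```

The pivot chosen this way is guaranteed to discard a constant fraction of the items on every pass, which is what gives the worst-case linear bound. A plain quickselect with a random pivot is simpler and is linear only on average.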

Community