I have a homework assignment that I was doing with Minitab to find the quartiles and the interquartile range of a data set. When I tried to replicate the results using NumPy, the results were different. After some googling, I see that there are many different algorithms for computing quartiles, as listed here. I've tried all the interpolation types listed in the NumPy docs for the percentile function, but none of them match Minitab's algorithm. Is there an easy way to get Minitab's algorithm with NumPy, or will I need to roll my own code and implement the algorithm?

Sample code:

import pandas as pd
import numpy as np

terrestrial = pd.Series([76.5,6.03,3.51,9.96,4.24,7.74,9.54,41.7,1.84,2.5,1.64])
aquatic = pd.Series([.27,.61,.54,.14,.63,.23,.56,.48,.16,.18])

df = pd.DataFrame({'terrestrial' : terrestrial, 'aquatic' : aquatic})

This is the method I used with NumPy

q75,q25 = np.percentile(df.aquatic, [75,25], interpolation='linear')
iqr = q75 - q25

The results from Minitab are different:

Descriptive Statistics: aquatic, terrestrial 

Variable         Q1      Q3     IQR
aquatic      0.1750  0.5725  0.3975
terrestrial    2.50    9.96    7.46
Roly
  • Can you provide example input/output? – Marius Jun 02 '15 at 03:29
  • Is there any documentation explaining what algorithm Minitab uses? In particular, how does it handle `NaN`? Your two columns have different lengths, so `aquatic` is padded with a `NaN` at the end. – BrenBarn Jun 02 '15 at 03:40
  • The link included in the question has an entry for it. This other article confirms it: [here](http://mathforum.org/library/drmath/view/60969.html) – Roly Jun 02 '15 at 03:43

2 Answers

Here's an attempt to implement Minitab's algorithm. I've written these functions assuming that you've already dropped missing observations from the series a:

# Drop missing obs
x = df.aquatic[~ pd.isnull(df.aquatic)]

def get_quartile1(a):
    # Series.sort() no longer exists in pandas; sort_values() returns a sorted copy
    a = a.sort_values()
    # 1-based, possibly fractional position of Q1
    pos1 = (len(a) + 1) / 4.0
    round_pos1 = int(np.floor(pos1))
    first_part = a.iloc[round_pos1 - 1]
    extra_prop = pos1 - round_pos1
    interp_part = extra_prop * (a.iloc[round_pos1] - first_part)
    return first_part + interp_part

get_quartile1(x)
Out[84]: 0.17499999999999999

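def get_quartile3(a):
    # Series.sort() no longer exists in pandas; sort_values() returns a sorted copy
    a = a.sort_values()
    # 1-based, possibly fractional position of Q3
    pos3 = (3 * len(a) + 3) / 4.0
    # Take the floor; round() would pick the wrong neighbour when the fraction is above 0.5
    round_pos3 = int(np.floor(pos3))
    first_part = a.iloc[round_pos3 - 1]
    extra_prop = pos3 - round_pos3
    interp_part = extra_prop * (a.iloc[round_pos3] - first_part)
    return first_part + interp_part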

get_quartile3(x)
Out[86]: 0.57250000000000001
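
The two functions above share the same steps, so they can be folded into one. What follows is only a sketch under some assumptions: the name `minitab_quantile` is made up for illustration, it uses plain Python so it accepts a list or a Series, and clamping to the smallest or largest observation when the position falls outside the data is my own choice, not something confirmed for Minitab.

```python
import math

def minitab_quantile(values, p):
    """Quantile at probability p using the 1-based position p * (n + 1)."""
    # Drop missing values and sort ascending
    data = sorted(v for v in values if not math.isnan(v))
    n = len(data)
    if n == 0:
        raise ValueError("no non-missing observations")
    pos = p * (n + 1)        # 1-based, possibly fractional position
    lower = math.floor(pos)  # 1-based index of the observation at or below pos
    frac = pos - lower       # how far to move toward the next observation
    # Assumption: positions outside the data are clamped to the extremes
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

aquatic = [.27, .61, .54, .14, .63, .23, .56, .48, .16, .18]
terrestrial = [76.5, 6.03, 3.51, 9.96, 4.24, 7.74, 9.54, 41.7, 1.84, 2.5, 1.64]

for name, data in [("aquatic", aquatic), ("terrestrial", terrestrial)]:
    q1 = minitab_quantile(data, 0.25)
    q3 = minitab_quantile(data, 0.75)
    print(f"{name}: Q1={q1:.4f} Q3={q3:.4f} IQR={q3 - q1:.4f}")
```

For the two columns in the question this should agree with the Minitab table to the precision shown there.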
Marius

I think you will have to roll your own. The interpolation methods provided by np.percentile only affect how the value is interpolated between the two data points nearest the quantile position. But it appears that Minitab uses a different method for determining the quantile position in the first place.
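
To make the difference in position concrete, here is a small sketch. It assumes NumPy's default linear rule places the quantile at the 1-based position p * (n - 1) + 1, and that Minitab uses p * (n + 1), as the article linked in the comments describes. The helper names are hypothetical and the code is plain Python rather than NumPy itself.

```python
import math

def linear_position(n, p):
    # 1-based position under the default linear rule: p * (n - 1) + 1
    return p * (n - 1) + 1

def n_plus_one_position(n, p):
    # 1-based position under the (n + 1) rule: p * (n + 1)
    return p * (n + 1)

def value_at(sorted_data, pos):
    # Linear interpolation between the two observations around a 1-based
    # position, clamped to the ends of the data (an assumption of this sketch)
    n = len(sorted_data)
    lower = min(max(math.floor(pos), 1), n)
    upper = min(lower + 1, n)
    frac = min(max(pos - lower, 0.0), 1.0)
    return sorted_data[lower - 1] + frac * (sorted_data[upper - 1] - sorted_data[lower - 1])

aquatic = sorted([.27, .61, .54, .14, .63, .23, .56, .48, .16, .18])
n = len(aquatic)

for p in (0.25, 0.75):
    lin = linear_position(n, p)
    npo = n_plus_one_position(n, p)
    print(f"p={p}: linear pos={lin}, value={value_at(aquatic, lin):.4f}; "
          f"(n+1) pos={npo}, value={value_at(aquatic, npo):.4f}")
```

Both rules interpolate the same way; only the position differs, which is why switching the interpolation option alone cannot reproduce the Minitab numbers. If I recall correctly, NumPy 1.22 and later accept a `method` argument on np.percentile, and its "weibull" option uses the p * (n + 1) position, but check it against the output above before relying on it.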

BrenBarn