How do you find the IQR in Numpy?

Question

Is there a baked-in Numpy/Scipy function to find the interquartile range? I can do it pretty easily myself, but mean() exists which is basically sum/len...

def IQR(dist):
    return np.percentile(dist, 75) - np.percentile(dist, 25)

I don't think there is a function for it; you must compute the percentiles as you did. – BrenBarn, Apr 22 '14 at 19:18

score 153 · Accepted Answer · answered Apr 22 '14 at 20:10


np.percentile takes multiple percentile arguments, and you are slightly better off doing:

q75, q25 = np.percentile(x, [75, 25])
iqr = q75 - q25

or

iqr = np.subtract(*np.percentile(x, [75, 25]))

than making two calls to percentile:

In [8]: x = np.random.rand(int(1e6))

In [9]: %timeit q75, q25 = np.percentile(x, [75, 25]); iqr = q75 - q25
10 loops, best of 3: 24.2 ms per loop

In [10]: %timeit iqr = np.subtract(*np.percentile(x, [75, 25]))
10 loops, best of 3: 24.2 ms per loop

In [11]: %timeit iqr = np.percentile(x, 75) - np.percentile(x, 25)
10 loops, best of 3: 33.7 ms per loop
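As a quick sanity check that the single-call and two-call forms agree, here is a minimal sketch. The seed and the array shapes are arbitrary choices for illustration, and the `axis` example is an addition that is not part of the timings above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.random(1_000_000)

# One call, both percentiles at once.
q75, q25 = np.percentile(x, [75, 25])
iqr_single_call = q75 - q25

# Two separate calls give the same value, just more slowly.
iqr_two_calls = np.percentile(x, 75) - np.percentile(x, 25)
assert np.isclose(iqr_single_call, iqr_two_calls)

# The same pattern works per column: the result has shape (2, n_columns),
# so unpacking along the first axis yields one row per percentile.
data = rng.random((1000, 3))
col_q75, col_q25 = np.percentile(data, [75, 25], axis=0)
column_iqr = col_q75 - col_q25
```

For uniformly distributed data on [0, 1) the IQR should come out close to 0.5.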

Jaime


You could also use the ufunc machinery: `np.subtract.reduce`. IMHO, a tad clearer than the * magic. – Davidmh Jun 24 '15 at 14:27

@Jaime What is the * operator? What is it doing? – Sounak Jul 01 '15 at 12:14

It's unpacking the sequence after it, so that instead of one two-item sequence, the function is passed two individual items. – Jaime Jul 01 '15 at 13:18

Subtracting two numbers is O(1) while finding percentiles takes O(n), so unpacking the two values and very explicitly subtracting them is perfectly fine. – Nick T Aug 22 '15 at 01:53
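To make the comments above concrete, here is a small sketch of the three spellings discussed: `*` unpacking, `np.subtract.reduce`, and plain explicit subtraction. The five-element array is made up for illustration.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # illustrative data only

# A length-2 array holding [q75, q25], in that order.
quartiles = np.percentile(x, [75, 25])

# `*` unpacks the two items into separate positional arguments,
# so this is the same as np.subtract(quartiles[0], quartiles[1]).
iqr_unpacked = np.subtract(*quartiles)

# reduce applies the subtraction across the array: quartiles[0] - quartiles[1].
iqr_reduce = np.subtract.reduce(quartiles)

# The explicit version.
iqr_explicit = quartiles[0] - quartiles[1]

assert iqr_unpacked == iqr_reduce == iqr_explicit
```

All three compute the same number, so the choice is purely a matter of readability.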

score 30 · Answer 2

There is now an iqr function in scipy.stats. It is available as of scipy 0.18.0. My original intent was to add it to numpy, but it was considered too domain-specific.

You may be better off just using Jaime's answer, since the scipy code is just an over-complicated version of the same.
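For reference, a minimal sketch of calling `scipy.stats.iqr` with its defaults and checking it against the percentile approach. It assumes scipy 0.18.0 or later is installed, and the sample data is made up for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])  # illustrative data only

# With default arguments this is the 75th minus the 25th percentile.
iqr_scipy = stats.iqr(x)

# The same value via np.percentile, as in Jaime's answer.
iqr_numpy = np.subtract(*np.percentile(x, [75, 25]))
assert np.isclose(iqr_scipy, iqr_numpy)

# An axis argument gives one IQR per column of a 2-D array.
data = np.column_stack([x, 2 * x])
column_iqr = stats.iqr(data, axis=0)
```

Doubling a column doubles its IQR, so the second entry of `column_iqr` should be twice the first.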

edited Sep 27 '16 at 15:41

answered Jul 27 '16 at 18:01

Mad Physicist


Why would IQR be considered too domain-specific for numpy? – Rob Rose Jan 16 '17 at 02:02
Because it is not a widely used metric. Feel free to search the mailing list for details. – Mad Physicist Jan 17 '17 at 04:03

score 2 · Answer 3 · answered Feb 13 '20 at 13:53

Ignore this if Jaime's answer works for your case. But if not, according to this answer, to find the exact values of 1st and 3rd quartiles, you should consider doing something like:

samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])

def find_median(sorted_list):
    """Return the median of an already sorted list and the index or indices it came from."""
    list_size = len(sorted_list)
    middle = list_size // 2

    if list_size % 2 == 0:
        # Even length: average the two middle values (indices are 0-based).
        indices = [middle - 1, middle]
        median = (sorted_list[indices[0]] + sorted_list[indices[1]]) / 2
    else:
        # Odd length: the single middle value is the median.
        indices = [middle]
        median = sorted_list[middle]

    return median, indices

median, median_indices = find_median(samples)
# Q1 is the median of the lower half, Q3 the median of the upper half.
# For an odd length the median element belongs to neither half;
# for an even length each half gets exactly half of the data.
Q1, Q1_indices = find_median(samples[:median_indices[-1]])
Q3, Q3_indices = find_median(samples[median_indices[0] + 1:])

IQR = Q3 - Q1

quartiles = [Q1, median, Q3]

Code taken from the referenced answer.
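To see why the two approaches can disagree, here is a short sketch comparing the median-of-halves quartiles with `np.percentile` at its default settings on the same sample. Using `statistics.median` for the halves is a shortcut chosen for brevity here, not the code from the referenced answer.

```python
import statistics
import numpy as np

samples = sorted([28, 12, 8, 27, 16, 31, 14, 13, 19, 1, 1, 22, 13])
n = len(samples)

# Median-of-halves: with an odd n the median itself belongs to neither half.
lower, upper = samples[: n // 2], samples[(n + 1) // 2 :]
q1_halves = statistics.median(lower)
q3_halves = statistics.median(upper)
iqr_halves = q3_halves - q1_halves

# NumPy's default interpolates linearly at position q / 100 * (n - 1).
q1_np, q3_np = (float(q) for q in np.percentile(samples, [25, 75]))
iqr_np = q3_np - q1_np

print(q1_halves, q3_halves, iqr_halves)  # 10.0 24.5 14.5
print(q1_np, q3_np, iqr_np)              # 12.0 22.0 10.0
```

Neither result is wrong; they follow different quartile conventions, so pick the one that matches whatever you are comparing against.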
