13

I am plotting a non-normal distribution with matplotlib's boxplot function and want to identify the outliers.

Besides the plot itself, I need the values of the points that the boxplot draws as outliers. Is there a way to extract these values from the boxplot object for use in my downstream code?

Hooked
Abhi

2 Answers

21

Do you mean the points above and below the two black lines (the whisker caps)?

from pylab import *
spread = rand(50) * 100
center = ones(25) * 50
flier_high = rand(10) * 100 + 100
flier_low = rand(10) * -100
data = concatenate((spread, center, flier_high, flier_low), 0)
r = boxplot(data)

Store the dict returned by boxplot, and you can get all the information from it, for example:

top_points = r["fliers"][0].get_data()[1]
bottom_points = r["fliers"][2].get_data()[1]
plot(np.ones(len(top_points)), top_points, "+")
plot(np.ones(len(bottom_points)), bottom_points, "+")
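
As the comments below point out, on newer matplotlib versions `r["fliers"]` holds a single line per box that contains both the high and the low outliers, so index 2 does not exist for a single box. A minimal sketch of how the split could be done in that case, assuming a vertical boxplot and using the median line of the same box as the dividing value (the seeded generator and the `Agg` backend are my own choices so the example runs anywhere, not part of the original answer):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Same kind of data as above, but from a seeded generator (an assumption).
rng = np.random.default_rng(0)
spread = rng.random(50) * 100
center = np.ones(25) * 50
flier_high = rng.random(10) * 100 + 100
flier_low = rng.random(10) * -100
data = np.concatenate((spread, center, flier_high, flier_low))

r = plt.boxplot(data)

# One Line2D per box; its y-data holds every outlier drawn for that box.
fliers = np.asarray(r["fliers"][0].get_ydata())

# The median line of the same box has two identical y-values.
median = r["medians"][0].get_ydata()[0]

# Outliers lie outside the whiskers, so none of them equals the median.
top_points = fliers[fliers > median]
bottom_points = fliers[fliers < median]
```

`top_points` and `bottom_points` can then be passed to `plot` exactly as in the snippet above.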

HYRY
  • 1
    What is the logic here for extracting outliers from the different indices of `['fliers']`? Why is 0 = above and 2 = below? And why the [1] after `get_data()`? Sorry, I'm trying to dynamically replicate this for 20+ boxes on the same chart, and I can't seem to replicate it and don't want to open a similar question – DJK Jul 07 '18 at 20:28
  • 3
    currently `r["fliers"][2].get_data()[1]` returns `IndexError: list index out of range`, whereas `r["fliers"][0].get_data()[1]` returns **both top and bottom outliers**, so use this to retrieve all outliers at once. – pcko1 Feb 26 '19 at 09:54
1

The matplotlib pyplot.boxplot() function returns a dictionary containing various properties of the boxplot. The outlier values are stored within the fliers key of this dictionary.

Assuming the return value of plt.boxplot() was stored in the variable bplot:

# retrieving outliers for vertical boxplot
outliers = bplot["fliers"][0].get_ydata()

# retrieving outliers for horizontal boxplot
outliers = bplot["fliers"][0].get_xdata()
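
When several boxes are drawn on the same chart, the same idea should extend to a loop over `bplot["fliers"]`, which holds one line per box. A rough sketch for a vertical boxplot follows; the three groups and their appended extreme values are hypothetical data made up for illustration, and the `Agg` backend is only there so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: three groups, each with a few extreme values appended.
groups = [
    np.concatenate((rng.normal(0, 1, 100), [8.0, -9.0])),
    np.concatenate((rng.normal(5, 2, 100), [30.0])),
    np.concatenate((rng.normal(-3, 1, 100), [-20.0, 15.0])),
]

bplot = plt.boxplot(groups)

# One Line2D per box, in the same order as `groups`.
outliers_per_box = [np.asarray(line.get_ydata()) for line in bplot["fliers"]]

for i, values in enumerate(outliers_per_box, start=1):
    print(f"box {i}: {len(values)} outliers")
```

If the plot itself is not needed, `matplotlib.cbook.boxplot_stats` computes the same statistics and exposes the outliers under the `"fliers"` key of each returned dict.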