
According to the seaborn documentation, its boxplot method makes the whiskers 1.5*IQR long. However, the plot from that documentation suggests otherwise: the upper and lower whiskers have different lengths, and neither seems to be 1.5*IQR long.

Can someone shed some light on why they are different?

(Plot from the seaborn boxplot documentation: https://seaborn.pydata.org/generated/seaborn.boxplot.html)

ImportanceOfBeingErnest
Y.S

2 Answers


In principle, the assumption is correct: the whiskers on a boxplot should be of equal length if they are defined as a multiple of the interquartile range (IQR).

However, there are essentially two cases where this is not true. Unfortunately, the English Wikipedia article does not give those reasons, but let me translate the explanation from the German Wikipedia:

Whisker
One possible definition, originating from John W. Tukey, is to restrict the length of the whisker to at most 1.5 times the interquartile range (1.5*IQR).

In this case, however, the whisker does not end exactly at this value, but rather at the most extreme data value that still lies inside this boundary. The length of the whisker is hence determined by the data and not solely by the interquartile range. This is why the whiskers need not be the same length on both ends of the box. If there are no values outside the 1.5*IQR boundary, the length of the whisker is determined by the minimum and maximum values. Otherwise, the values outside the whiskers are marked separately in the diagram; those values can then be treated as outliers.
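A minimal sketch of the rule quoted above, in plain Python so it runs without seaborn or matplotlib. It assumes quartiles computed by linear interpolation (`statistics.quantiles(..., method="inclusive")`), which, as far as I know, matches NumPy's default percentile method used by matplotlib's boxplot statistics. The function name `tukey_whiskers` and the sample data are hypothetical, for illustration only.

```python
import statistics


def tukey_whiskers(data, whis=1.5):
    """Return (low_whisker, q1, q3, high_whisker, outliers) under Tukey's rule.

    Hypothetical helper: the fences sit whis*IQR beyond the box, but each
    whisker stops at the most extreme data point still inside its fence.
    """
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # Whiskers end at real data points, not at the fences themselves
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    low, high = min(inside), max(inside)
    # Everything beyond the fences is drawn separately as an outlier
    outliers = [v for v in values if v < lo_fence or v > hi_fence]
    return low, q1, q3, high, outliers


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(tukey_whiskers(data))  # (1, 3.25, 7.75, 9, [30])
```

Here both fences are 6.75 (1.5 * 4.5) away from the box, yet the lower whisker is only 2.25 long (it stops at the minimum, 1) and the upper whisker only 1.25 long (it stops at 9, the last point inside the fence), while 30 becomes an outlier.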

A plot from the same Wikipedia page might make this more obvious:

(Image: boxplot diagram from that Wikipedia page, not reproduced here)

In the case of the diagram shown in the question, the second reason most certainly applies: the lower whisker ends at the position of the lowest data value.
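To check which of the two cases determines each end of a box, one can compare the data extremes with the Tukey fences. This is a hedged sketch: the helper name `whisker_ends` and the right-skewed sample values are hypothetical (they are not the data used in the seaborn documentation), and quartiles again use linear interpolation.

```python
import statistics


def whisker_ends(data, whis=1.5):
    """Report what determines each whisker end under the 1.5*IQR rule.

    Hypothetical helper for illustration only.
    """
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    reach = whis * (q3 - q1)
    lo_fence, hi_fence = q1 - reach, q3 + reach
    if values[0] >= lo_fence:
        # No low outliers: the whisker simply ends at the data minimum
        lower = ("data minimum", values[0])
    else:
        lower = ("last point inside fence", min(v for v in values if v >= lo_fence))
    if values[-1] <= hi_fence:
        # No high outliers: the whisker simply ends at the data maximum
        upper = ("data maximum", values[-1])
    else:
        upper = ("last point inside fence", max(v for v in values if v <= hi_fence))
    return {"lower": lower, "upper": upper}


sample = [3, 8, 10, 11, 12, 13, 14, 16, 18, 40]
print(whisker_ends(sample))
# {'lower': ('data minimum', 3), 'upper': ('last point inside fence', 18)}
```

For this skewed sample the lower whisker (7.25 long) is cut short by the data minimum, while the upper whisker (2.5 long) stops at 18 because 40 lies beyond the upper fence, so the two whiskers differ in length.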

ImportanceOfBeingErnest

matplotlib allows for individual error bars (I assume that's what you mean by 'whiskers'). Here is the matplotlib example page: https://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html

You can explicitly define the error bars by using xerr and yerr:

"xerr/yerr : scalar or array-like, shape(N,) or shape(2,N), optional

If a scalar number, len(N) array-like object, or an N-element array-like object, errorbars are drawn at +/-value relative to the data. Default is None.

If a sequence of shape 2xN, errorbars are drawn at -row1 and +row2 relative to the data."
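A minimal sketch of the shape (2, N) form quoted above, drawing asymmetric error bars with `Axes.errorbar`. The data values are hypothetical, and the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2.0, 3.5, 3.0]
# Hypothetical distances below and above each y value
below = [0.5, 1.0, 0.2]
above = [1.5, 0.3, 0.8]

fig, ax = plt.subplots()
# Shape (2, N): row 1 is drawn below each point (-row1), row 2 above it (+row2)
container = ax.errorbar(x, y, yerr=[below, above], fmt="o", capsize=4)
```

Call `fig.savefig(...)` or `plt.show()` afterwards to see the figure; the first point's bar runs from 1.5 to 3.5, for example.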

...and pass them to the corresponding parameters of matplotlib.axes.Axes.errorbar:

Axes.errorbar(x, y, yerr=None, xerr=None, fmt='', ecolor=None, elinewidth=None, capsize=None, barsabove=False, lolims=False, uplims=False, xlolims=False, xuplims=False, errorevery=1, capthick=None, *, data=None, **kwargs)

API reference: https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.errorbar.html

If you want the error bars to differ in the +y and -y directions, the 2xN form of yerr described above handles that directly. Alternatively, you can plot twice on the same figure: the second plot has no markers, only error bars, and the center of each error bar is the midpoint between the +y and -y values.
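A hedged sketch of the two-plot workaround: the first call draws the markers, the second draws only error bars (`fmt="none"`) centered at the midpoint of each interval with a symmetric half-width. The bound values are hypothetical, and the Agg backend is used only so the example runs headless.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2.0, 3.5, 3.0]
lower = [1.5, 2.5, 2.8]  # hypothetical absolute lower bounds (-y end)
upper = [3.5, 3.8, 3.8]  # hypothetical absolute upper bounds (+y end)

# Midpoint of each interval and the symmetric half-width that reaches both ends
centers = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
half = [(hi - lo) / 2 for lo, hi in zip(lower, upper)]

fig, ax = plt.subplots()
ax.plot(x, y, "o")  # first plot: the data markers
# Second plot: error bars only, no markers or connecting line
bars = ax.errorbar(x, centers, yerr=half, fmt="none", capsize=4)
```

The bars cover the same intervals as passing `yerr=[[y - lo, ...], [hi - y, ...]]` in one call; the workaround just needs the extra plotting call.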

Rockethrob
  • The question asks why the two whiskers (errorbar lines) are not equally long but are different in length. – ImportanceOfBeingErnest Mar 06 '18 at 20:29
  • Indeed, I reread and considered that he may be asking that since the question is vague. I have edited my response to cover that as well. – Rockethrob Mar 06 '18 at 20:31
  • I mean, everything you write here is correct, but the question asks why in the plot shown (which is copied from the documentation), the two whisker lines are different, given that they should be 1.5*IQR long on both sides of the box. – ImportanceOfBeingErnest Mar 06 '18 at 20:33
  • Maybe you're right. If that is the question, it would be helpful to see code. It was my impression he or she was attempting to recreate a plot not correct a deformation in his/her own work. – Rockethrob Mar 06 '18 at 20:39
  • It's neither. 1.5*IQR is a single number, hence you would expect the box plot to have equally sized whiskers, completely independent of the data used to create it. The question is why this is not the case in the plots. The code to create them is in the linked documentation. – ImportanceOfBeingErnest Mar 06 '18 at 20:45