I want to compare a data point with a boxplot. The plot shows the point as an outlier, but when I calculated the lower (Q1 - 1.5*IQR) and upper (Q3 + 1.5*IQR) bounds, I found that the point is not an outlier. How can I make the whiskers of my boxplot extend to the lower and upper bounds? The `lower_allowed_limit` is 0.03658130324999995, while the value of my data point is 0.077414939.

import numpy as np
import pandas as pd
import seaborn as sns

data = {'Center': [0.26857881, 0.25526688, 0.27575725, 0.25816567, 0.27839591, 0.25704223, 0.27876854,
                   0.25689208, 0.24568341, 0.27170329, 0.24951651, 0.26341556, 0.23330706, 0.26628367,
                   0.24271549, 0.26151005, 0.2194737, 0.22327154, 0.20390595, 0.23187327, 0.22729037,
                   0.24013772, 0.21812675, 0.24829089, 0.22292593, 0.21533798, 0.2375203, 0.24358,
                   0.23935491, 0.21955327, 0.25495809, 0.22418302, 0.23168249, 0.22031974, 0.19427523,
                   0.21487167, 0.22327235, 0.2334787, 0.21033593, 0.21257535, 0.19528684, 0.20617609,
                   0.21584015, 0.20789747, 0.20814228, 0.21319687, 0.22782205, 0.20793727, 0.22442598,
                   0.22854906, 0.2328641, 0.23012169, 0.22649554, 0.21118934, 0.1980712, 0.20799725, 0.19317045,
                   0.20426635, 0.18019197, 0.18720767, 0.18573065, 0.19297233, 0.19230893, 0.20739394, 0.20091433,
                   0.20808377, 0.21997453, 0.21432004, 0.2145078, 0.22670847, 0.2195935, 0.21880863, 0.22720217,
                   0.2405563, 0.24436345, 0.22278882, 0.23400447, 0.2128166, 0.2344685, 0.21768168, 0.23376607,
                   0.21818298, 0.234481, 0.22627898, 0.29847948, 0.23503119, 0.2308161, 0.19334404, 0.09538444,
                   0.11038104, 0.11524165, 0.09856957, 0.12746572, 0.1558064, 0.15951489, 0.12194555, 0.11833563,
                   0.1014696, 0.11559569, 0.11146258, 0.1315745, 0.09123692, 0.09131569, 0.10194323, 0.11820753,
                   0.11682336, 0.12112035, 0.15115081, 0.08319368, 0.10231317, 0.09921997, 0.12303618, 0.10655393,
                   0.11144342, 0.09901149, 0.09016768, 0.09824667, 0.12649639, 0.14759194, 0.14387346]}

X = pd.DataFrame(data)

def is_tukey(x, y, k=1.5):
    first_quartile = np.quantile(x, .25)
    third_quartile = np.quantile(x, .75)
    iqr = third_quartile - first_quartile
    lower_allowed_limit = first_quartile - (k * iqr)
    upper_allowed_limit = third_quartile + (k * iqr)
    in_range = (y > lower_allowed_limit) and (y < upper_allowed_limit)
    return in_range, lower_allowed_limit

y = 0.077414939

is_tukey(X['Center'], y)

sns.boxplot(x=X['Center'])
sns.swarmplot(x=[y], size=5, color='r')
Trenton McKinney
8Simon8
  • Please see [`matplotlib.pyplot.boxplot`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html) & [Boxplot Demo](https://matplotlib.org/stable/gallery/pyplots/boxplot_demo_pyplot.html), because seaborn is a high-level API for matplotlib. From the documentation: _**The default value of whis = 1.5 corresponds to Tukey's original definition of boxplots.**_ However, each whisker is only drawn out to the most extreme data point that lies within that limit, so it reaches the min / max value only if there are no outliers. If `y = 0.077414939` were in the data, which it is not, the whisker would extend to that point. – Trenton McKinney Jul 25 '22 at 20:31
  • @TrentonMcKinney Thanks for your response. Actually, I want to plot the boxplot based on X and then see whether my new data point y is in the range [(Q1-1.5IQR), (Q3+1.5IQR)] or not. See the lower and upper fences in [this post](https://stackoverflow.com/questions/17725927/boxplots-in-matplotlib-markers-and-outliers) – 8Simon8 Jul 25 '22 at 22:54
  • I understand, but my point is that this is not how the API works. The whiskers are drawn according to the data passed to the plot function. – Trenton McKinney Jul 25 '22 at 22:57
  • You are right, but how can I do that? Do you have an idea? – 8Simon8 Jul 25 '22 at 22:58
  • Mostly there is the `whis=` parameter, as shown in the duplicates. There is also this [answer](https://stackoverflow.com/a/61017306/7758804) in a duplicate. You would need to loop over three boxes though. – Trenton McKinney Jul 25 '22 at 23:16
  • The last sentence should be, "You _wouldn't_ need to...". – Trenton McKinney Jul 25 '22 at 23:28
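
The point made in the comments can be sketched as follows. This is a minimal, unofficial example and not the linked answer's code. It assumes `matplotlib.cbook.boxplot_stats` is used to compute the box statistics, after which `whislo` / `whishi` are overwritten with the Tukey fences and the box is drawn with `Axes.bxp`. The helper names `tukey_fences` and `boxplot_with_fence_whiskers`, as well as the small `sample` array and `new_point`, are hypothetical and only serve as an illustration; they are not the data from the question.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cbook


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def boxplot_with_fence_whiskers(values, new_point=None, k=1.5, ax=None):
    """Draw a vertical boxplot whose whiskers end exactly at the fences."""
    values = np.asarray(values, dtype=float)
    if ax is None:
        _, ax = plt.subplots()

    # Same statistics plt.boxplot would draw; by default whislo / whishi
    # are the most extreme data points that lie inside the fences.
    stats = cbook.boxplot_stats(values, whis=k)[0]

    # Overwrite the whisker ends with the fences themselves.
    lower, upper = tukey_fences(values, k)
    stats["whislo"], stats["whishi"] = lower, upper
    # Fliers are the points strictly outside the fences.
    stats["fliers"] = values[(values < lower) | (values > upper)]

    ax.bxp([stats])  # the single box is drawn at x position 1
    if new_point is not None:
        ax.plot(1, new_point, "ro", markersize=5)  # new point in red
    return ax, (lower, upper)


# Hypothetical illustration data (NOT the data from the question).
sample = np.array([0.10, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27])
new_point = 0.17

# Where the default lower whisker would end: at a data point, not the fence.
default_stats = cbook.boxplot_stats(sample)[0]

ax, (lower, upper) = boxplot_with_fence_whiskers(sample, new_point)
is_inside = lower < new_point < upper
```

In this sketch, `new_point` lies below the default whisker end (`default_stats["whislo"]`) but above the lower fence, which mirrors the situation in the question: the point looks like an outlier next to the default whiskers although it is within the Tukey range. Call `plt.show()` afterwards to display the figure.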

0 Answers