Why is my distribution so strange in sns.boxplot?

Question

I am trying to plot the distribution of my dataset in order to find outliers using the IQR. However, instead of showing a range such as 0 to 100000, the x-axis runs from 0 to 1, with almost all of the data clustered at 0, even though I have removed all null values. Could someone please explain where I have gone wrong and why the scale of my plot is only 0 to 1? The full code and an image of the plot are below. The dataset has an IQR of 51770, so a scale of 0 to 1 cannot be right, unless it is a scaled-down version.

The result is also not particularly useful: instead of an outlier list with, say, 10 values, there are too many to count.

import warnings
warnings.simplefilter(action = 'ignore', category = FutureWarning)

from IPython.display import display

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt

pd.options.mode.chained_assignment = None

invest_2019 = pd.read_csv("Investment_2019.csv")
# Replace negative values with NaN; .loc avoids chained assignment
invest_2019.loc[invest_2019['Investment2019'] < 0, 'Investment2019'] = np.nan
invest_2019.dropna(inplace = True)

invest_2019.isnull().sum()

x = invest_2019['Investment2019']

# Detect Outliers:
sns.boxplot(x=x)  # initial plot; pass the series by keyword
plt.show()

Q1 = x.quantile(0.25)
Q3 = x.quantile(0.75)
IQR = Q3 - Q1
print("IQR: ", IQR, "\n")

b = Q1 - (1.5*IQR)
t = Q3 + (1.5*IQR)
r = t-b
print("bottom shadow:", b)
print("top shadow:", t)
print("range: ", r, "\n")

outlie = x[(x < (Q1 - 1.5 * IQR)) | (x > (Q3 + 1.5 * IQR))]

outlie

[Figure: boxplot]

Would you be so kind as to share how I can suppress this via seaborn? This appears to be for matplotlib and I am unsure how to do it. — TheRealExodus, Feb 02 '21 at 18:35
Create your axes with matplotlib and pass it to `sns.boxplot`, which has an `ax` parameter. — BigBen, Feb 02 '21 at 18:38
Apologies, but I have very limited experience with plotting, so I am still unsure how to do this. Thank you though! — TheRealExodus, Feb 02 '21 at 18:41
Or use `ax = sns.boxplot(x)` and `ax.ticklabel_format(useOffset=False, style='plain')`. You can also add `ax.set_xlim(xmax=2000000)` to hide the largest outliers (they will then be out of view). — JohanC, Feb 02 '21 at 22:07
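
The suggestions in the comments can be put together in a short sketch. The 0 to 1 axis is most likely matplotlib's default scientific notation: for large values it factors out a multiplier (for example `1e7`) and prints it in small text at the end of the axis, so the data itself is not rescaled. The example below uses synthetic, right-skewed data as a stand-in for `invest_2019['Investment2019']`, since the original CSV is not available; the distribution, the seed, and the zoom limit of three times the upper fence are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for invest_2019['Investment2019'] (assumption: large, right-skewed values)
rng = np.random.default_rng(0)
x = pd.Series(rng.lognormal(mean=11, sigma=1.5, size=2000), name="Investment2019")

# Create the axes with matplotlib and hand them to seaborn via the `ax` parameter
fig, ax = plt.subplots()
sns.boxplot(x=x, ax=ax)

# Print tick labels as plain numbers instead of a 0-1 scale with a separate multiplier
ax.ticklabel_format(axis="x", style="plain", useOffset=False)

# Optional zoom: keep the box readable by cutting off the most extreme outliers
q1, q3 = x.quantile(0.25), x.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
ax.set_xlim(0, 3 * upper_fence)  # hypothetical limit; points beyond it are out of view

# Render once so the tick labels and offset text are populated
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
offset_text = ax.xaxis.get_offset_text().get_text()
```

In an interactive session, `plt.show()` would replace the `fig.canvas.draw()` call. Restricting `ticklabel_format` to `axis="x"` leaves the categorical y-axis untouched, which matters because that method only works on axes that use matplotlib's scalar tick formatter.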
