
I have the following code, which clips negative delays to zero, sums the two delays, and plots the result per carrier:

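import numpy as np
import seaborn as sns

# `data` is a pandas DataFrame that has already been loaded
# Treat early arrivals/departures (negative delays) as zero delay
data = data.assign(
    ArrDelay=np.where(data["ArrDelay"].lt(0), 0, data["ArrDelay"]),
    DepDelay=np.where(data["DepDelay"].lt(0), 0, data["DepDelay"])
)
data[["ArrDelay", "DepDelay"]].head(40)
# 'Month' holds the combined arrival and departure delay
data['Month'] = (data['ArrDelay'] + data['DepDelay'])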

result = data.groupby("UniqueCarrier")["Month"].mean()
print(result)
sns.boxplot(x='UniqueCarrier', y='Month', data=data, order=result.index)

But the boxplot does not look the way I expect.

Here is my current result: [image]

This is how I would like it to look: [image]

JohanC
Eug
  • I have fixed the description – Eug Jan 05 '23 at 16:05
  • The boxplot looks like this because you have a huge number of values close to zero, and a smaller, but still large, number of outliers. Maybe a log-scaled y-axis would help a bit? And/or using `sns.boxenplot` to show more detail. You could further try to set the zero values in the `Month` column to `np.nan`, so they don't get counted. – JohanC Jan 05 '23 at 16:11
  • So, did you exclude the latency values above the 95th percentile? You could `plt.ylim(ymax=100)` to hide the large outliers. – JohanC Jan 05 '23 at 16:25

1 Answer


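You can hide the outliers with the `showfliers` option. Note that this only stops the outlier points from being drawn; the data and the box statistics are unchanged: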

#... 
sns.boxplot(x='UniqueCarrier', y='Month', data=data, order=result.index, showfliers = False)
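
Before hiding the fliers it can help to know how many points that affects. The sketch below counts the share of values lying beyond the whiskers, assuming the default 1.5 × IQR whisker rule that matplotlib box plots use. The helper name `flier_share` and the sample values are made up for illustration.

```python
import pandas as pd

def flier_share(values, whis=1.5):
    """Fraction of values that fall outside the whisker limits of a box plot."""
    values = pd.Series(values, dtype=float).dropna()
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    # Points beyond these limits are the ones drawn as fliers
    outside = (values < q1 - whis * iqr) | (values > q3 + whis * iqr)
    return outside.mean()

# Hypothetical delays: five small values and one large one
sample = pd.DataFrame({
    "UniqueCarrier": ["AA"] * 6 + ["DL"] * 6,
    "Month": [1, 2, 3, 4, 5, 100, 10, 11, 12, 13, 14, 15],
})
shares = sample.groupby("UniqueCarrier")["Month"].apply(flier_share)
print(shares)
```

With the real data, `data.groupby("UniqueCarrier")["Month"].apply(flier_share)` would give the same figure per carrier; a large share suggests the hidden points are a substantial part of the distribution rather than rare extremes.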
Mat.B