
I am trying to build box-and-whisker plots with seaborn. My minimum value is -200,000 and my maximum value is 1,400,000, and both of these are outliers. However, the graph I get looks more like a scatter plot.

Below is my code:

import pandas as pd
import numpy as np
import xlrd  # Excel reader; not used directly below

import matplotlib.pyplot as plt
import seaborn as sns

# Load the raw sheet; the first row holds the column names
pi_analysis = pd.read_excel(r'C:\PI\PI Analysis.xlsx',
                            sheet_name='Raw Data',
                            header=0)
print(pi_analysis)

# Total amount per segment
group_segment = pi_analysis[['Segment', 'TOTAL AMOUNT']].groupby('Segment').sum()
print(group_segment)

# Mean and summary statistics per segment
group_segment_mean = pi_analysis[['Segment', 'TOTAL AMOUNT']].groupby('Segment')
print(group_segment_mean.mean().head())
print(group_segment_mean.describe())

sns.boxplot(x="Segment", y="TOTAL AMOUNT", data=pi_analysis)
plt.show()  # display the figure when running as a script

[Image: the resulting plot]

Attached is the image. I have tried changing the axis, but it did not work. Any suggestions on how to display the boxes and whiskers?

Here is the new image after changing the scale:

[Image: the plot after changing the scale]

This is the code I used. However, it still does not give me the complete view:

ax=sns.boxplot(x='Segment',y='TOTAL AMOUNT',data=pi_analysis)
ax.set_ylim(-10*10^8,10*10^8)
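One detail worth checking in the `set_ylim` call above: in Python, `^` is bitwise XOR, not exponentiation, so `10*10^8` does not mean 10 to the ninth power. With XOR the limits come out as roughly ±108, far narrower than data in the hundreds of thousands, which may be why the view looks incomplete. A minimal check of how Python evaluates both spellings:

```python
# In Python, ^ is bitwise XOR and ** is exponentiation.
# Precedence: ** binds tighter than unary minus, which binds tighter than *,
# which binds tighter than ^.
xor_low = -10 * 10 ^ 8      # ((-10) * 10) ^ 8  -> -108
xor_high = 10 * 10 ^ 8      # (10 * 10) ^ 8     -> 108
power_low = -10 * 10 ** 8   # (-10) * (10 ** 8) -> -1000000000
power_high = 10 * 10 ** 8   # 10 * (10 ** 8)    -> 1000000000

print(10 ^ 8)                 # 2, because 0b1010 XOR 0b1000 == 0b0010
print(xor_low, xor_high)      # -108 108
print(power_low, power_high)  # -1000000000 1000000000
```

Writing the limits as `10**8` or in scientific notation (`1e9`) avoids the surprise.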

Regards, Ren.

Ren Lyke
  • What do you see if you remove those outliers and re-plot? It looks like the boxes _may_ just be compressed due to the scale of the data – G. Anderson Nov 26 '18 at 20:06
  • @G.Anderson I made the change and still don't see any improvement. Is there a way I could change the scale so the boxes do not get compressed? – Ren Lyke Nov 27 '18 at 12:21
  • @G.Anderson I changed the scale using this: `ax=sns.boxplot(x='Segment', y='TOTAL AMOUNT',data=pi_analysis, linewidth = 2.5) ax.set_ylim([-2*10^7,2*10^97)`. It does display the box. Is there a better way to do this? I am unable to view all the data points being plotted. Perhaps by converting the axis to millions or billions based on the max values in Y? – Ren Lyke Nov 27 '18 at 13:25
  • My question is whether a boxplot is the right representation of your data. If your range and quartiles are such that you get the result that you did, then I would consider either engineering your data to remove the outliers, or finding another representation that's better suited to your purpose. – G. Anderson Nov 27 '18 at 17:06
  • @G.Anderson I am using the box plot to find the outliers, and then I want to remove them from the data by creating a new dataframe. It is not possible for me to check over 40,000 rows to find the outliers for each segment. – Ren Lyke Nov 27 '18 at 18:40
  • It's up to you to decide what is or isn't an outlier. [Here](https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-dataframe) is a discussion of removing outliers, or you can just remove all data over/under the max/min of segment 3 – G. Anderson Nov 27 '18 at 18:43
  • @G.Anderson Thanks for the link. I will have a look at removing the outliers and then see what pops up. – Ren Lyke Nov 27 '18 at 18:58
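Following the suggestion to pick an outlier rule and drop those rows instead of inspecting 40,000 rows by hand: a minimal sketch, assuming Tukey's 1.5 × IQR fences applied separately within each segment (the same rule matplotlib uses by default to decide which points are drawn beyond the whiskers). The data frame and the helper `iqr_mask` are hypothetical; only the column names `Segment` and `TOTAL AMOUNT` come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the 'Raw Data' sheet: two segments, each with
# one extreme value like those described in the question.
pi_analysis = pd.DataFrame({
    "Segment": ["A"] * 6 + ["B"] * 6,
    "TOTAL AMOUNT": [10, 12, 11, 13, 12, 1_400_000,
                     50, 52, 51, -200_000, 53, 52],
})


def iqr_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return values.between(q1 - k * iqr, q3 + k * iqr)


# Compute the fences per segment; transform keeps the original row index.
keep = (
    pi_analysis.groupby("Segment")["TOTAL AMOUNT"]
    .transform(iqr_mask)
    .astype(bool)
)

filtered = pi_analysis[keep]   # rows inside their segment's fences
outliers = pi_analysis[~keep]  # rows flagged as outliers
print(outliers)
```

`filtered` can then be passed to `sns.boxplot` as the new data frame. Whether a flagged row is really an error is still a judgment call; a larger `k` flags fewer rows.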

2 Answers


The compressed boxes you are seeing are the result of the axis scaling stretching to accommodate the extreme outliers. It's very easy to drop the outliers from the plot.

Seaborn's boxplot passes extra keyword arguments through to matplotlib's boxplot, so it will accept the matplotlib argument:

showfliers=False

This will result in plots of only the box and whiskers, with the outliers not shown.

The last line of your code would then be:

sns.boxplot(x="Segment", y="TOTAL AMOUNT",data=pi_analysis, showfliers=False) 
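A self-contained sketch of the effect, using hypothetical data shaped like the question's (two segments, one extreme value in each). It draws the default plot next to the `showfliers=False` version and prints both y-ranges; the `matplotlib.use("Agg")` line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: ordinary values around 1,000 in two segments, plus
# one extreme value per segment like the -200,000 / 1,400,000 in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Segment": ["A"] * 200 + ["B"] * 200,
    "TOTAL AMOUNT": rng.normal(1_000, 200, size=400),
})
df.loc[0, "TOTAL AMOUNT"] = 1_400_000
df.loc[200, "TOTAL AMOUNT"] = -200_000

fig, (ax_all, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Default: the outlier markers are drawn, so autoscaling must include them.
sns.boxplot(x="Segment", y="TOTAL AMOUNT", data=df, ax=ax_all)
ax_all.set_title("default")

# showfliers=False: outliers are not drawn, so the axis fits the whiskers.
sns.boxplot(x="Segment", y="TOTAL AMOUNT", data=df, showfliers=False, ax=ax_box)
ax_box.set_title("showfliers=False")

print(ax_all.get_ylim())
print(ax_box.get_ylim())
```

Note that this only hides the outliers in the figure; they remain in the data frame and still affect the quartiles.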
flashliquid

As @G.Anderson alluded to in his comment, the boxplot is there; it's just too small for you to see. Consider the following code:

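import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# 100 values in [0, 1), with two extreme outliers at either end
d = np.random.random(size=(100,))
d[0] = 100
d[-1] = -100

fig, ax = plt.subplots()
sns.boxplot(data=d, orient='v', ax=ax)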

[Image: output of the code above]

If you want to see the boxplot, you could simply rescale the y-axis to a more relevant range:

fig, ax = plt.subplots()
sns.boxplot(data=d, orient='v', ax=ax)
ax.set_ylim(-1, 2)  # zoom in on the box; the two outliers fall outside this range

[Image: output with the y-axis limited to (-1, 2)]
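If hard-coding limits such as `set_ylim(-1, 2)` is inconvenient (for example, when the data range changes from run to run), one option is to derive them from the quartiles. A minimal sketch, assuming the same 1.5 × IQR whisker rule as matplotlib's default; the 10% padding is an arbitrary choice, and a fixed seed is used only to make the example reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Data like the example above: 100 values in [0, 1) with two extreme points.
rng = np.random.default_rng(0)
d = rng.random(100)
d[0] = 100
d[-1] = -100

# Tukey fences: whiskers reach at most 1.5 * IQR beyond the quartiles.
q1, q3 = np.percentile(d, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
pad = 0.1 * (high - low)  # arbitrary 10% margin around the fences

fig, ax = plt.subplots()
sns.boxplot(data=d, orient="v", ax=ax)
ax.set_ylim(low - pad, high + pad)
print(ax.get_ylim())
```

Points beyond these limits are simply off-screen. For grouped data such as the question's segments, one approach is to compute the fences per segment and use the overall minimum and maximum as the limits.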

Diziet Asahi
  • Thanks for the suggestion. I have made changes to my scale, and the new image has been added to the question. However, the entire box still does not get displayed. – Ren Lyke Nov 27 '18 at 13:47
  • Any way to fix it? I have tried both answers here, but still no box plot is showing. – cryanbhu Oct 09 '19 at 06:20