
I am looking to complement my seaborn boxplots with a five-number summary from pandas's df.describe().

I have set my boxplot to ignore outliers. However, I am not sure whether df.describe() ignores outliers by default, or whether I need to remove them from my DataFrame before running df.describe().
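One quick way to check is to put an obvious outlier into a small made-up Series and see whether it shows up in `describe()`. This is a minimal sketch with invented data; the value 1000 is just an arbitrary outlier.

```python
import pandas as pd

# Made-up data: nine ordinary values plus one obvious outlier (1000)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])

summary = s.describe()
print(summary)

# describe() uses every value: the outlier becomes the max and pulls the mean up
print(summary["max"])
print(summary["mean"])
```

If `describe()` silently dropped outliers, `max` would be 9 here; instead it reports 1000 and a mean of 104.5, so `describe()` summarizes the data exactly as given.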

For example, I would compute the z-score for each row of data and then drop all rows with a z-score higher than 3. But if pandas already does that, maybe I'm doing the same process twice?
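The z-score filter described above could look roughly like this. It is a sketch on a hypothetical DataFrame with a single column named `value` (random normal data plus one injected outlier); it uses the absolute z-score so that unusually low values are dropped as well as unusually high ones.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 normal values around 50 plus one injected outlier (500)
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": np.append(rng.normal(50, 5, 200), [500.0])})

# z-score: distance from the column mean in units of the (sample) standard deviation
z = (df["value"] - df["value"].mean()) / df["value"].std()

# Keep rows whose absolute z-score is at most 3
filtered = df[z.abs() <= 3]

print(df["value"].describe())        # summary including the outlier
print(filtered["value"].describe())  # summary after the z-score filter
```

This is an extra step you choose to apply; neither `describe()` nor the box plot does it for you. Keep in mind that the mean and standard deviation used for the z-score are themselves inflated by the outliers, so this rule can miss outliers in small or heavily contaminated samples.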

I compared my boxplot to the output of df.describe(), and I honestly can't make out the difference with the naked eye.
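Part of the reason the two look alike: assuming the outliers were hidden with `showfliers=False` (which seaborn passes through to matplotlib), that option only stops the outlier markers from being drawn. The box is still computed from all the data, and its edges and middle line are the `25%`, `50%` and `75%` rows of `describe()`. The difference is at the ends: by matplotlib's default rule (`whis=1.5`) the whiskers stop at the most extreme data points within 1.5 × IQR of the box, whereas `describe()` reports the true `min` and `max`. The sketch below, on made-up data, computes both so they can be compared numerically rather than by eye.

```python
import pandas as pd

# Made-up data with one high outlier
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])

stats = s.describe()
q1, q3 = stats["25%"], stats["75%"]
iqr = q3 - q1

# Tukey fences: the 1.5 * IQR rule used by default for box plot whiskers
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that lie inside the fences
whisker_low = s[s >= low_fence].min()
whisker_high = s[s <= high_fence].max()

print(stats[["25%", "50%", "75%"]])  # same values as the box edges and median line
print(whisker_low, whisker_high)     # where the whiskers stop
print(stats["min"], stats["max"])    # describe() still reports the true extremes
```

Here the box spans 3.25 to 7.75 in both views, but the upper whisker stops at 9 while `describe()` reports a max of 1000.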

Trenton McKinney
  • Did you try the [docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html) first? – BigBen Nov 30 '21 at 18:14
  • Yes, I should have added in the OP that I did read through the docs. But unless I missed it, I didn't see any mention of it. – sudden_clarity_clarence Nov 30 '21 at 18:19
  • Pandas can't automatically remove outliers. The exact definition of what is considered an outlier depends on the dataset and the problem you want to solve. In some applications the outliers are the most valuable data, e.g. when searching for the causes of a malfunction. – JohanC Nov 30 '21 at 18:22
  • To complement @JohanC's answer: there are as many ways to remove outliers as there are data analysts ;) There is a non-exhaustive list of methods on the [Anomaly_detection](https://en.m.wikipedia.org/wiki/Anomaly_detection) Wikipedia page. – mozway Nov 30 '21 at 18:48
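As the comments note, which method to use is your call. One common choice, and the same rule that decides where box plot whiskers end, is to drop rows outside the 1.5 × IQR fences before calling `describe()`. This is a minimal sketch on a hypothetical DataFrame with a column named `value`; `remove_iqr_outliers` is an illustrative helper, not a pandas function.

```python
import pandas as pd

def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    # Series.between includes both endpoints by default
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]})
trimmed = remove_iqr_outliers(df, "value")
print(trimmed["value"].describe())
```

Unlike the z-score rule, the quartiles used here are barely moved by a few extreme values, which is why this fence is a popular default; a larger `k` keeps more of the tails.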
