Checking a Pandas Dataframe for Outliers

Question

I have an experiment on a sensor that contains 8 electrodes. The image above is a plot of the electrode output vs time. As you can see on the plot, one of the 8 electrodes is clearly an outlier (probably due to some electrical failure). The plot is generated from a Pandas DataFrame, which essentially has 10 columns (1 for time, 8 for the electrodes, and 1 averaging the 8 electrodes).

What is the best way to statistically detect that one of the columns is an outlier? I imagine the outlier column can then just be dropped from the dataframe.

Thanks!

https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-dataframe — skrubber, Jan 04 '18 at 01:43

Shahebaz Mohammad · Accepted Answer · 2018-06-06T07:09:43.713

Scatter plots or distribution plots are good for spotting outliers. But since the question is about a pandas DataFrame, here's how I would do it.

df.describe()

This gives you a summary table with the count, mean, standard deviation, min, 25th/50th/75th percentiles, and max of each numeric column. Compare each column's max with its 75th percentile: if the max is far above it, that column probably contains outliers.
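A minimal sketch of that check, assuming a small made-up frame with hypothetical electrode columns "e1" to "e4" (the numbers are invented for illustration). It computes how far each column's max sits above its 75th percentile, using the row labels that `df.describe()` produces.

```python
import pandas as pd

# Hypothetical readings from four electrodes; "e4" has a single spike.
df = pd.DataFrame({
    "e1": [1.0, 1.1, 0.9, 1.0, 1.2, 1.0],
    "e2": [1.1, 1.0, 1.0, 0.9, 1.1, 1.0],
    "e3": [0.9, 1.0, 1.1, 1.0, 1.0, 1.1],
    "e4": [1.0, 1.0, 9.5, 1.1, 1.0, 0.9],
})

summary = df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
print(summary)

# How far each column's max sits above its 75th percentile.
gap = summary.loc["max"] - summary.loc["75%"]
print(gap.sort_values(ascending=False))
```

There is no fixed threshold for "far above"; one common convention (Tukey's fences) flags values beyond the 75th percentile plus 1.5 times the interquartile range.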

Then df['Sensor Value'].value_counts() gives you the frequency of each distinct value. Outliers tend to show up here as extreme values that occur only rarely.
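A small sketch of the frequency check, assuming a single hypothetical 'Sensor Value' column with made-up readings. `value_counts()` returns each distinct value with its count, most frequent first, so rare values are easy to pull out.

```python
import pandas as pd

# Hypothetical single-column frame mirroring the 'Sensor Value' column above.
df = pd.DataFrame({"Sensor Value": [1.0, 1.0, 1.1, 1.0, 1.1, 9.5, 1.0]})

counts = df["Sensor Value"].value_counts()  # value -> frequency, most frequent first
print(counts)

# Values seen only once are candidates to inspect (rare and possibly extreme).
rare = counts[counts == 1].index
print(list(rare))
```

This works best on discrete or rounded readings; with continuous sensor data almost every value is unique, so rounding first (for example with `.round(1)`) may be needed before counting.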

Get their indexes and drop them using df.drop(indexes_list, inplace=True).
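A minimal sketch of collecting the indexes and dropping them. The cutoff of 5.0 is a hypothetical criterion chosen only for this illustration; in practice it would come from one of the checks above.

```python
import pandas as pd

df = pd.DataFrame({"Sensor Value": [1.0, 1.0, 1.1, 9.5, 1.0]})

# Hypothetical criterion for this sketch: treat readings above 5.0 as outliers.
indexes_list = df.index[df["Sensor Value"] > 5.0].tolist()

df.drop(indexes_list, inplace=True)  # removes those rows from df in place
print(indexes_list)
print(df)
```

A boolean mask such as `df = df[df["Sensor Value"] <= 5.0]` keeps the same rows without building the index list first.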

EDIT: You could also flag as outliers any values that lie outside the mean ± 3 standard deviations.

Example code:

# rows whose value lies more than 3 standard deviations from the mean, on either side
outliers = df[(df[col] - df[col].mean()).abs() > 3 * df[col].std()]
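The rules above look for outlying values within one column, while the question asks which electrode column is the outlier. Applying a mean ± 3 standard deviation rule across only 8 electrodes cannot work: with the sample standard deviation, no point can lie more than (n - 1)/√n ≈ 2.47 standard deviations from the mean when n = 8. The sketch below therefore swaps in a different, median-based technique, the modified z-score of Iglewicz and Hoaglin (constant 0.6745, suggested cutoff 3.5). It assumes hypothetical column names "time", "e1" to "e8", and "avg", and uses simulated data; it is an illustration, not a tested recipe for this sensor.

```python
import numpy as np
import pandas as pd

# Hypothetical layout (names are illustrative): "time", electrodes "e1".."e8", and "avg".
electrodes = [f"e{i}" for i in range(1, 9)]
t = np.arange(50)
signal = np.sin(t / 5.0)
df = pd.DataFrame({"time": t})
for i, name in enumerate(electrodes, start=1):
    df[name] = signal + 0.02 * i  # healthy electrodes: shared signal plus a small offset
df["e6"] = 0.2 * signal + 3.0  # simulated failed electrode
df["avg"] = df[electrodes].mean(axis=1)

# 1. Per-electrode score: median distance from the row-wise median of all electrodes.
row_median = df[electrodes].median(axis=1)
score = df[electrodes].sub(row_median, axis=0).abs().median()

# 2. Modified z-score of those scores, based on the median absolute deviation (MAD).
center = score.median()
mad = (score - center).abs().median()
robust_z = 0.6745 * (score - center) / max(mad, 1e-12)  # guard against MAD == 0

# 3. Flag only unusually LARGE deviations, drop those columns, and rebuild the average.
bad_columns = robust_z[robust_z > 3.5].index.tolist()
kept = [c for c in electrodes if c not in bad_columns]
df_clean = df.drop(columns=bad_columns)
df_clean["avg"] = df_clean[kept].mean(axis=1)  # average of the remaining electrodes
print(bad_columns)
```

Comparing each electrode with the row-wise median (rather than the mean) keeps a single faulty electrode from dragging the reference toward itself, and the averaging column is recomputed so it no longer includes the dropped electrode.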
