I want to remove the outliers that a boxplot identifies in each column of my dataframe. I know that a boxplot finds outliers using the IQR rule and displays them on the graph. I know how to draw the boxplot with seaborn, but I am unsure how to determine exactly which rows these outliers correspond to and how to remove them. Is there a function or method to do this?
Possible duplicate of https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-data-frame – Venkatesh Garnepudi Jan 26 '19 at 07:41
1 Answer
By the standard IQR definition, values less than Q1 - 1.5*IQR or greater than Q3 + 1.5*IQR are treated as outliers. So, for a single column:
Q1 = df['col_name'].quantile(0.25)
Q3 = df['col_name'].quantile(0.75)
IQR = Q3 - Q1
The outliers in that column are then:
df[(df['col_name'] < Q1-1.5*IQR ) | (df['col_name'] > Q3+1.5*IQR)]['col_name']
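To cover the "each column" and "remove" parts of the question, the same rule can be applied to all numeric columns at once. The following is a minimal sketch, not a built-in pandas function: the helper name `iqr_outlier_mask`, its `k` parameter, and the toy dataframe are assumptions made for illustration. It builds a boolean mask of outlier cells, then uses that mask to list the affected rows and to drop them.

```python
import pandas as pd

def iqr_outlier_mask(df, k=1.5):
    """Hypothetical helper: boolean DataFrame, True where a value is an IQR outlier."""
    numeric = df.select_dtypes(include="number")  # ignore non-numeric columns
    q1 = numeric.quantile(0.25)                   # per-column first quartile
    q3 = numeric.quantile(0.75)                   # per-column third quartile
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series index on the columns
    return (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)

# Toy data (assumed for illustration): 100 is far outside the range of column "a"
df = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, 14]})

mask = iqr_outlier_mask(df)
outlier_rows = df[mask.any(axis=1)]   # rows with an outlier in at least one column
cleaned = df[~mask.any(axis=1)]       # the same dataframe with those rows removed
```

Dropping a row when any column flags it can discard a lot of data on wide dataframes. If that is too aggressive, one alternative is to keep the rows and replace only the flagged cells, for example with `df.mask(mask)`, which sets them to NaN.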

squaleLis

Venkatesh Garnepudi