Here is the boxplot for the data:

plot = sns.boxplot(y='charges', x='smoker', data=df)

I am trying to remove the outliers for the "no" category. First, I filtered the charges of the non-smokers:

no_charges = df[df.smoker == 'no'].charges

Then I calculated the IQR:

Q1 = no_charges.quantile(0.25)
Q3 = no_charges.quantile(0.75)
IQR = Q3-Q1

Then I selected the outliers:

da = df.copy()
no = da[da.smoker == 'no']
rem = no[(no.charges > (Q3 + 1.5*IQR)) | (no.charges < (Q1 - 1.5*IQR))]

But I run into an issue when I try to drop those rows from the main table:

da.drop[[(no.charges > (Q3 + 1.5*IQR)) | (no.charges < (Q3 - 1.5*IQR))], inplace=True]

Can it be done this way? If not, how should I filter out these values?

  • What issue are you facing? It would be better if you included the error message as well. – Wickkiey May 13 '20 at 03:58
  • File "", line 1: da.drop[[(no.charges > (Q3 + 1.5*IQR)) | (no.charges < (Q3 - 1.5*IQR))], inplace=True] ^ SyntaxError: invalid syntax. It shows a syntax error at (inplace = True), but I believe that is the correct syntax. – Priyanshi Tyagi May 13 '20 at 05:28

1 Answer

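To remove the outliers only for the "no" category, build the mask on da itself and keep every row that is not a non-smoker outlier:

df = da[~((da.smoker == 'no') & ((da.charges < (Q1 - 1.5 * IQR)) | (da.charges > (Q3 + 1.5 * IQR))))]

The mask is built from da rather than from no_charges so that its index matches da. .any(axis=1) is not needed here, because the comparison is on a single column and already gives one boolean per row.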
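As a rough illustration of the mask approach, here is a small self-contained sketch. The toy values are made up for the example and only reuse the question's column names (smoker, charges); the names is_no, is_outlier and cleaned are likewise just illustrative.

```python
import pandas as pd

# Hypothetical toy data standing in for the table in the question.
df = pd.DataFrame({
    "smoker": ["no", "no", "no", "no", "no", "yes", "yes"],
    "charges": [1000.0, 1200.0, 1100.0, 1300.0, 50000.0, 40000.0, 45000.0],
})

# IQR bounds computed from the non-smokers only.
no_charges = df.loc[df.smoker == "no", "charges"]
Q1 = no_charges.quantile(0.25)
Q3 = no_charges.quantile(0.75)
IQR = Q3 - Q1

# Both masks are built on df, so they share df's index.
is_no = df.smoker == "no"
is_outlier = (df.charges < Q1 - 1.5 * IQR) | (df.charges > Q3 + 1.5 * IQR)

# Keep every row except non-smokers whose charges fall outside the bounds.
cleaned = df[~(is_no & is_outlier)]
print(cleaned)
```

The smoker rows are left untouched even though their charges exceed the non-smoker bounds, because the outlier test is combined with is_no.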

You can refer to this answer for a detailed explanation.

https://stackoverflow.com/a/50461938/1727543
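As for whether drop can be used: it should work, but drop is a method, so it is called with parentheses, and it expects index labels rather than a boolean mask. Square brackets cannot take a keyword argument such as inplace=True, which is why Python reports a syntax error. A minimal sketch, assuming the same column names as in the question and made-up toy values:

```python
import pandas as pd

# Hypothetical toy data standing in for the table in the question.
da = pd.DataFrame({
    "smoker": ["no", "no", "no", "no", "no", "yes", "yes"],
    "charges": [1000.0, 1200.0, 1100.0, 1300.0, 50000.0, 40000.0, 45000.0],
})

no = da[da.smoker == "no"]
Q1 = no.charges.quantile(0.25)
Q3 = no.charges.quantile(0.75)
IQR = Q3 - Q1

# Rows of the "no" group that lie outside the IQR bounds.
rem = no[(no.charges > Q3 + 1.5 * IQR) | (no.charges < Q1 - 1.5 * IQR)]

# drop() is called with parentheses and takes the index labels to remove.
da = da.drop(index=rem.index)
print(da)
```

Assigning the result back to da avoids inplace=True; either form should give the same table.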

Wickkiey