I have been trying to replace outliers (values more than 1.5*IQR below the lower quartile or above the upper quartile) with the lower and upper quartile, using the following code:

`> lower.quantile <- as.numeric(summary(loans$dINC_A)[2])
> lower.quantile
[1] 9000
> upper.quantile <- as.numeric(summary(loans$dINC_A)[5])
> upper.quantile
[1] 21240
> IQR <- upper.quantile - lower.quantile
> # I replace outliers by the lower/upper bound values
> loans$INC_A[ loans$dINC_A < (lower.quantile-1.5*IQR) ] <- lower.quantile
> loans$INC_A[ loans$dINC_A > (upper.quantile+1.5*IQR) ] <- upper.quantile`

The resulting outlier bounds are:

> upper.quantile+1.5*IQR
[1] 39600
> lower.quantile-1.5*IQR
[1] -9360

However, when I recheck the summary() of my variable, the maximum is still 64800, which is greater than upper.quantile+1.5*IQR = 39600:

> summary(loans$dINC_A)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      0    9000   19500   21240   30600   64800

What is missing in my R code?

Lili Matic
  • As said, max=64800>upper.quantile+1.5*IQR=39600 remains in the summary, yet I want to replace the value by `loans$INC_A[ loans$dINC_A > (upper.quantile+1.5*IQR) ] <- upper.quantile` – Lili Matic Oct 31 '16 at 21:17
  • Reproducible example? [See tips here](http://stackoverflow.com/q/5963269/903061). – Gregor Thomas Oct 31 '16 at 21:27
  • When you replace, you use `loans$INC_A` at the start of the line and `loans$dINC_A` in the middle - then you re-check `loans$dINC_A`. I think the missing `d` in `$INC_A` at the start of your line may be the problem. – Gregor Thomas Oct 31 '16 at 21:30
  • yup that was the problem, thx! – Lili Matic Oct 31 '16 at 21:59
  • Possible duplicate of [How to replace outliers with the 5th and 95th percentile values in R](http://stackoverflow.com/questions/13339685/how-to-replace-outliers-with-the-5th-and-95th-percentile-values-in-r) – coatless Nov 01 '16 at 00:36
  • This is known as winsorization – Chris Dec 28 '17 at 21:42
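
The fix described in the comments can be sketched as follows: the column on the left of the assignment has to be the same one that the condition tests and that `summary()` later inspects. This is only a minimal illustration under assumptions: the `loans` data frame below is made-up toy data, and the names `lower_q`, `upper_q`, `iqr`, `lower_fence` and `upper_fence` are not from the original post. The sketch also takes the quartiles from `quantile()` with explicit `probs` rather than by position in `summary()`, since the printed `upper.quantile` of 21240 in the question appears to match the Mean in the summary output rather than the 3rd Qu. of 30600.

```r
# Toy data standing in for loans$dINC_A (values are made up for illustration).
loans <- data.frame(
  dINC_A = c(0, 9000, 12000, 19500, 25000, 30600, 64800, 120000)
)

# Ask for the quartiles explicitly instead of indexing summary() by position.
q <- quantile(loans$dINC_A, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
lower_q <- q[1]
upper_q <- q[2]
iqr <- upper_q - lower_q

# Outlier bounds: 1.5 * IQR beyond each quartile.
lower_fence <- lower_q - 1.5 * iqr
upper_fence <- upper_q + 1.5 * iqr

# Assign into the SAME column that the condition tests (dINC_A on both sides).
loans$dINC_A[loans$dINC_A < lower_fence] <- lower_q
loans$dINC_A[loans$dINC_A > upper_fence] <- upper_q

# The maximum should now lie within the bounds.
summary(loans$dINC_A)
```

Assigning to a column name that does not exist yet (such as `INC_A`) raises no error in R, which is why the original mistake went unnoticed. Replacing with the quartiles follows the goal stated in the question; replacing with the bounds themselves (`lower_fence`, `upper_fence`) would be an equally simple variant.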

0 Answers