I built this custom "winsorize" function that does what it should, unless there are NA's in the data.
How it works:
winsor1 <- function(x, probability){
  numWin <- ceiling(length(x)*probability)
  # Replace first lower, then upper
  x <- pmax(x, sort(x)[numWin+1])
  x <- pmin(x, sort(x)[length(x)-numWin])
  return(x)
}
x <- 0:10
winsor1(x, probability=0.01)
[1] 1 1 2 3 4 5 6 7 8 9 9
So it replaces the top (and bottom) 1% of the data (rounded up to the next value, since there are only 11 values in the example). If there are, e.g., 250 values then the bottom 3 and top 3 values would be replaced by the bottom 4th and top 4th respectively.
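To make the 250-value case concrete, here is a small check that reuses the function exactly as defined above on NA-free data. The input 1:250 and the variable name y are only illustrative choices.

```r
# Same definition as in the question, repeated so the snippet runs on its own
winsor1 <- function(x, probability){
  numWin <- ceiling(length(x)*probability)
  # Replace first lower, then upper
  x <- pmax(x, sort(x)[numWin+1])
  x <- pmin(x, sort(x)[length(x)-numWin])
  return(x)
}

# Illustrative input: 250 distinct values, no NAs
y <- winsor1(1:250, probability = 0.01)

ceiling(250 * 0.01)  # 3 values are replaced at each end
head(y, 5)           # 4 4 4 4 5
tail(y, 5)           # 246 247 247 247 247
```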
The whole thing breaks down when there are NA's in the data, causing an error. However, if I set na.rm = TRUE in the pmax() and pmin() calls, then the NA's themselves are replaced by the bottom value.
x[5] <- NA
winsor1(x, probability=0.01)
[1] 1 1 2 3 1 5 6 7 8 9 9
What can I do so that the NA's are preserved but do not cause an error? This is the output I want for the last line:
winsor1(x, probability=0.01)
[1] 1 1 2 3 NA 5 6 7 8 9 9
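One direction that might give this output, as a rough sketch rather than a definitive fix: take both cut-off values from the sorted non-missing data and leave na.rm at its default in pmax() and pmin(), so that NA elements stay NA. The name winsor2 is hypothetical. Counting only the non-NA values when computing numWin is an assumption about the intended behaviour.

```r
# Hypothetical NA-aware variant of winsor1 (a sketch, not a tested library function)
winsor2 <- function(x, probability){
  sorted <- sort(x)                    # sort() drops NAs by default (na.last = NA)
  n <- length(sorted)                  # assumption: count only non-missing values
  numWin <- ceiling(n * probability)
  lower <- sorted[numWin + 1]          # smallest value that is kept
  upper <- sorted[n - numWin]          # largest value that is kept
  # With the default na.rm = FALSE, pmax()/pmin() return NA wherever x is NA
  pmin(pmax(x, lower), upper)
}

x <- 0:10
x[5] <- NA
res <- winsor2(x, probability = 0.01)
res  # 1 1 2 3 NA 5 6 7 8 9 9
```

The sketch does not guard against edge cases such as a vector that is entirely NA or a probability large enough that the lower cut-off exceeds the upper one.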