
I am trying to apply the Winsorize() function from the DescTools package to every column of a data.frame using lapply. What I currently have is:

data$col1 <- Winsorize(data$col1)

This replaces the extreme values with values based on quantiles, changing the data below as follows:

> data$col1
 [1]   -0.06775798   **-0.55213508**   -0.12338265
 [4]    0.04928349    **0.47524313**    0.04782829
 [7]   -0.05070639 **-112.67126382**    0.12657896
[10]   -0.12886632

> Winsorize(data$col1)
 [1] -0.06775798 **-0.37884540** -0.12338265  0.04928349
 [5]  **0.26038103**  0.04782829 -0.05070639 **-0.37884540**
 [9]  0.12657896 -0.12886632
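To make the replacement rule concrete, here is a minimal base R sketch of quantile-based winsorizing. The helper name `winsorize_sketch` is hypothetical and this is not the DescTools implementation; it only assumes the limits are the 5% and 95% quantiles of the vector itself, computed with the default `quantile()` method.

```r
# Hypothetical helper illustrating the idea; not the DescTools implementation.
winsorize_sketch <- function(x, probs = c(0.05, 0.95), na.rm = FALSE) {
  # Lower and upper limits taken from the data's own quantiles
  limits <- unname(quantile(x, probs = probs, na.rm = na.rm))
  # Values below the lower limit are raised to it, values above the
  # upper limit are lowered to it; everything else is left unchanged
  pmin(pmax(x, limits[1]), limits[2])
}

x <- c(-0.06775798, -0.55213508, -0.12338265, 0.04928349, 0.47524313,
       0.04782829, -0.05070639, -112.67126382, 0.12657896, -0.12886632)
w <- winsorize_sketch(x)

range(x)  # original extremes
range(w)  # extremes after clamping to the quantile limits
```

The limits depend on the data passed in, so results for a 10-value vector will differ from results for a longer column.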

I have a for loop that does this across all columns of the data.frame (col1, col2, col3, col4). However, I understand lapply is a better option, so I am trying to use it instead but cannot get it working. If anybody can point me in the right direction it would be much appreciated.

The data:

data <- structure(list(EQ.TA = c(-0.0677579847115102, -0.552135083517749, 
-0.123382654164705, 0.0492834931482554, 0.475243125304193, 0.0478282913638668, 
-0.050706389027946, -112.671263815473, 0.126578956975704, -0.128866322940619
), NI.EQ = c(3.64670235329765, 1.66115713369585, 0.209424623633739, 
0.340430636358184, -0.248411254566261, -12.1709277350516, 1.06888235737433, 
0.0515582237132515, 0.177323118521857, 0.419879195374698), NI.TA = c(-0.24709320230217, 
-0.917183132749265, -0.0258393659113752, 0.0167776109344148, 
-0.118055740980805, -0.582114677880617, -0.0541991646381309, 
-5.80913022585296, 0.0224453753901758, -0.0541082879872031), 
    TL.TA = c(1.06775798471151, 1.55213508351775, 1.12338265416471, 
    0.950716506851745, 0.524756874695807, 0.952171708636133, 
    1.05070638902795, 113.671263815473, 0.873421043024296, 1.12886632294062
    )), .Names = c("EQ.TA", "NI.EQ", "NI.TA", "TL.TA"), row.names = c(NA, 
10L), class = "data.frame")
zx8754
user113156

2 Answers


You can lapply over the whole data.frame and assign the result back with data[], which keeps the data.frame structure:

library(DescTools)
data[]<-lapply(data, Winsorize)

data
#          EQ.TA       NI.EQ       NI.TA      TL.TA
#1   -0.06775798  2.75320700 -0.24709320  1.0677580
#2   -0.55213508  1.66115713 -0.91718313  1.5521351
#3   -0.12338265  0.20942462 -0.02583937  1.1233827
#4    0.04928349  0.34043064  0.01677761  0.9507165
#5    0.31834425 -0.24841125 -0.11805574  0.6816558
#6    0.04782829 -6.80579532 -0.58211468  0.9521717
#7   -0.05070639  1.06888236 -0.05419916  1.0507064
#8  -62.21765589  0.05155822 -3.60775403 63.2176559
#9    0.12657896  0.17732312  0.01989488  0.8734210
#10  -0.12886632  0.41987920 -0.05410829  1.1288663
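The reason for assigning into `data[]` rather than `data` can be seen in a small base R sketch. The `clamp` function and the toy data frame are hypothetical stand-ins (chosen so the example runs without DescTools); the point is only that `lapply` returns a plain list, while `df[] <-` writes the results back into the existing data.frame.

```r
# Toy data and a hypothetical stand-in for Winsorize (fixed limits 0 and 10)
df <- data.frame(a = c(1, 5, 100), b = c(-50, 0, 2))
clamp <- function(x) pmin(pmax(x, 0), 10)

# lapply on its own returns a list, one element per column
as_list <- lapply(df, clamp)
class(as_list)  # "list"

# Assigning into df[] replaces the columns but keeps the data.frame
df[] <- lapply(df, clamp)
class(df)       # "data.frame"
df
```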
Mike H.
  • Ah, it was that simple... So if I wanted to apply some of the package settings, would it simply be `data[] <- lapply(data, Winsorize(minval = NULL, maxval = NULL, probs = c(0.05, 0.95), na.rm = FALSE))`? That is, using the settings from the documentation, `Winsorize(x, minval = NULL, maxval = NULL, probs = c(0.05, 0.95), na.rm = FALSE)`. – user113156 May 02 '18 at 20:21
  • @user113156 You could do either `data[] <- lapply(data, Winsorize, minval = NULL, maxval = NULL, probs = c(0.05, 0.95), na.rm = FALSE)` or `data[] <- lapply(data, function(x) Winsorize(x, minval = NULL, maxval = NULL, probs = c(0.05, 0.95), na.rm = FALSE))` – Mike H. May 02 '18 at 20:24
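The two forms in the comment above rely on general `lapply` behaviour: extra named arguments after the function are forwarded to it on every call, which is equivalent to wrapping the call in an anonymous function. A minimal sketch, using base R's `quantile` in place of `Winsorize` so it runs without DescTools (the toy data frame is an assumption for illustration):

```r
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

# Form 1: extra arguments are passed through lapply's "..."
via_dots <- lapply(df, quantile, probs = c(0.05, 0.95))

# Form 2: an anonymous function that spells out the call
via_wrapper <- lapply(df, function(x) quantile(x, probs = c(0.05, 0.95)))

# Both forms give the same result
identical(via_dots, via_wrapper)
```

Calling the function inside `lapply` directly, as in `lapply(df, quantile(probs = ...))`, does not work because `lapply` expects a function, not the result of a call.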

I like the answers above. However, for a recent research project I had a data frame with variables of different types, and I only wanted to winsorize the numeric variables at the 1% level using lapply, keeping NA values. I think the following might be a suitable extension of the answer above:

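library(DescTools)
library(dplyr)  # provides bind_cols()

# Winsorize numeric columns at pct_level; return other columns unchanged
wins_vars <- function(x, pct_level = 0.01) {
  if (is.numeric(x)) {
    Winsorize(x, probs = c(pct_level, 1 - pct_level), na.rm = TRUE)
  } else {
    x
  }
}

df <- bind_cols(lapply(df, wins_vars))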
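The same type guard also works with the base R `df[] <-` assignment from the first answer, which avoids the dplyr dependency. A minimal sketch under stated assumptions: `clamp_numeric` and its fixed limits are hypothetical stand-ins for a winsorizing function, and `df_mixed` is made-up data with one character and one numeric column.

```r
# Made-up data with mixed column types
df_mixed <- data.frame(id = c("a", "b", "c", "d"),
                       value = c(1, 2, 3, 1000),
                       stringsAsFactors = FALSE)

# Hypothetical stand-in: clamp numeric vectors, pass everything else through
clamp_numeric <- function(x, lower = 0, upper = 10) {
  if (is.numeric(x)) pmin(pmax(x, lower), upper) else x
}

# Non-numeric columns are returned untouched, so their types survive
df_mixed[] <- lapply(df_mixed, clamp_numeric)
str(df_mixed)
```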

ToWii