I want to find all the outliers in a data frame and replace them with the mean of the variable (column).

This is a big data frame, composed of 46 obs. of 147 variables.

I was thinking of doing something like

new_df <- for (i in scaled.df){
  i[!i %in% boxplot.stats(i)$out]
}

and then replacing the NULL values, but that code creates a NULL object. I believe the reason is that the new vectors won't all have the same length.

Any ideas? Thx

  • Beware: changing the values of outliers might change the properties of your data (and thus also the mean/median/sd/etc.). – Wimpel Jun 01 '21 at 11:33

1 Answer

You can write a function to do this -

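replace_outlier_with_mean <- function(x) {
  # boxplot.stats()$out holds the values beyond 1.5 * IQR from the hinges;
  # na.rm = TRUE keeps the replacement from becoming NA if x has missing values
  replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm = TRUE))
}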
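
As a quick sanity check, here is the same logic run step by step on a made-up vector (the values in x are illustrative, not taken from the question's data):

```r
# Toy vector with one obviously extreme value (illustrative data).
x <- c(1, 2, 3, 4, 5, 100)

# boxplot.stats() flags points lying more than 1.5 * IQR beyond the hinges.
out <- boxplot.stats(x)$out

# Replace the flagged positions with the mean of the whole vector.
x_clean <- replace(x, x %in% out, mean(x))
```

Note that mean(x) is computed with the outlier still included, so the replacement value is itself pulled toward the extreme point.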

To apply it to multiple columns you can use lapply -

scaled.df[] <- lapply(scaled.df, replace_outlier_with_mean)
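
A minimal sketch of that on a small made-up data frame (toy_df and its values are hypothetical; the function is repeated so the snippet runs on its own, with na.rm = TRUE as an assumption about how missing values should be handled):

```r
# Same helper as in the answer, repeated so this snippet is self-contained.
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x, na.rm = TRUE))
}

# Hypothetical data: column a has one extreme value, column b has none.
toy_df <- data.frame(
  a = c(1, 2, 3, 4, 5, 100),
  b = c(10, 11, 12, 13, 14, 15)
)

# Assigning into toy_df[] keeps the data frame structure (names, dimensions)
# instead of turning the result into a plain list.
toy_df[] <- lapply(toy_df, replace_outlier_with_mean)
```

Each column keeps its original length because outliers are overwritten in place rather than dropped, which avoids the unequal-length problem mentioned in the question.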

Or in dplyr -

library(dplyr)
# across() needs an explicit column selection in dplyr >= 1.1.0
scaled.df <- scaled.df %>% mutate(across(everything(), replace_outlier_with_mean))
Ronak Shah