
I'm using the dataset House Prices: Advanced Regression Techniques, which includes several factor variables that contain NAs. Consider the columns PoolQC, Alley and MiscFeature. I want to replace all of these NAs with a "None" level using a single function, but I have not managed to do so. This is what I have tried so far:

MissingLevels <- function(x){
  for(i in names(x)){
  levels <- levels(x[i])
  levels[length(levels) + 1] <- 'None'
  x[i] <- factor(x[i], levels = levels)
  x[i][is.na(x[i])] <- 'None'
  return(x)
  }
}

MissingLevels(df[,c('Alley', 'Fence')])

apply(df[,c('Alley', 'Fence')], 2, MissingLevels)

https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data

Edgar Santos
cappuccino

1 Answer


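There are several ways to do this. For example, with this sample data:

# stringsAsFactors = TRUE is needed in R >= 4.0.0, where strings are no longer converted to factors by default
x <- data.frame(another = 1:3, Alley = c("A", "B", NA), Fence = c("C", NA, NA), stringsAsFactors = TRUE)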

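Option 1: using the forcats package

library(forcats)
# Note: forcats >= 1.0.0 deprecates fct_explicit_na() in favour of fct_na_value_to_level(f, level = "None")
x[,c("Alley", "Fence")] <- lapply(x[,c("Alley", "Fence")], fct_explicit_na, na_level = "None")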

  another Alley Fence
1       1     A     C
2       2     B  None
3       3  None  None

Option 2: using base R

# addNA() turns NA into a real level (placed last), which is then renamed to "None"
x[,c("Alley", "Fence")] <- lapply(x[,c("Alley", "Fence")], function(f){`levels<-`(addNA(f), c(levels(f), "None"))})

PS: The second option is inspired by @G. Grothendieck's post "replace <NA> in a factor column in R".
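
For completeness, here is a minimal base R sketch of a reusable function along the lines of what the question attempted. The attempt fails for three reasons: x[i] returns a one-column data frame rather than the column itself (so levels(x[i]) is NULL), return(x) sits inside the loop and exits after the first column, and apply(..., 2, ...) hands each column to the function as a plain character vector. The name add_none_level and its arguments are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not from the original post): add a "None" level to the
# selected columns and recode their NAs to that level.
add_none_level <- function(df, cols, na_level = "None") {
  df[cols] <- lapply(df[cols], function(col) {
    col <- as.factor(col)                          # also accepts character columns
    levels(col) <- union(levels(col), na_level)    # register the new level first
    col[is.na(col)] <- na_level                    # then recode the missing values
    col
  })
  df  # return once, after every column has been processed
}

x <- data.frame(another = 1:3,
                Alley = c("A", "B", NA),
                Fence = c("C", NA, NA),
                stringsAsFactors = TRUE)
x <- add_none_level(x, c("Alley", "Fence"))
str(x)
```

With the real data this would be called as df <- add_none_level(df, c("Alley", "Fence")), assuming df is the data frame from the question.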

Edgar Santos
  • Sweet! Very concise functions. I like it. For me, this works very neatly: apply(df[,c("Alley", "Fence")], 2, fct_explicit_na, na_level = "None"). – cappuccino Jul 03 '17 at 22:19
  • Why use `sapply` or `apply(df, 2...` and coerce the data.frame to a matrix? `lapply` would be more appropriate, when combined with overwriting the original `x` – thelatemail Jul 03 '17 at 22:34
  • I think you are right @thelatemail. lapply preserves the class factor, whereas apply changes the variables to character. – cappuccino Jul 03 '17 at 22:39
  • True @thelatemail. Thanks. Edited. – Edgar Santos Jul 03 '17 at 23:02
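
The difference discussed in these comments can be checked with a small sketch. It uses toy data like the answer's and base R only; the names via_apply and via_lapply are made up for illustration.

```r
x <- data.frame(Alley = c("A", "B", NA),
                Fence = c("C", NA, NA),
                stringsAsFactors = TRUE)

# apply() first converts the data frame to a character matrix,
# so the factor class is lost and the result is a matrix.
via_apply <- apply(x, 2, function(col) ifelse(is.na(col), "None", col))
is.matrix(via_apply)
is.character(via_apply)

# lapply() passes each column as it is, so every column stays a factor.
via_lapply <- x
via_lapply[] <- lapply(x, function(col) {
  levels(col) <- c(levels(col), "None")
  col[is.na(col)] <- "None"
  col
})
sapply(via_lapply, is.factor)
```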