
I have a data frame (called hp) that contains several columns with NAs. These columns are factors. I want to convert each one to character, replace the NAs with "none", and convert it back to factor. There are 14 such columns, so I would like to do this in a loop, but it doesn't work.

Thanks for your help.

The columns:

miss_names <- c("Alley", "MasVnrType", "FireplaceQu", "PoolQC", "Fence",
                "MiscFeature", "GarageFinish", "GarageQual", "GarageCond",
                "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1",
                "BsmtFinType2", "Electrical")

The loop:

for (i in miss_names){       
    hp[i]<-as.character(hp[i])
    hp[i][is.na(hp[i])]<-"NONE"
    hp[i]<-as.factor(hp[i])
    print(hp[i])
    }

 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list? 
dkantor
  • Please provide a reproducible example. Add a few lines of the hp object, ideally using `dput`. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Pierre Lapointe Mar 12 '17 at 20:08

2 Answers


Use addNA() to add NA as a factor level and then replace that level with whatever you want. You don't have to turn the factors into a character vector first. You can loop over all the factors in the data frame and replace them one by one.

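# Sample data (stringsAsFactors = TRUE is needed from R 4.0.0 on,
# where data.frame() no longer converts strings to factors by default)
dd <- data.frame(
  x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
  y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
  stringsAsFactors = TRUE
)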

# Loop over the columns
for (i in seq_along(dd)) {
  xx <- addNA(dd[, i])
  levels(xx) <- c(levels(dd[, i]), "none")
  dd[, i] <- xx
}

This gives us

> str(dd)
'data.frame':   20 obs. of  2 variables:
 $ x: Factor w/ 4 levels "a","b","c","none": 1 4 1 4 4 1 4 3 3 3 ...
 $ y: Factor w/ 4 levels "A","B","C","none": 1 1 2 2 1 3 3 3 4 1 ...
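
As for the loop in the question, the error most likely arises because hp[i] returns a one-column data frame (a list) rather than the column itself, so as.factor() ends up being called on a list. A minimal sketch of the original character-based approach using hp[[i]] instead, assuming a small made-up hp with only two of the listed columns (the values are hypothetical):

```r
# Toy stand-in for the asker's hp; the values are invented for illustration
hp <- data.frame(
  Alley   = factor(c("Grvl", NA, "Pave", NA)),
  Fence   = factor(c(NA, "MnPrv", NA, "GdWo")),
  LotArea = c(8450, 9600, 11250, 9550)
)
miss_names <- c("Alley", "Fence")

for (i in miss_names) {
  # hp[[i]] extracts the column as a vector; hp[i] would be a data frame
  col <- as.character(hp[[i]])
  col[is.na(col)] <- "none"
  hp[[i]] <- factor(col)
}

str(hp)
```

Columns not named in miss_names (LotArea here) are left untouched.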
Johan Larsson

An alternative solution with the purrr package, using the same data as @Johan Larsson:

library(purrr)

set.seed(15)
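dd <- data.frame(
        x = sample(c(NA, letters[1:3]), 20, replace = TRUE),
        y = sample(c(NA, LETTERS[1:3]), 20, replace = TRUE),
        stringsAsFactors = TRUE)  # needed from R 4.0.0 on

# Create a function to convert NA to none
convert.to.none <- function(x){
        y <- addNA(x)                      # add NA as the last factor level
        levels(y) <- c(levels(x), "none")  # rename that NA level to "none"
        return(y) }

# use the map function to cycle through dd's columns
# (map_df() returns a tibble and relies on dplyr being installed)
map_df(dd, convert.to.none)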

Because the conversion lives in a single function, this approach scales to any number of columns.
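
If only some columns should be converted, as with miss_names in the question, the same idea can be written in base R without purrr. This is a rough sketch: na_to_none is a hypothetical helper name and the toy hp below is invented for illustration. It uses addNA(x, ifany = TRUE) so that columns without any NA keep their levels unchanged.

```r
# Hypothetical helper: turn NA into an explicit "none" factor level
na_to_none <- function(x) {
  x <- addNA(x, ifany = TRUE)            # adds an NA level only if NAs occur
  levels(x)[is.na(levels(x))] <- "none"  # rename the NA level, if present
  x
}

# Toy stand-in for the asker's hp; the values are invented for illustration
hp <- data.frame(
  Alley   = factor(c("Grvl", NA, "Pave")),
  Fence   = factor(c(NA, "MnPrv", NA)),
  LotArea = c(8450, 9600, 11250)
)
miss_names <- c("Alley", "Fence")

# Convert only the named columns and assign them back in one step
hp[miss_names] <- lapply(hp[miss_names], na_to_none)

str(hp)
```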

cephalopod