I have a dataset with 50 columns, and I would like to write a function that replaces the NAs in each column with a value I specify for that column: 0, 'none', or 99. I could write one line of code per column (as in my example below), but I think there must be a way to do this with a function that reduces the amount of code I need to write.
Here is an example with four columns.
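set.seed(1)
# Build a 15-row example with numeric and letter columns
dat <- data.frame(one = rnorm(15),
                  two = sample(LETTERS, 15),
                  three = rnorm(15),
                  four = runif(15))
# Set 5 randomly chosen values in every column to NA
dat <- data.frame(lapply(dat, function(x) { x[sample(15, 5)] <- NA; x }))
head(dat)
str(dat)
# Make sure 'two' is character (not factor) so that 'none' can be assigned
dat$two <- as.character(dat$two)
# One replacement line per column: this is what I want to avoid for 50 columns
dat[["one"]][is.na(dat[["one"]])] <- 0
dat[["two"]][is.na(dat[["two"]])] <- 'none'
dat[["three"]][is.na(dat[["three"]])] <- 99
dat[["four"]][is.na(dat[["four"]])] <- 0
head(dat)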
I thought a starting point would be to modify this function:
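convert.nas <- function(obj, types) {
  # seq_along() is safe even when obj has zero columns (1:length(obj) is not)
  for (i in seq_along(obj)) {
    # Pick the conversion function named by types[i]
    FUN <- switch(types[i],
                  character = as.character,
                  numeric = as.numeric,
                  factor = as.factor,
                  date = as.Date)
    obj[[i]] <- FUN(obj[[i]])
  }
  obj
}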
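To make the goal concrete, here is a minimal sketch of the kind of function being asked for, written in the same loop style as the function above. The name `fill_nas`, the `fills` argument (a named list mapping column name to replacement value), and the small `demo` data frame are all hypothetical, introduced only for illustration; this is one possible shape, not a definitive solution.

```r
# Hypothetical helper: `fills` is a named list, column name -> replacement value.
fill_nas <- function(obj, fills) {
  # Only touch columns that exist in both the data frame and the lookup
  for (col in intersect(names(fills), names(obj))) {
    x <- obj[[col]]
    # A factor cannot take a new level by plain assignment, so use character
    if (is.factor(x)) x <- as.character(x)
    x[is.na(x)] <- fills[[col]]
    obj[[col]] <- x
  }
  obj
}

# Small made-up data frame, just to exercise the sketch
demo <- data.frame(one = c(1.5, NA, 3),
                   two = c("A", NA, "C"),
                   three = c(NA, 2, NA),
                   stringsAsFactors = FALSE)
filled <- fill_nas(demo, list(one = 0, two = "none", three = 99))
```

With this shape, the 50 columns would need one entry each in the lookup list rather than one line of replacement code each; columns missing from the list are left untouched.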
EDIT: Per suggestions and comments from others, here is some additional context and clarification. I need to remove the NAs because of further data manipulations (subscripting in particular) that occur later in my script. However, I do appreciate the point made by @Ananda that this makes my data less usable. Regarding @Henrik's comment about the criterion for choosing between 99 and 0: there is no logical criterion as such; the choice is specific to three columns that I need to define manually.
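Since only a few columns need a manually chosen value, one possible arrangement is a default fill per column type plus a short list of overrides. The sketch below assumes that numeric columns default to 0 and character columns to 'none'; the function name `fill_with_defaults`, its arguments, and the `demo2` data frame are hypothetical and exist only for illustration.

```r
# Hypothetical helper: type-based defaults, with per-column overrides.
fill_with_defaults <- function(obj, overrides = list(),
                               num_default = 0, chr_default = "none") {
  for (col in names(obj)) {
    x <- obj[[col]]
    # Treat factors as character so that a new value can be assigned
    if (is.factor(x)) x <- as.character(x)
    if (!is.null(overrides[[col]])) {
      fill <- overrides[[col]]      # manually specified column
    } else if (is.numeric(x)) {
      fill <- num_default
    } else if (is.character(x)) {
      fill <- chr_default
    } else {
      next                          # other types (e.g. Date) are left as-is
    }
    x[is.na(x)] <- fill
    obj[[col]] <- x
  }
  obj
}

# Made-up data: only 'three' needs a manual value
demo2 <- data.frame(one = c(1, NA),
                    two = c(NA, "B"),
                    three = c(NA, 5),
                    stringsAsFactors = FALSE)
filled2 <- fill_with_defaults(demo2, overrides = list(three = 99))
```

The design choice here is that the overrides list stays as short as the number of special columns, while every other column falls back to its type default.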
-al