Conditional replacement of NA based on Data type

Question

I have a dataset of over 80 variables, most of which contain NAs. Some of the variables are integers and some are factors. I am trying to write a function that:

1. loops through my columns;
2. identifies each column's type;
3. if the column is a factor, replaces NA with "Others";
4. if the column is an integer, replaces NA with 0.

Any ideas? Thanks, guys.

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Sotos, Mar 29 '18 at 12:37

LAP · Answer 1 · 2018-03-29T12:50:26.190

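FOO <- function(x) {
  # Numeric (integer or double) columns: NA -> 0.
  # 0L keeps integer columns integer; assigning the double 0 would make them double.
  if (is.numeric(x)) {
    x[is.na(x)] <- 0L
  }
  # Factor columns: NA -> "Others" (this level must already exist, see the note below)
  if (is.factor(x)) {
    x[is.na(x)] <- "Others"
  }
  return(x)
}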

Now just use lapply to loop over multiple columns of your data, e.g. df[1:10] <- lapply(df[1:10], FOO).
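For example, here is a self-contained sketch that applies the first version of FOO to every column at once. The sample data frame is made up for illustration, and it declares "Others" as a level up front so the factor assignment succeeds. This copy of FOO uses 0L instead of 0 so integer columns stay integer.

```r
# Sketch: FOO as in the first version above, but with 0L so that
# integer columns keep their integer type.
FOO <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- 0L
  }
  if (is.factor(x)) {
    # Works only because "Others" is already a level of each factor
    x[is.na(x)] <- "Others"
  }
  x
}

# Hypothetical sample data: two factor and two integer columns
df <- data.frame(
  fac1 = factor(c("NY", NA, "PA"), levels = c("NY", "PA", "Others")),
  int1 = c(1L, NA, 3L),
  fac2 = factor(c(NA, "red", "blue"), levels = c("blue", "red", "Others")),
  int2 = c(NA, 5L, 6L)
)

# df[] <- ... keeps df a data.frame; a plain lapply() would return a list
df[] <- lapply(df, FOO)
str(df)
```

Assigning into df[] rather than df replaces the column contents while keeping the data frame's class and attributes.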

Note: this requires that the factor level "Others" already exists in every factor variable you want to change. Otherwise R warns "invalid factor level, NA generated" and the NAs stay in place. If that is the case, use

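FOO <- function(x) {
  # Numeric columns: NA -> 0 (0L keeps integer columns integer)
  if (is.numeric(x)) {
    x[is.na(x)] <- 0L
  }
  # Factor columns: go through character so "Others" need not be a level yet
  if (is.factor(x)) {
    x <- as.character(x)
    x[is.na(x)] <- "Others"
    x <- as.factor(x)  # rebuilds the levels from the values present
  }
  return(x)
}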

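Because the factor is rebuilt from a character vector, as.factor() sorts the levels alphabetically and drops any unused ones, so the original level order may be lost.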
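If the level order matters, one alternative (a sketch, not part of the original answer) is to append "Others" to the existing levels with base R's levels<- instead of round-tripping through character. Existing levels, including unused ones, then keep their order and "Others" is added at the end.

```r
# Sketch: append "Others" as a new level instead of rebuilding the factor,
# so the original level order (and any unused levels) are preserved.
FOO <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- 0L  # 0L keeps integer columns integer
  }
  if (is.factor(x)) {
    if (!"Others" %in% levels(x)) {
      levels(x) <- c(levels(x), "Others")  # add the level at the end
    }
    x[is.na(x)] <- "Others"
  }
  x
}

# Hypothetical factor whose level order is not alphabetical
f <- factor(c("low", NA, "high"), levels = c("low", "mid", "high"))
FOO(f)
```

Here the unused level "mid" survives and the levels stay in the order low, mid, high, Others.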

sbha · Answer 2 · 2018-03-29T14:46:08.517

Using the dplyr and forcats packages:

library(dplyr)
library(forcats)

# sample data frame
df <- tibble(fac1 = as.factor(c('NY', NA, 'PA', 'MN', 'OH', 'TX', NA)),
             int1 = as.integer(c(1,2,3,NA,NA,6,7)),
             fac2 = as.factor(c('red', 'blue', NA, 'green', 'green', NA, 'yellow')),
             int2 = as.integer(c(1,NA,3,4,5,NA,7)))

df %>% 
  mutate_if(is.integer, ~ replace(., is.na(.), 0)) %>% 
  mutate_if(is.factor, ~ fct_explicit_na(., na_level = 'Other'))

# A tibble: 7 x 4
    fac1  int1   fac2  int2
  <fctr> <dbl> <fctr> <dbl>
1     NY     1    red     1
2  Other     2   blue     0
3     PA     3  Other     3
4     MN     0  green     4
5     OH     0  green     5
6     TX     6  Other     0
7  Other     7 yellow     7
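Since this answer was written, mutate_if() has been superseded by across() (dplyr 1.0.0), funs() has been deprecated, and forcats 1.0.0 deprecated fct_explicit_na() in favour of fct_na_value_to_level(). The output above also shows int1 and int2 coming back as <dbl>, because replacing NA with the double 0 coerces the integer columns. A sketch of the same pipeline with the current functions, assuming dplyr >= 1.0.0 and forcats >= 1.0.0, and using 0L to keep the integer type:

```r
library(dplyr)    # across() and where() need dplyr >= 1.0.0
library(forcats)  # fct_na_value_to_level() needs forcats >= 1.0.0

df <- tibble(fac1 = factor(c('NY', NA, 'PA', 'MN', 'OH', 'TX', NA)),
             int1 = c(1L, 2L, 3L, NA, NA, 6L, 7L),
             fac2 = factor(c('red', 'blue', NA, 'green', 'green', NA, 'yellow')),
             int2 = c(1L, NA, 3L, 4L, 5L, NA, 7L))

out <- df %>%
  mutate(
    # 0L rather than 0 keeps the integer columns as <int>
    across(where(is.integer), ~ replace(.x, is.na(.x), 0L)),
    # Turn NA into an explicit "Other" level
    across(where(is.factor), ~ fct_na_value_to_level(.x, level = "Other"))
  )
out
```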
