I have a data set with over 80 variables, most of which contain NAs. Some of the variables are integers and some are factors. What I am trying to do is develop a function that: 1. Loops through my list of columns; 2. Identifies each column's type; 3. If the column is a factor, replaces NA with "Others"; 4. If the column is an integer, replaces NA with the number 0. Any ideas? Thanks, guys.
Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Mar 29 '18 at 12:37
2 Answers
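FOO <- function(x){
  # numeric (including integer) columns: replace NA with 0
  if(is.numeric(x)){
    x[is.na(x)] <- 0
  }
  # factor columns: replace NA with the level "Others"
  if(is.factor(x)){
    x[is.na(x)] <- "Others"
  }
  return(x)
}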
Now just use lapply to loop over multiple columns of your data, e.g. df[1:10] <- lapply(df[1:10], FOO).
Note: This requires that the factor level "Others" is already present in every factor variable you want to change; otherwise the assignment produces an "invalid factor level" warning and the values stay NA. If this is not the case, use
FOO <- function(x){
  if(is.numeric(x)){
    x[is.na(x)] <- 0
  }
  if(is.factor(x)){
    x <- as.character(x)
    x[is.na(x)] <- "Others"
    x <- as.factor(x)
  }
  return(x)
}
This might rearrange the order of the factor levels, though, because as.factor() rebuilds the levels in sorted order.
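If the level order matters, one possible variation is to append "Others" to the existing levels instead of converting to character and back. The following is only a sketch: the helper name fill_na and its fill argument are made up for illustration and are not part of the answer above. It also assigns 0L rather than 0, on the assumption that integer columns should stay integer.

```r
# Hypothetical helper (name and 'fill' argument are illustrative).
fill_na <- function(x, fill = "Others") {
  if (is.numeric(x)) {
    # 0L keeps integer columns integer; double columns stay double.
    x[is.na(x)] <- 0L
  } else if (is.factor(x)) {
    # Append the fill level only if it is missing,
    # so the existing levels keep their original order.
    if (!fill %in% levels(x)) {
      levels(x) <- c(levels(x), fill)
    }
    x[is.na(x)] <- fill
  }
  x
}

# Small made-up data frame with a deliberately unsorted level order.
df <- data.frame(
  fac = factor(c("b", NA, "a"), levels = c("b", "a")),
  int = c(1L, NA, 3L)
)

# df[] keeps the data.frame structure while replacing every column.
df[] <- lapply(df, fill_na)
```

Columns that are neither numeric nor factor (for example character or logical) pass through unchanged.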

LAP
Using the dplyr and forcats packages:
library(dplyr)
library(forcats)

# sample data frame
df <- data_frame(fac1 = as.factor(c('NY', NA, 'PA', 'MN', 'OH', 'TX', NA)),
                 int1 = as.integer(c(1,2,3,NA,NA,6,7)),
                 fac2 = as.factor(c('red', 'blue', NA, 'green', 'green', NA, 'yellow')),
                 int2 = as.integer(c(1,NA,3,4,5,NA,7)))

df %>%
  mutate_if(is.integer, funs(replace(., is.na(.), 0))) %>%
  mutate_if(is.factor, funs(fct_explicit_na(., na_level = 'Other')))
# A tibble: 7 x 4
  fac1    int1 fac2    int2
  <fctr> <dbl> <fctr> <dbl>
1 NY         1 red        1
2 Other      2 blue       0
3 PA         3 Other      3
4 MN         0 green      4
5 OH         0 green      5
6 TX         6 Other      0
7 Other      7 yellow     7
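Several of the helpers used above (data_frame(), funs(), mutate_if(), fct_explicit_na()) have since been deprecated or superseded. A rough equivalent with the newer interface might look like the sketch below. It assumes dplyr 1.0.0 or later and forcats 1.0.0 or later, and it uses a shortened, made-up data frame. Replacing with 0L instead of 0 is a deliberate change so that the integer columns stay integer rather than becoming <dbl> as in the output above.

```r
library(dplyr)
library(forcats)

# Shortened sample data (illustrative, not the full data frame above).
df <- tibble(
  fac1 = factor(c("NY", NA, "PA")),
  int1 = c(1L, NA, 3L)
)

out <- df %>%
  mutate(
    # Integer columns: replace NA with integer zero.
    across(where(is.integer), ~ replace(.x, is.na(.x), 0L)),
    # Factor columns: turn NA into an explicit "Other" level.
    across(where(is.factor), ~ fct_na_value_to_level(.x, level = "Other"))
  )
```

Change level = "Other" to "Others" if the label has to match the question exactly.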

sbha