df <- data.frame(replicate(10, sample(0:100, 1000, replace = TRUE)))
eee <- as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE) ]))
View(eee)

This gives me a data frame with missing data.

If a variable in my current data frame has missing values, I want to create two new variables from it. The first is a binary indicator: "yes" if the value was missing, "no" if it wasn't. The second is the same as the original wherever the value is not missing; wherever it is missing, I want to impute the mean of the original variable.
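
As a small illustration of the intended result for a single variable (the vector `x` and the names `x_missing` and `x_imputed` are made up for this sketch and are not part of my data):

```r
# hypothetical single variable with two missing values
x <- c(10, NA, 30, NA, 50)

# indicator: "yes" where the value was missing, "no" otherwise
x_missing <- ifelse(is.na(x), "yes", "no")

# copy of x with each NA replaced by the mean of the observed values
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
```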

I'm not sure how to write code that does this across my whole data set instead of handling each variable individually.

Thank you for the help!

  • can you provide a reproducible example https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Bulat Sep 23 '19 at 19:44
  • Is `population.data$j` a single value or a vector of values? If you want to check if one NA is present in the column, please check: https://stackoverflow.com/questions/6551825/fastest-way-to-detect-if-vector-has-at-least-1-na In addition, – Chelmy88 Sep 23 '19 at 20:13
  • Now that I see your data, looks like you want to cover multiple columns. – markhogue Sep 23 '19 at 20:37
  • Yes, I need to check all the columns of the dataset – Ben Rossmiller Sep 23 '19 at 20:39

1 Answer


I worked something out that is crude but effective.

# simulate a 1000 x 10 data frame, then blank out roughly 15% of each column
df <- data.frame(replicate(10, sample(0:100, 1000, replace = TRUE)))

eee <- as.data.frame(lapply(df,
  function(cc) cc[ sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE) ]))

# pt1: "yes"/"no" matrix marking which cells were missing
replace_fn1 <- function(x) ifelse(is.na(x), "yes", "no")
pt1 <- apply(eee, c(1, 2), replace_fn1)

# mean of each column, ignoring missing values
col_means <- colMeans(eee, na.rm = TRUE)

# pt2: start from the original values (not from pt1, which only holds
# "yes"/"no"), then fill each NA with the mean of its own column
pt2 <- as.matrix(eee)
na_ind <- which(is.na(pt2), arr.ind = TRUE)
pt2[na_ind] <- col_means[na_ind[, "col"]]
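
A possible alternative, if the result should stay in one data frame and new columns should only be added for variables that actually contain NA. This is a minimal sketch rather than a definitive solution: the helper name `add_missing_cols`, the `_missing` and `_imputed` suffixes, and the `set.seed` call are assumptions made for the example.

```r
# simulate data with roughly 15% missing values, as in the question
set.seed(1)  # assumption: seed added only so the sketch is repeatable
df <- data.frame(replicate(10, sample(0:100, 1000, replace = TRUE)))
eee <- as.data.frame(lapply(df, function(cc) {
  cc[sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE)]
}))

# hypothetical helper: for every column with at least one NA, add
# <name>_missing ("yes"/"no") and <name>_imputed (NA replaced by column mean)
add_missing_cols <- function(data) {
  has_na <- names(data)[vapply(data, anyNA, logical(1))]
  for (nm in has_na) {
    x <- data[[nm]]
    data[[paste0(nm, "_missing")]] <- ifelse(is.na(x), "yes", "no")
    data[[paste0(nm, "_imputed")]] <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)
  }
  data
}

result <- add_missing_cols(eee)
```

Columns without any missing values are left alone, so `result` keeps the original columns and gains two extra columns per variable that had NA.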

markhogue