I have a data set in which I would like to impute one value by redistributing it among the other values, based on the probability distribution of those values. Let me make a reproducible example first:
library(tidyverse)
library(janitor)
dummy1 <- runif(5000, 0, 1)
dummy11 <- case_when(
  dummy1 < 0.776 ~ 1,
  dummy1 < 0.776 + 0.124 ~ 2,
  TRUE ~ 5
)
df1 <- tibble(q1 = dummy11)
Here is the output:
df1 %>% tabyl(q1)
q1 n percent
1 3888 0.7776
2 605 0.1210
5 507 0.1014
I used mutate and sample to share the value 5 among the values 1 and 2, like this:
df1 %>%
  mutate(q1 = case_when(
    q1 == 5 ~ sample(2, length(q1), prob = c(0.7776, 0.1210), replace = TRUE),
    TRUE ~ as.integer(q1)
  ))
And here is the result:
q1 n percent
1 4322 0.8644
2 678 0.1356
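As a sanity check on these proportions: sample() rescales the prob weights internally, so the two weights do not need to sum to one. A proportional redistribution should therefore leave value 1 with a share of roughly 0.7776 / (0.7776 + 0.1210). Here is a small sketch of that arithmetic, using only the numbers from the tables above (the variable names are just illustrative):

```r
# Observed shares from the first tabyl() output
p <- c(`1` = 0.7776, `2` = 0.1210, `5` = 0.1014)

# Rescale the shares of 1 and 2 so that they sum to one
w <- p[c("1", "2")] / sum(p[c("1", "2")])

# Expected shares once the mass of value 5 is split in proportion to w
expected <- p[c("1", "2")] + p[["5"]] * w
print(round(expected, 4))
```

The expected shares come out at about 0.865 and 0.135, which is close to the 0.8644 and 0.1356 I observed, so the inline version appears to behave as intended.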
This approach seems to work. However, since I need to apply it to several variables, I tried to write a function that works with the tidyverse using tidyeval, like this:
my_impute <- function(.data, .prob_var, ...) {
  .prob_var <- enquo(.prob_var)
  .data %>%
    sample(2, prob = c(!!.prob_var), replace = TRUE)
}
# running on data
df1 %>%
  mutate(q1 = case_when(
    q1 == 5 ~ !!my_impute(q1),
    TRUE ~ as.integer(q1)
  ))
The error is:
Error in eval_tidy(pair$lhs, env = default_env) : object 'q1' not found
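One likely reason for the error is that !! forces my_impute(q1) to be evaluated before mutate() has made the column q1 available, so q1 is looked up outside the data frame. A possible way around this is to skip tidyeval and write a plain vector function that mutate() can call on each column. The following is only a minimal sketch under some assumptions. The helper name impute_by_freq and its target argument are hypothetical, and the weights are taken from the observed counts of the remaining values rather than hard-coded, which may or may not be what is wanted.

```r
library(dplyr)

# Hypothetical helper (not part of any package): replaces every occurrence of
# `target` in `x` with a draw from the other observed values, weighted by how
# often each of those values occurs in `x`.
impute_by_freq <- function(x, target = 5) {
  is_target <- !is.na(x) & x == target
  donors <- x[!is_target & !is.na(x)]
  freq <- table(donors)
  values <- as.numeric(names(freq))
  # Sample positions rather than the values themselves, because sample()
  # treats a single number n as 1:n. Counts work as weights since
  # sample.int() rescales prob internally.
  idx <- sample.int(length(values), size = sum(is_target),
                    replace = TRUE, prob = as.numeric(freq))
  x[is_target] <- values[idx]
  x
}

# Example data built like df1 above (the seed is arbitrary)
set.seed(42)
dummy1 <- runif(5000, 0, 1)
df1 <- tibble(q1 = case_when(
  dummy1 < 0.776 ~ 1,
  dummy1 < 0.776 + 0.124 ~ 2,
  TRUE ~ 5
))

# Apply the helper to one or several columns
df2 <- df1 %>%
  mutate(across(q1, ~ impute_by_freq(.x, target = 5)))
```

Because the helper only sees a vector, more columns can be listed inside across() without any quoting or unquoting. It does assume that each column contains at least one value other than the target.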