I created a small reproducible example and assumed that your country variable (i.e. your grouping variable) contains no NA values.
You can use the where argument of the mice() function to specify where you do and don't want values to be imputed. You just need to create a data frame of logical values (TRUE/FALSE), with the same dimensions as your data, to specify where imputations should occur. You can use some dplyr manipulation to make a copy of your data frame where all values in the UK rows are FALSE and everything else is TRUE.
library(faux) # to generate data
library(missMethods) # to make data missing
library(mice) # to impute
library(dplyr) # for pipes, mutate, and case_when
df <-
  faux::rnorm_multi(n = 100, vars = 6, mu = 3, sd = 1,
                    varnames = c("var1", "var2", "var3",
                                 "var4", "var5", "var6")) %>%
  mutate(country = sample(x = c("US", "UK", "FR", "CA", "JP"),
                          size = 100, replace = TRUE)) %>%
  missMethods::delete_MCAR(cols = c("var1", "var2", "var3",
                                    "var4", "var5", "var6"), p = .15)
# make a logical data frame (same shape as df) to say where you do and
# don't want imputing to occur
here <- df %>%
  mutate(across(c(var1:var6),
                ~ case_when(country == "UK" ~ FALSE,
                            country != "UK" ~ TRUE))) %>%
  mutate(country = FALSE)
head(here)
#>    var1  var2  var3  var4  var5  var6 country
#> 1  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 2  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 3  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 4  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 5 FALSE FALSE FALSE FALSE FALSE FALSE   FALSE
#> 6 FALSE FALSE FALSE FALSE FALSE FALSE   FALSE
# use the where argument
imp <- mice(df,
            maxit = 5,
            where = here,
            printFlag = FALSE)
#> Warning: Number of logged events: 1
# check that the UK rows still have some NAs
com <- complete(imp, action = "long", include = FALSE)
com %>%
  filter(country == "UK") %>%
  summarise(across(everything(), ~ sum(is.na(.))))
#>   .imp .id var1 var2 var3 var4 var5 var6 country
#> 1    0   0   15   30   25   20   20   30       0
# check that none of the other countries have any NAs
com %>%
  filter(country != "UK") %>%
  summarise(across(everything(), ~ sum(is.na(.))))
#>   .imp .id var1 var2 var3 var4 var5 var6 country
#> 1    0   0    0    0    0    0    0    0       0
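
One caveat, as far as I understand the mice documentation: where marks the cells for which imputations are created, and it can also be used to overimpute observed data. A mask that is TRUE for every non-UK cell may therefore replace values that were never missing. If you only want to fill the gaps, a tighter mask is TRUE only where a value is both missing and outside the UK. Here is a minimal sketch in base R; the toy data frame is hypothetical and only there to show the shape of the mask:

```r
# Hypothetical toy data: two numeric columns with some NAs
# and a complete grouping column
df <- data.frame(
  var1    = c(1.2, NA, 3.1, NA, 2.5, NA),
  var2    = c(NA, 0.4, NA, 1.9, 2.2, 0.7),
  country = c("US", "UK", "FR", "UK", "CA", "JP")
)

# is.na(df) is a logical matrix with the same dimensions as df.
# The row-wise vector (country != "UK") has one entry per row, so it is
# recycled down each column: a cell is TRUE only if it is missing AND
# its row is not a UK row.
here <- is.na(df) & df$country != "UK"

print(here)
```

With the example data above, the same one-liner could be applied to that df and passed as where = here in the mice() call; observed values and all UK rows should then be left untouched.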