I am trying to run `mice` while excluding some cases, so that they are neither imputed nor used as imputers. Specifically, I want to exclude the responses to 6 particular items (columns) for one particular group (1 country). Is it possible to do this? Would anyone know how to solve this?

Thank you in advance.

A.

I tried changing the predictor matrix, but this doesn't work because I want to exclude some cases, not columns.

jrcalabrese
  • Can you make your post reproducible? You can use the `nhanes` data built into the `mice` package. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – jrcalabrese Apr 23 '23 at 14:07

1 Answer

I created a small reproducible example and assumed that your country variable (i.e., your grouping variable) contains no NA values.

You can use the `where` argument of the `mice()` function to specify where you do and don't want values to be imputed. You just need to create a data frame of logical values (TRUE/FALSE), with the same dimensions as your data, that marks where imputations should occur. You can use some dplyr manipulation to make a copy of your data frame in which all item values in the UK rows are FALSE and the item values in all other rows are TRUE.

```r
library(faux) # to generate data
library(missMethods) # to make data missing
library(mice) # to impute
library(dplyr) # for pipes, mutate, and case_when

# simulate six items plus a country column, then delete 15% of item values (MCAR)
df <- 
  faux::rnorm_multi(n = 100, vars = 6, mu = 3, sd = 1, 
                    varnames = c("var1", "var2", "var3",
                                 "var4", "var5", "var6")) %>%
  mutate(country = sample(x = c("US", "UK", "FR", "CA", "JP"),
                          size = 100, replace = TRUE)) %>%
  missMethods::delete_MCAR(cols = c("var1", "var2", "var3",
                                    "var4", "var5", "var6"), p = .15)

# make a logical data frame (same dimensions as df) to say
# where you do and don't want imputing to occur
here <- df %>% 
  mutate(across(c(var1:var6), 
                ~ case_when(country == "UK" ~ FALSE,
                            country != "UK" ~ TRUE))) %>%
  mutate(country = FALSE) # never impute the grouping variable itself
head(here)
#>    var1  var2  var3  var4  var5  var6 country
#> 1  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 2  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 3  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 4  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   FALSE
#> 5 FALSE FALSE FALSE FALSE FALSE FALSE   FALSE
#> 6 FALSE FALSE FALSE FALSE FALSE FALSE   FALSE

# use the where argument
imp <- mice(df,
            maxit = 5,
            where = here,
            printFlag = FALSE)
#> Warning: Number of logged events: 1

# proof that UK rows still have some NAs
com <- complete(imp, action = "long", include = FALSE)
com %>%
  filter(country == "UK") %>% 
  summarise(across(everything(), ~ sum(is.na(.))))
#>   .imp .id var1 var2 var3 var4 var5 var6 country
#> 1    0   0   15   30   25   20   20   30       0

# proof that all the other countries don't have any NAs
com %>%
  filter(country != "UK") %>% 
  summarise(across(everything(), ~ sum(is.na(.))))
#>   .imp .id var1 var2 var3 var4 var5 var6 country
#> 1    0   0    0    0    0    0    0    0       0
```
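One caveat about the `where` data frame: as far as I understand the `mice` documentation, `where` defaults to `is.na(data)`, and a TRUE on an *observed* cell asks `mice` to overimpute that cell. If the goal is only to fill in missing values outside one group, a variant is to start from `is.na()` and switch off the rows of that group. The following is a minimal sketch under that assumption, using the built-in `nhanes` data; the `country` vector is hypothetical and made up purely for illustration.

```r
library(mice)

dat <- nhanes  # built-in example data shipped with mice (age, bmi, hyp, chl)

# HYPOTHETICAL grouping vector, invented for illustration only
country <- rep(c("US", "UK", "FR", "CA", "JP"), length.out = nrow(dat))
uk <- country == "UK"

# start from the default: impute every missing cell, leave observed cells alone
here <- is.na(dat)
# then switch imputation off for every cell in the UK rows
here[uk, ] <- FALSE

imp <- mice(dat, m = 2, maxit = 2, where = here,
            printFlag = FALSE, seed = 1)
com <- complete(imp, 1)

# missing values remaining per column, UK rows vs. all other rows
colSums(is.na(com[uk, ]))
colSums(is.na(com[!uk, ]))
```

With this version the observed values in the non-UK rows are kept as they are, and only cells that were actually missing receive imputations.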
jrcalabrese
  • Thank you! However, isn't "where" used only to specify the values (cells) to be (or not to be) imputed but not the imputers? I want these cells not to be imputed but also not be used as imputers, and "where" only allows me to specify that I don't want these cells to be imputed. If I create a dataframe of logical values (TRUE/FALSE) to specify where imputations should occur, I still have some NAs for some cells I specified have to be imputed (I guess because some of my cells that I don't want to be used as imputers nor imputed are still being used as imputers and some of them are NAs). – user21683212 Apr 24 '23 at 15:09
  • You're right, `where` prevents those cells from being imputed, but it does not prevent them from being used as imputers. You can easily exclude certain columns from being used as imputers by altering the [predictor matrix](https://bookdown.org/mwheymans/bookmi/multiple-imputation.html#customizing-the-imputation-model-1), but I don't believe you can use only part of a column as an imputer. Just to clarify, you want the UK cells to both **not be imputed** and **not be used as imputers**, right? You will and should end up with some NA values if you don't want UK cells to be imputed. – jrcalabrese Apr 25 '23 at 12:10
  • Also, if you've used my code on your real data and you end up with NA values in places where values *should* be imputed, then please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your original question. – jrcalabrese Apr 25 '23 at 12:13
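One possible workaround for the "not imputed and not used as imputers" requirement, offered as a sketch rather than an established `mice` feature, is to impute only the rows outside the excluded group and then put the excluded rows back unchanged. Note the assumption: this drops the excluded rows from the imputation model entirely (all of their columns, not just the 6 items), which may be more than the question asks for. The `country` vector below is hypothetical and `nhanes` stands in for the real data.

```r
library(mice)

dat <- nhanes  # built-in example data shipped with mice

# HYPOTHETICAL grouping vector, invented for illustration only
country <- rep(c("US", "UK", "FR", "CA", "JP"), length.out = nrow(dat))
uk <- country == "UK"

# fit the imputation model on the non-UK rows only, so UK values
# can neither be imputed nor serve as predictors
imp <- mice(dat[!uk, ], m = 3, maxit = 5,
            printFlag = FALSE, seed = 123)

# rebuild each completed data set with the UK rows restored as they were
completed <- lapply(seq_len(imp$m), function(i) {
  out <- dat
  out[!uk, ] <- complete(imp, i)
  out
})

# UK rows keep their NAs; the other rows are filled in
colSums(is.na(completed[[1]][uk, ]))
colSums(is.na(completed[[1]][!uk, ]))
```

The result is a plain list of data frames rather than a `mids` object, so any downstream analysis would have to be run on each element of `completed` separately.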