Merge imputed level 2 data (mice) with non-imputed level 1 data for multilevel analysis with brms

Question

I'm using the R mice package to impute randomly missing questionnaire item values for a few participants. The sum score of the questionnaire is later used as a level-2 predictor of reaction times in a task (multiple trials, level 1) in a multilevel model fitted with brms.

I have tried two different approaches to create a mids object that includes all the data and can later be passed to brm_multiple, but neither has worked so far:

1.) I kept the data frames separate, imputed the item values in the questionnaire data frame, created a long-format data frame containing the original data and all imputations (using the complete function), and calculated the sum score for each participant in each imputation (using rowSums). Afterwards, I joined this long data frame with the level-1 reaction time data (using full_join) and tried to convert it into a mids object (as.mids). This failed, however, because the join produces multiple occurrences of each .id value.

2.) I joined the data frames before imputation and tried to impute only the level-2 questionnaire data by extending mice with miceadds. I set only the item scores as predictors in the predictor matrix, 2lonly.function as the method, the appropriate imputation function, and ID as the cluster variable. This resulted in: Error in edit.setup(data, setup, ...) : `mice` detected constant and/or collinear variables. No predictors were left after their removal.

Has anyone run into similar issues and found a solution?

--- edit: here is a reproducible example for method 1 (my preferred one)

```r
library(mice)

# Fake dataset for the level-2 (questionnaire) data: one row per participant
data1 <- data.frame(
  participant = factor(1:20),
  scale1 = c(20.5176893097081, 17.1907529978866, NA, NA, 23.0900118234823,
             16.825451016666, 17.9720180052918, 28.4363035263208, 26.0191098441877,
             26.1444447937135, NA, 25.091133563164, 10.3353758051478,
             18.0322232007671, 14.1767794585022, 20.9102922916395, 20.6239907650613,
             17.661597152285, 18.3255223659322, 18.9958533053766),
  scale2 = c(23.8446274459682,
             NA, 13.3562256053306, 8.52823315494693, 18.3034641524201,
             17.1100738924451, 20.0295218831116, 15.6986473122548, 14.9647149797442,
             32.1875950434602, 25.255823725488, NA, 15.2625337013248,
             17.6354282904461, 5.86783073951034, NA, 16.3987924521716,
             11.3574747700045, 18.3557569542574, 18.741406021827)
)

# Fake dataset for the level-1 (reaction time) data: 10 trials per participant
data2 <- data.frame(
  participant = factor(rep(1:20, each = 10)),
  RT = c(416, 389, 383, 411, 354, 404, 354, 433, 411, 408,
         339, 368, 474, 407, 411, 366, 401, 427, 415, 376, 398, 393,
         391, 483, 466, 427, 372, 380, 360, 383, 374, 412, 412, 394,
         403, 387, 427, 383, 362, 402, 397, 445, 393, 407, 450, 381,
         395, 428, 423, 423, 435, 404, 405, 426, 392, 408, 383, 371,
         409, 422, 386, 412, 420, 353, 429, 350, 395, 428, 428, 437,
         423, 475, 444, 369, 360, 429, 365, 379, 391, 446, 405, 360,
         354, 399, 428, 403, 432, 392, 394, 448, 474, 411, 398, 373,
         415, 333, 401, 395, 403, 429, 344, 426, 391, 394, 456, 371,
         339, 409, 373, 389, 384, 408, 436, 359, 394, 440, 415, 418,
         401, 379, 330, 452, 388, 388, 315, 389, 399, 403, 344, 441,
         404, 409, 357, 369, 385, 385, 452, 370, 436, 371, 403, 459,
         466, 408, 451, 393, 355, 362, 418, 440, 360, 377, 400, 390,
         369, 414, 390, 368, 381, 387, 386, 415, 387, 374, 442, 405,
         441, 395, 420, 431, 435, 438, 420, 412, 391, 408, 409, 413,
         371, 447, 392, 385, 421, 377, 419, 437, 401, 392, 431, 491,
         412, 399, 446, 408, 369, 387, 372, 428, 389, 401)
)

# Run the imputation on the level-2 data
imputed <- mice(data1)

# Long data frame with the original data and all imputations,
# plus the sum score of the scales for each participant
data1_imputed <- complete(imputed, action = "long", include = TRUE)
data1_imputed$sumscore <- rowSums(data1_imputed[c("scale1", "scale2")])

# Merge the imputed level-2 data with the level-1 data
data_all <- dplyr::full_join(data1_imputed, data2, by = "participant")

# Try to create a mids object from the merged data - NOT WORKING
# (.id is no longer unique within each imputation after the join)
merged_imputed <- as.mids(data_all)
```
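
A possible way around the `as.mids()` step, given only as a sketch under assumptions: `brm_multiple()` accepts a plain list of data frames in its `data` argument as well as a mids object, so each completed data set could be merged with the trial data separately and the list passed on directly. The stand-in data (`quest`, `trials`), the number of imputations, and the model formula in the commented-out call are hypothetical and are not taken from the example above.

```r
library(mice)

# Hypothetical stand-in data: 10 participants, 3 trials each
set.seed(1)
quest <- data.frame(
  participant = factor(1:10),
  scale1 = c(20.5, 17.2, NA, 23.1, 16.8, 18.0, 28.4, 26.0, NA, 25.1),
  scale2 = c(23.8, NA, 13.4, 18.3, 17.1, 20.0, 15.7, 15.0, 25.3, 12.4)
)
trials <- data.frame(
  participant = factor(rep(1:10, each = 3)),
  RT = round(rnorm(30, mean = 400, sd = 30))
)

# Keep the participant ID out of the imputation model
pred <- make.predictorMatrix(quest)
pred[, "participant"] <- 0

imp <- mice(quest, m = 5, predictorMatrix = pred, printFlag = FALSE, seed = 1)

# Build one merged data frame per imputation instead of a single mids object
data_list <- lapply(seq_len(imp$m), function(i) {
  d <- mice::complete(imp, action = i)
  d$sumscore <- rowSums(d[c("scale1", "scale2")])
  merge(trials, d[c("participant", "sumscore")], by = "participant")
})

# Assumed model formula, shown only to illustrate the call (needs brms and Stan):
# fit <- brms::brm_multiple(RT ~ sumscore + (1 | participant), data = data_list)
```

Each element of `data_list` holds all trials together with the sum score from one imputation, so the level-1 rows are never duplicated inside a single imputed data set.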

It's easier to help you if you include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Sharing the data with `dput()` or some other way to copy and paste into R will increase your chances of getting a response. That being said, for method #2 make sure the variables used in imputation are numeric or factor. They can't be character, which is what commonly causes that error. You may want to look into [passive imputation](https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html) — TrainingPizza, Feb 23 '22 at 16:21
Thanks @TrainingPizza! All variables were numeric or factor so this shouldn't be the problem for method #2. I also created an example for method#1 and hope that someone has an idea on how I could adjust the code to make it work. — Statju, Mar 01 '22 at 13:19
Are you not creating a prediction matrix and making sure things like `participant` are excluded from the imputation model? When I run your code `participant` would be included which conceptually likely isn't correct and also causes problems for the imputation. — TrainingPizza, Mar 01 '22 at 19:40
Hi @TrainingPizza! Thanks for the quick reply. Obviously, I'm specifying a prediction matrix since I also have additional variables in the data frame which should not be used for imputation. For simplification, I left it out in the example. — Statju, Mar 02 '22 at 11:13
