I'm using the R `mice` package to impute randomly missing questionnaire item values for a few participants. The sum score of the questionnaire is later used as a level-2 predictor of reaction times in a task (multiple trials, level 1) in a multilevel model fitted with `brms`.
I have already tried two different approaches to create a `mids` object that includes all data and can later be passed to `brm_multiple`, but neither has worked so far:
1.) I kept the data frames separate, imputed the item values in the questionnaire data frame, created a long-format data frame containing the original data and all imputations (using `complete`), and calculated the sum score for each participant in each imputation (using `rowSums`). Afterwards, I joined this long data frame with the level-1 reaction time data (using `full_join`) and tried to convert the result into a `mids` object (using `as.mids`). This failed, however, because the join produced multiple occurrences of each `.id` value.
2.) I joined the data frames before imputation and tried to impute only the level-2 questionnaire items by extending `mice` with `miceadds`. Here, I defined only the item scores as predictors via the predictor matrix, and specified `2lonly.function` as the method, the correct imputation function, and ID as the cluster variable. This resulted in the following error: Error in edit.setup(data, setup, ...) : `mice` detected constant and/or collinear variables. No predictors were left after their removal.
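To make the `.id` problem in approach 1 concrete, here is a minimal sketch with made-up toy values (two participants, two trials each, one imputation). The data frames `quest_long` and `trials` are hypothetical stand-ins for the output of `complete(..., action = "long", include = TRUE)` and for the reaction time data. Base `merge()` is used in place of `full_join()` so that the snippet needs no packages.

```r
# Toy stand-in for complete(imputed, action = "long", include = TRUE):
# .imp = 0 is the original data, .imp = 1 is the first imputation.
quest_long <- data.frame(
  .imp        = c(0, 0, 1, 1),
  .id         = c(1, 2, 1, 2),
  participant = c("1", "2", "1", "2"),
  sumscore    = c(40, NA, 40, 35)   # made-up values
)

# Toy stand-in for the trial-level reaction time data (two trials each).
trials <- data.frame(
  participant = c("1", "1", "2", "2"),
  RT          = c(416, 389, 339, 368)
)

# Joining on participant repeats every questionnaire row once per trial,
# so each .id now occurs several times within the same imputation.
joined <- merge(quest_long, trials, by = "participant")
joined <- joined[order(joined$.imp, joined$.id), ]

# Count how often each .id occurs in the original data (.imp == 0).
id_counts <- table(joined$.id[joined$.imp == 0])
print(id_counts)
```

As far as I understand, `as.mids()` uses `.id` as the row names of the original data, which would explain why repeated values are rejected.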
Has anyone experienced similar issues and found a way to solve them?
--- Edit: here is a reproducible example for approach 1 (my preferred one):
```r
# Fake dataset for the questionnaire data (level 2, one row per participant):
data1 <- structure(list(participant = structure(1:20, .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"), class = "factor"),
scale1 = c(20.5176893097081, 17.1907529978866, NA, NA, 23.0900118234823,
16.825451016666, 17.9720180052918, 28.4363035263208, 26.0191098441877,
26.1444447937135, NA, 25.091133563164, 10.3353758051478,
18.0322232007671, 14.1767794585022, 20.9102922916395, 20.6239907650613,
17.661597152285, 18.3255223659322, 18.9958533053766),
scale2 = c(23.8446274459682,
NA, 13.3562256053306, 8.52823315494693, 18.3034641524201,
17.1100738924451, 20.0295218831116, 15.6986473122548, 14.9647149797442,
32.1875950434602, 25.255823725488, NA, 15.2625337013248,
17.6354282904461, 5.86783073951034, NA, 16.3987924521716,
11.3574747700045, 18.3557569542574, 18.741406021827)),
row.names = c(NA,
-20L), class = "data.frame")
```
```r
# Fake dataset for the reaction time data (level 1, ten trials per participant):
data2 <- structure(list(participant = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L,
20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L, 20L),
.Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"), class = "factor"),
RT = c(416, 389, 383, 411, 354, 404, 354, 433, 411, 408,
339, 368, 474, 407, 411, 366, 401, 427, 415, 376, 398, 393,
391, 483, 466, 427, 372, 380, 360, 383, 374, 412, 412, 394,
403, 387, 427, 383, 362, 402, 397, 445, 393, 407, 450, 381,
395, 428, 423, 423, 435, 404, 405, 426, 392, 408, 383, 371,
409, 422, 386, 412, 420, 353, 429, 350, 395, 428, 428, 437,
423, 475, 444, 369, 360, 429, 365, 379, 391, 446, 405, 360,
354, 399, 428, 403, 432, 392, 394, 448, 474, 411, 398, 373,
415, 333, 401, 395, 403, 429, 344, 426, 391, 394, 456, 371,
339, 409, 373, 389, 384, 408, 436, 359, 394, 440, 415, 418,
401, 379, 330, 452, 388, 388, 315, 389, 399, 403, 344, 441,
404, 409, 357, 369, 385, 385, 452, 370, 436, 371, 403, 459,
466, 408, 451, 393, 355, 362, 418, 440, 360, 377, 400, 390,
369, 414, 390, 368, 381, 387, 386, 415, 387, 374, 442, 405,
441, 395, 420, 431, 435, 438, 420, 412, 391, 408, 409, 413,
371, 447, 392, 385, 421, 377, 419, 437, 401, 392, 431, 491,
412, 399, 446, 408, 369, 387, 372, 428, 389, 401)),
row.names = c(NA,
-200L), class = "data.frame")
```
```r
library(mice)

# Run the imputation on the questionnaire (level 2) data
imputed <- mice(data1)

# Create a long data frame with the original data and all imputations,
# then compute the sum score of the scales for each participant
data1_imputed <- complete(imputed, action = "long", include = TRUE)
data1_imputed$sumscore <- rowSums(data1_imputed[c("scale1", "scale2")])

# Merge the imputed questionnaire data with the reaction time (level 1) data
data_all <- dplyr::full_join(data1_imputed, data2, by = "participant")

# Try to create a mids object from the merged data - NOT WORKING
merged_imputed <- as.mids(data_all)
```
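Two directions that might get around this are sketched below on made-up toy data, not the data above, and neither has been verified with `as.mids()` or `brm_multiple()`: (a) rebuild `.id` after the join so that it is unique within each imputation, or (b) skip the `mids` object and split the joined data into one data frame per imputation, since the brms documentation describes `brm_multiple()` as also accepting a list of data frames. The helper `rebuild_id()`, the object names, and the model formula in the comment are hypothetical.

```r
# Toy joined data: 2 participants x 2 trials, original data (.imp = 0)
# plus two imputations. All values are made up.
joined <- data.frame(
  .imp        = rep(0:2, each = 4),
  .id         = rep(c(1, 1, 2, 2), times = 3),
  participant = rep(c("1", "1", "2", "2"), times = 3),
  sumscore    = c(40, 40, NA, NA,  40, 40, 35, 35,  40, 40, 37, 37),
  RT          = rep(c(416, 389, 339, 368), times = 3)
)

# (a) Hypothetical helper: give every row a unique .id within its imputation.
# Assumes the rows appear in the same order in every imputation.
rebuild_id <- function(long) {
  long <- long[order(long$.imp), ]
  long$.id <- ave(seq_len(nrow(long)), long$.imp, FUN = seq_along)
  rownames(long) <- NULL
  long
}
reindexed <- rebuild_id(joined)
# Possible next step (assumption, not run here):
# merged_imputed <- mice::as.mids(reindexed)

# (b) One data frame per imputation, dropping the original data (.imp == 0).
imputed_only <- joined[joined$.imp > 0, ]
imputed_list <- split(imputed_only, imputed_only$.imp)
# Possible next step (assumption, not run here; the formula is illustrative):
# fit <- brms::brm_multiple(RT ~ sumscore + (1 | participant),
#                           data = imputed_list)
```

Option (b) avoids `.id` altogether, at the cost of no longer having a single `mids` object to inspect.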