I'm struggling with imputation via the mice package to solve an NA problem in my data analysis. I'm using linear mixed models to calculate intraclass correlation coefficients (ICCs). In my final data frame there are several control variables (as columns) that I use as fixed effects in the model.
Some columns contain missing values. I have no problem imputing the NAs with the following commands:
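library(mice)
imputation_list <- mice(baseline_df,
                        method = "pmm",  # "pmm" = predictive mean matching (numeric data)
                        m = 5)           # create 5 imputed data sets
# complete() returns the first of the m imputed data sets by default
df_imputation_final <- complete(imputation_list)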
But now to my problem:
The IDs (persons, one per row) are grouped into families. So when I impute the NAs, all persons within one family must receive the same imputed value.
I need to impute values in the following data frame:
df_test <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
  family = c("Gerrard", "Gerrard", "Gerrard", "Torres", "Torres", "Torres", "Keita", "Keita", "Keita", "Suarez", "Suarez", "Kuyt", "Kuyt", "Carragher", "Carragher", "Carragher", "Salah", "Salah", "Firmino", "Firmino"),
  income_family = c(NA, NA, NA, 100, 100, 100, 90, 90, 90, 150, 150, 40, 40, NA, NA, NA, 200, 200, 99, 99))
So all members (IDs 1, 2, 3 and 14, 15, 16) of the families "Gerrard" and "Carragher" need imputation in the income_family variable, and the imputed value must be the same for every member of a family. The result should look like this:
df_final <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
  family = c("Gerrard", "Gerrard", "Gerrard", "Torres", "Torres", "Torres", "Keita", "Keita", "Keita", "Suarez", "Suarez", "Kuyt", "Kuyt", "Carragher", "Carragher", "Carragher", "Salah", "Salah", "Firmino", "Firmino"),
  income_family = c(55, 55, 55, 100, 100, 100, 90, 90, 90, 150, 150, 40, 40, 66, 66, 66, 200, 200, 99, 99))
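To make the requirement concrete, here is a rough base-R sketch of the shape such a solution might take: collapse the data to one row per family, fill the missing value at that level, and merge it back to the persons. This is only an illustration under assumptions. The mean fill is a placeholder standing in for a proper imputation step (it is not what mice would produce), and the names `df_small`, `fam_level` and `df_merged` are made up for this example.

```r
# Small self-contained example using a subset of the families
df_small <- data.frame(
  ID = 1:7,
  family = c("Gerrard", "Gerrard", "Gerrard", "Torres", "Torres", "Kuyt", "Kuyt"),
  income_family = c(NA, NA, NA, 100, 100, 40, 40)
)

# One row per family, so each family can only ever receive a single value
fam_level <- unique(df_small[, c("family", "income_family")])

# PLACEHOLDER fill: mean of the observed family incomes.
# A real analysis would run the imputation model on fam_level here instead.
fam_level$income_family[is.na(fam_level$income_family)] <-
  mean(fam_level$income_family, na.rm = TRUE)

# Merge the family-level values back onto the person-level rows
df_merged <- merge(df_small[, c("ID", "family")], fam_level, by = "family")
df_merged <- df_merged[order(df_merged$ID), ]

# Check the constraint: exactly one distinct income value per family
all(tapply(df_merged$income_family, df_merged$family,
           function(x) length(unique(x)) == 1))
```

The final check expresses the constraint I need: whatever the imputation method, every person in a family ends up with the same income_family value.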
I hope it's clear what I mean. Thanks a lot!