Apologies for the somewhat long question. I am currently working on a mental health study. One of the mental health screening tools has 15 variables, each of which can take a value from 0 to 3. The total score for each row/participant is the sum of these 15 variables. The documentation for this tool states that if more than 20% of the values for a particular row/participant are missing, the total score should also be treated as missing; if fewer than 20% of the values for a row are missing, each missing value should be assigned the mean of the remaining values for that row.
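To make the rule concrete, here is a minimal sketch of how I understand it for a single participant's item scores. The helper name score_one and its max_missing argument are my own invention for illustration. The documentation does not say what to do when exactly 20% of the values are missing, so treating that case as missing is an assumption on my part.

```r
# Hypothetical helper: total score for ONE participant's item responses.
# Assumption: a row with exactly 20% missing is treated as missing,
# because the documentation only covers "more than" and "fewer than" 20%.
score_one <- function(items, max_missing = 0.2) {
  prop_missing <- mean(is.na(items))      # proportion of NA items
  if (prop_missing >= max_missing) {
    return(NA_real_)                      # too many missing: no total score
  }
  row_mean <- mean(items, na.rm = TRUE)   # mean of the observed items
  items[is.na(items)] <- row_mean         # impute each NA with that mean
  sum(items)                              # total over all items
}

# One missing item out of 15 (about 6.7% missing): NA is imputed, then summed
score_one(c(1, 0, 2, NA, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1))

# Four missing items out of 15 (about 26.7% missing): the total is NA
score_one(c(NA, NA, NA, NA, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1))
```

What I can't work out is how to apply this logic across the 15 columns of a data frame that also has other columns.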
My plan was to calculate the proportion of NAs for each participant, calculate the mean of the 15 variables excluding NAs for each participant, and then use a conditional mutate() statement (or something similar) to check whether the proportion of NAs is less than 20% and, if so, replace the NAs in the relevant columns with the mean value for that row, before finding the sum of all 15 variables for each row. The dataset also contains other columns besides these 15, so applying a function to all of the columns would not be useful.
To calculate the mean score without NAs I did the following:
# Row-wise mean of the 15 item columns, ignoring NAs
mental$somatic_mean <- rowMeans(mental[, c("var1", "var2", "var3",
    "var4", "var5", "var6", "var7", "var8", "var9", "var10", "var11",
    "var12", "var13", "var14", "var15")], na.rm = TRUE)
And to calculate the proportion of NAs for each participant:
# Row-wise proportion of NAs across the 15 item columns
mental$somatic_na <- rowMeans(is.na(mental[, c("var1", "var2",
    "var3", "var4", "var5", "var6", "var7", "var8", "var9", "var10", "var11",
    "var12", "var13", "var14", "var15")]))
However, I can't find any code that works for the mutate() step that alters the rows where fewer than 20% of values are NA. I have tried a lot of permutations by this point, including the following for each variable:
mental_recode <- mental %>%
rowwise() %>%
mutate(var1 = if(somatic_na<0.2)
replace_na(list(var1= somatic_mean)))
Which returns:
"no applicable method for 'replace_na' applied to an object of class "list""
I also attempted to do them all together without using mutate():
mental %>%
rowwise() %>%
if(somatic_na<0.2)
replace_na(list(var1 = somatic_mean, var2=
somatic_mean, var3 = somatic_mean, var4 = somatic_mean, var5 =
somatic_mean, var6 = somatic_mean, var7 = somatic_mean, var8 =
somatic_mean, var9 = somatic_mean, var10 = somatic_mean, var11 =
somatic_mean, var12 = somatic_mean, var13 = somatic_mean, var14 =
somatic_mean, var15 = somatic_mean ))
Which returns:
Error in if (.) somatic_na < 0.2 else replace_na(mental, list(var1 = somatic_mean, :
argument is not interpretable as logical
In addition: Warning message:
In if (.) somatic_na < 0.2 else replace_na(mental, list(var1 = somatic_mean, :
the condition has length > 1 and only the first element will be used
I also tried using if_else() in conjunction with mutate() and setting the value to NA if the condition was not met, but could not get that to work after various permutations and error messages either.
EDIT: Dummy data can be generated by the following:
mental <- structure(list(id = 1:21, var1 = c(0L, 0L, 1L, 1L, 1L, 0L, 0L,
NA, 0L, 0L, 0L, 0L, 0L, 0L, NA, 0L, 0L, 0L,
0L, 0L, 0L), var2 = c(0L,
0L, 1L, 1L, 1L, 0L, 0L, 2L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
2L, 0L, 1L, 1L), var3 = c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 2L, 1L,
1L, 0L, 0L, 1L, 0L, 1L, 1L, 1L, 2L, 0L, 1L, 1L), var4 = c(1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, NA, 0L, 0L, 0L,
0L, 1L, 0L, 0L), var5 = c(0L, 0L, 0L, 1L, NA, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), var6 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), var7 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L, NA, 0L), var8 = c(0L,
0L, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), var9 = c(0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L), var10 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, NA, 0L, 0L, 0L,
0L, 0L, NA, 0L), var11 = c(1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, NA, 0L), var12 = c(1L,
0L, 1L, 1L, NA, 0L, 0L, NA, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
1L, 0L, 1L, 1L), var13 = c(1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, NA, 0L), var14 = c(1L,
0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
2L, 0L, 1L, 0L), var15 = c(1L, 0L, 2L, NA, NA, 0L, NA, 0L, 0L,
0L, 0L, 0L, NA, NA, 0L, NA, NA, NA, NA, NA, 0L)), .Names = c("id",
"var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8",
"var9", "var10", "var11", "var12", "var13", "var14", "var15"), class =
"data.frame", row.names = c(NA,
-21L))
Does anyone know of code that would work for this sort of situation?
Thanks in advance!