I have a medical treatment dataset in which some condition indicators (i.e., columns) are only recorded for some rows, although in reality the same value should apply to every observation belonging to the same treatment (i.e., program). Filling the NAs should therefore be straightforward, since all rows in a program are assumed to share the same value. However, when I applied the methods recommended in some previous threads (e.g., here and here), they seemed to have a problem filling string values, as the code below shows.
Is there a fix for this?
df_example <- data.frame(patient = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
                         status = c("Active", NA, NA, NA, "Non-Active", NA, NA, NA, "Active"),
                         condition = c(NA, "I", NA, NA, "II", "II", NA, NA, "III"),
                         program = c(1, 1, 1, 2, 2, 2, 3, 3, 3))
# I want to fill all the NA cells in columns "status" and "condition" within each program; the values should be the same for all observations in the same program
library("dplyr")
library("zoo")
df_example %>% group_by(program) %>% transmute(status=na.locf(status, na.rm=FALSE))
# A tibble: 9 x 2
# Groups:   program [3]
  program status
    <dbl> <fct>
1       1 Active
2       1 Active
3       1 Active
4       2 NA
5       2 Non-Active
6       2 Non-Active
7       3 NA
8       3 NA
9       3 Active
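
A possible reading of the output above, offered as a sketch rather than a definitive answer: the remaining NAs all sit before the first non-missing value in their program, which is what you would expect from na.locf, since by default it only carries the last observation forward. That would make this a fill-direction issue rather than a string issue. Assuming each program has at most one distinct non-missing value per column (as the question states), one option is tidyr's fill() with .direction = "downup", which fills down and then up within each group. The name df_filled and the stringsAsFactors = FALSE argument are my additions for illustration, not part of the original code.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question; stringsAsFactors = FALSE is an
# assumption added here so the columns are plain character vectors.
df_example <- data.frame(
  patient   = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  status    = c("Active", NA, NA, NA, "Non-Active", NA, NA, NA, "Active"),
  condition = c(NA, "I", NA, NA, "II", "II", NA, NA, "III"),
  program   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Within each program, fill NAs downward first, then upward, so that
# values appearing late in a group also reach the earlier rows.
df_filled <- df_example %>%
  group_by(program) %>%
  fill(status, condition, .direction = "downup") %>%
  ungroup()

print(df_filled)
```

If a program ever contained two different non-missing values in the same column, this would not reconcile them; each NA would simply take the nearest value in fill order, so it may be worth checking that assumption first.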