0

I have a medical treatment data where some condition indicators (i.e., columns) are only available for some rows but in actuality, the same condition should be categorically applied to all observations belonging to the same treatment (i.e., program). Thus, filling NA appears to be straightforward (since they are all assumed to have the same value) but also not easy because when I applied the methods recommended by some previous threads (e.g., here and here), they seem to have problem with filling string value, as the code shown below.

Is there a fix to this?

df_example <- data.frame(patient = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
                    status = c("Active", NA, NA, NA, "Non-Active", NA, NA, NA, "Active"),
                    condition = c(NA, "I", NA, NA, "II", "II", NA, NA, "III"), 
                    program = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

# I want to fill all the NA cells for columns "status" and "condition" by each program, the values should be the same for obs belonging to the same program

library("dplyr")
library("zoo")

df_example %>% group_by(program) %>% transmute(status=na.locf(status, na.rm=FALSE))

# A tibble: 9 x 2
# Groups:   program [3]
  program status    
    <dbl> <fct>     
1       1 Active    
2       1 Active    
3       1 Active    
4       2 NA        
5       2 Non-Active
6       2 Non-Active
7       3 NA        
8       3 NA        
9       3 Active 
Arthur Yip
  • 5,810
  • 2
  • 31
  • 50
Chris T.
  • 1,699
  • 7
  • 23
  • 45

2 Answers2

3

Assuming that there is exactly one non-NA in each group:

df_example %>%
  group_by(program) %>%
  transmute(status = na.omit(status)) %>%
  ungroup

or if there are multiple non-NAs but all the non-NAs are the same:

df_example %>%
  group_by(program) %>%
  transmute(status = first(na.omit(status))) %>%
  ungroup

giving:

# A tibble: 9 x 2
  program status    
    <dbl> <fct>     
1       1 Active    
2       1 Active    
3       1 Active    
4       2 Non-Active
5       2 Non-Active
6       2 Non-Active
7       3 Active    
8       3 Active    
9       3 Active    
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • OP may be looking for - `df_example %>% group_by(program) %>% mutate_at(vars("status", "condition"), ~first(na.omit(.)))` – Shree Aug 08 '19 at 12:37
  • Hi, thanks for your reply. As with the solution recommended by @Sotos, when I check if all NA cells got filled up `length(df_example$status[is.na(df_example$status)])`, it returns 6. – Chris T. Aug 08 '19 at 12:38
  • @ChrisT. that could happen if some group(s) have all `NA`s. – Shree Aug 08 '19 at 12:39
  • But for my example data (`df_example`), that doesn't look right. – Chris T. Aug 08 '19 at 12:41
  • 1
    Assign the result to a variable and then look at that. It does NOT modify the input. I have shown the output so you can see it does work. – G. Grothendieck Aug 08 '19 at 12:41
  • Got it, I apparently have ignored this. – Chris T. Aug 08 '19 at 14:42
2

You also need to add na.locf with fromLast parameter, i.e.

library(dplyr)
library(zoo)

df_example %>% 
  group_by(program) %>% 
  transmute(status = na.locf(status, na.rm = FALSE), 
            status = na.locf(status, fromLast = TRUE))

# A tibble: 9 x 2
# Groups:   program [3]
#  program status    
#    <dbl> <fct>     
#1       1 Active    
#2       1 Active    
#3       1 Active    
#4       2 Non-Active
#5       2 Non-Active
#6       2 Non-Active
#7       3 Active    
#8       3 Active    
#9       3 Active    
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • thanks for your quick response. The code works great, but one thing I'm a little confused here is that when I check if all `NA` cells got filled up `length(df_example$status[is.na(df_example$status)])`, it returns 6. Weird! – Chris T. Aug 08 '19 at 12:35
  • Do you have any groups that only have NAs? – Sotos Aug 08 '19 at 12:38
  • No, they have at least one row that is not `NA`. – Chris T. Aug 08 '19 at 12:39
  • Also, if you enter `df_example`, R will return the same dataframe with those `NA` cells un-filled. – Chris T. Aug 08 '19 at 12:43
  • I did not assign it back to the `df_example`. You need to do `df_example <- df_example %>%....` Also, If all groups have at least 1 non-NA then I am not sure as to why some NAs remained unfilled. Share an example that exhibits this behavior please – Sotos Aug 08 '19 at 12:49
  • 1
    After I assigned the transmuted object to a new dataframe (using the original name `df_example`), the `length(df_example2$status[is.na(df_example2$status)])` did turn out to be 0. I just forgot that one need to assign the transmuted/mutated `dplyr` object to a new dataframe in order to get the transmuted/mutated columns. – Chris T. Aug 08 '19 at 14:57