I'm trying to use na.locf from package zoo with grouped data using dplyr. I'm using the first solution on this question: Using dplyr window-functions to make trailing values (fill in NA values)

library(dplyr);library(zoo)
df1 <- data.frame(id=rep(c("A","B"),each=3),problem=c(1,NA,2,NA,NA,NA),ok=c(NA,3,4,5,6,NA))
df1
  id problem ok
1  A       1 NA
2  A      NA  3
3  A       2  4
4  B      NA  5
5  B      NA  6
6  B      NA NA

The problem happens when, within a group, all the data is NA. As you can see in the problem column, the na.locf data for id=B comes from another group: the last data of id=A.

df1 %>% group_by(id) %>% na.locf()

Source: local data frame [6 x 3]
Groups: id [2]

     id problem    ok
  <chr>   <chr> <chr>
1     A       1  <NA>
2     A       1     3
3     A       2     4
4     B       2     5 #problem col is wrong
5     B       2     6 #problem col is wrong
6     B       2     6 #problem col is wrong

This is my expected result. The data for id=B should be independent of what is in id=A.

     id problem    ok
  <chr>   <chr> <chr>
1     A       1  <NA>
2     A       1     3
3     A       2     4
4     B      NA     5
5     B      NA     6
6     B      NA     6
Pierre Lapointe
  • This looks like something similar to the [bug described here](https://github.com/hadley/dplyr/issues/1463). I got caught by this in [a recent answer](http://stackoverflow.com/a/43100751/496488) (you need to look at a previous version of the answer, as I later edited it to fix). – eipi10 Apr 04 '17 at 16:12
  • Yeah, it might be a bug. I'm just glad there's a workaround with `mutate_all` as per @Akrun's answer. – Pierre Lapointe Apr 04 '17 at 16:16

1 Answer

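We need to call na.locf inside mutate_all. na.locf can be applied directly to a whole data frame, but that method knows nothing about dplyr groups: even though the data is grouped by 'id', it fills each column across the entire dataset. Inside mutate_all, na.locf is applied to each column within each group; na.rm = FALSE keeps leading NAs so the result has the same length as the input.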

df1 %>%
     group_by(id) %>%
     mutate_all(funs(na.locf(., na.rm = FALSE)))
#    id problem    ok
#  <fctr>   <dbl> <dbl>
#1      A       1    NA
#2      A       1     3
#3      A       2     4
#4      B      NA     5
#5      B      NA     6
#6      B      NA     6
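
The funs() helper used above was later deprecated in dplyr. As a minimal sketch, assuming dplyr 1.0.0 or newer (the version is an assumption, not something stated in the answer), the same per-group fill could probably be written with across(). The name `res` is only illustrative.

```r
library(dplyr)
library(zoo)

# Same example data as in the question.
df1 <- data.frame(
  id      = rep(c("A", "B"), each = 3),
  problem = c(1, NA, 2, NA, NA, NA),
  ok      = c(NA, 3, 4, 5, 6, NA)
)

# Assumption: dplyr >= 1.0.0, where across() is available.
res <- df1 %>%
  group_by(id) %>%
  # In a grouped mutate(), everything() selects only the non-grouping
  # columns, so `id` is left untouched.
  # na.rm = FALSE keeps leading NAs so each group keeps its length.
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  ungroup()

res
```

Because na.locf is called once per column per group, nothing from id=A can be carried into id=B.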
akrun