I have a data frame `m`, built as follows:

y1 <- c(rep("A", 5), rep("B", 5))
y2 <- rep(1:5, 2)
y3 <- y2
y3[c(2, 7, 9)] <- NA
m <- data.frame(y1, y2, y3)

   y1 y2   y3
1   A  1    1
2   A  2 <NA>
3   A  3    3
4   A  4    4
5   A  5    5
6   B  1    1
7   B  2 <NA>
8   B  3    3
9   B  4 <NA>
10  B  5    5

I want to fill in each NA in `y3` with the closest non-NA value "in front of" it (that is, the nearest preceding one) and store the result in a new column `y4`. My output should look like this:

   y1 y2   y3 y4
1   A  1    1  1
2   A  2 <NA>  1
3   A  3    3  3
4   A  4    4  4
5   A  5    5  5
6   B  1    1  1
7   B  2 <NA>  1
8   B  3    3  3
9   B  4 <NA>  3
10  B  5    5  5

Any idea about how to use dplyr to achieve this goal?

talat
MYjx
  • `locf` won't handle leading missing values, and `nocb` won't handle trailing missing values... if you have a group where all values are missing, neither will work... – B.Mr.W. Nov 29 '14 at 21:18
  • good point, but we don't yet know whether those come up in the OP's context or not. Since they say "in front of" it sounds like they don't, but only the OP can say for sure. The solution below (and linked) does work for the OP's example. A slightly clunky solution would use `zoo::na.locf` twice, with and without `fromLast`. – Ben Bolker Nov 29 '14 at 21:21
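The two-pass idea from the comment above can be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name `fill_both` is hypothetical, and it assumes the `zoo` package is installed. The first pass carries the last observation forward; the second pass runs with `fromLast = TRUE` to fill any leading NAs from the next observation.

```r
library(zoo)

# Hypothetical helper: forward fill first, then backward fill what is left.
fill_both <- function(x) {
  # Pass 1: last observation carried forward; na.rm = FALSE keeps leading NAs
  forward <- na.locf(x, na.rm = FALSE)
  # Pass 2: next observation carried backward, which fills the leading NAs
  na.locf(forward, na.rm = FALSE, fromLast = TRUE)
}

fill_both(c(NA, 2, NA, 4, NA))
# → 2 2 2 4 4

# A vector with no observed values has nothing to carry in either direction
fill_both(c(NA_real_, NA_real_))
```

Inside a grouped pipeline this could presumably be used as `mutate(y4 = fill_both(y3))`. A group made up entirely of NAs still has no value to borrow, so that case would need separate handling.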

1 Answer

This may have been answered before, but I don't know if it's been answered in a dplyr context. zoo::na.locf() is your friend:

library(dplyr)
m %>% group_by(y1) %>% mutate(y4 = zoo::na.locf(y3))
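One caveat worth noting: by default `zoo::na.locf()` uses `na.rm = TRUE`, which drops leading NAs, so the result can be shorter than the input and `mutate()` will refuse it for any group that starts with NA. The question's data has no such group, so the sketch below adds one on purpose. The extra NA in row 6 and the names `m2` and `filled` are assumptions made for illustration only.

```r
library(dplyr)

# Data from the question, plus one extra leading NA in group B (row 6).
# That extra NA is an assumption added here; it is not in the original data.
y1 <- c(rep("A", 5), rep("B", 5))
y2 <- rep(1:5, 2)
y3 <- y2
y3[c(2, 6, 7, 9)] <- NA
m2 <- data.frame(y1, y2, y3)

# na.rm = FALSE leaves leading NAs in place, so each group's result has the
# same length as its input and mutate() accepts it.
filled <- m2 %>%
  group_by(y1) %>%
  mutate(y4 = zoo::na.locf(y3, na.rm = FALSE)) %>%
  ungroup()

filled$y4
```

With `na.rm = FALSE`, the first two values of group B stay NA because there is nothing in front of them to carry forward; the remaining NAs are filled as in the question.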
Ben Bolker
  • ooops, took about the same length of time to answer as to search for "[r] na.locf dplyr" on SO ... – Ben Bolker Nov 29 '14 at 21:19
  • thanks!! I never used `zoo` package before...that's why I never thought about that.. – MYjx Nov 29 '14 at 21:29
  • not suggesting that you should have searched (it only works if you know what to search for), but that I should have searched & marked as duplicate before I answered. – Ben Bolker Nov 29 '14 at 21:31