I need to loop over IDs in a dataframe to fill NA values in a column by attributing empty cells evenly between the last and first filled entry outside of the NA cells.
ID Value X Y
1 A x y
1 NA x y
1 NA x y
1 NA x y
1 NA x y
1 NA x y
1 B x y
2 C x y
2 NA x y
2 NA x y
2 NA x y
2 NA x y
2 D x y
Which should be filled to this:
ID Value X Y
1 A x y
1 A x y
1 A x y
1 B x y
1 B x y
1 B x y
1 B x y
2 C x y
2 C x y
2 C x y
2 D x y
2 D x y
2 D x y
In case of 2n NA values between observations, n is attributed to the last and n to the next. In case of 2n+1 values, n is attributed to the last and n+1 to the next.
I know I need to use na.locf
from the zoo
package which works well with a large database for filling in empty values based on the last non-empty cell, along with the fromLast
argument to perform "next observation carried backwards". I cannot however structure a loop to account for an even or odd number of NA values, and use both of these together.
Using the tidyverse package,
> library(tidyr)
> library(dplyr)
> df %>% dplyr::group_by(test$id) %>% fill(Value, .direction ="downup") %>% dplyr::ungroup()
This fills in NA values in both directions but does not account for different border values for NA cells in a group.