3

I need to loop over IDs in a dataframe to fill NA values in a column by attributing empty cells evenly between the last and first filled entry outside of the NA cells.

ID     Value    X     Y
1          A      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1          B      x     y
2          C      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2          D      x     y

Which should be filled to this:

ID     Value    X     Y
1          A      x     y 
1          A      x     y 
1          A      x     y 
1          B      x     y 
1          B      x     y 
1          B      x     y 
1          B      x     y
2          C      x     y
2          C      x     y
2          C      x     y
2          D      x     y
2          D      x     y
2          D      x     y

In case of 2n NA values between observations, n is attributed to the last and n to the next. In case of 2n+1 values, n is attributed to the last and n+1 to the next.

I know I need to use na.locf from the zoo package which works well with a large database for filling in empty values based on the last non-empty cell, along with the fromLast argument to perform "next observation carried backwards". I cannot however structure a loop to account for an even or odd number of NA values, and use both of these together.

Using the tidyverse package,

> library(tidyr)
> library(dplyr)
> df %>% dplyr::group_by(test$id) %>% fill(Value, .direction ="downup") %>% dplyr::ungroup()

This fills in NA values in both directions but does not account for different border values for NA cells in a group.

shinama99
  • 51
  • 6

2 Answers2

4

Define interp which replaces each successive non-NA with successive integers, applies na.appro9x, rounds and replaces the resulting integers with the original values.

library(zoo)

interp <- function(x) {
   x0 <- ifelse(is.na(x), NA, cumsum(!is.na(x)))
   xx <- na.approx(x0, rule = 2)
   na.omit(x)[round(xx)]
}
transform(DF, Value = interp(Value))

giving:

   ID Value X Y
1   1     A x y
2   1     A x y
3   1     A x y
4   1     B x y
5   1     B x y
6   1     B x y
7   1     B x y
8   2     C x y
9   2     C x y
10  2     C x y
11  2     D x y
12  2     D x y
13  2     D x y

Note

It is assumed that the input is the following, shown in reproducible form.

Lines <- "ID     Value    X     Y
1          A      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1          B      x     y
2          C      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2          D      x     y"
DF <- read.table(text = Lines, header = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • An error I am facing with this is when A == B, the missing NA values would not be filled in since these are the same level. – shinama99 Oct 15 '21 at 10:51
  • The answer did state that it was assuming that there are no duplicates. That was based on the example in the question in which that was true. Anyways I have revised it to allow them. – G. Grothendieck Oct 15 '21 at 12:49
1

I guess the simplest way is to use the function: na.locf: Last Observation Carried Forward if we are in zoo/time-series.

see: https://www.rdocumentation.org/packages/zoo/versions/1.8-9/topics/na.locf

Herr Student
  • 853
  • 14
  • 26