Fill empty cells between two values in column with last non empty cell and next non empty cell in R

Question

I need to fill NA values in a column of a dataframe, working per ID, by splitting each run of NA cells evenly between the last non-NA value before the run and the first non-NA value after it.

ID     Value    X     Y
1          A      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1          B      x     y
2          C      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2          D      x     y

Which should be filled to this:

ID     Value    X     Y
1          A      x     y 
1          A      x     y 
1          A      x     y 
1          B      x     y 
1          B      x     y 
1          B      x     y 
1          B      x     y
2          C      x     y
2          C      x     y
2          C      x     y
2          D      x     y
2          D      x     y
2          D      x     y

In case of 2n NA values between observations, n are attributed to the last observation and n to the next. In case of 2n+1 NA values, n are attributed to the last and n+1 to the next.
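The rule above can be written down as a tiny base R sketch. The helper name split_gap is hypothetical (it is not part of the question); it only counts how many cells of a gap go to each side.

```r
# Hypothetical helper (not from the question): split a run of k NAs
# between the value before the run and the value after it.
split_gap <- function(k) {
  to_last <- k %/% 2        # n cells take the last observed value
  to_next <- k - to_last    # n (even k) or n + 1 (odd k) take the next one
  c(to_last = to_last, to_next = to_next)
}

split_gap(5)  # the five NAs between A and B: 2 to A, 3 to B
split_gap(4)  # the four NAs between C and D: 2 to C, 2 to D
split_gap(1)  # a single NA: 0 to the last value, 1 to the next
```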

I know I need to use na.locf from the zoo package which works well with a large database for filling in empty values based on the last non-empty cell, along with the fromLast argument to perform "next observation carried backwards". I cannot however structure a loop to account for an even or odd number of NA values, and use both of these together.

Using tidyr and dplyr from the tidyverse:

library(tidyr)
library(dplyr)
df %>% group_by(ID) %>% fill(Value, .direction = "downup") %>% ungroup()

This fills NA values down and then up within each group, so every gap takes the value before it; it does not split a gap between the two values that border it.

In the case of 1 NA between two non-NA, do you take the non-NA from previous element — akrun, Oct 14 '21 at 17:25
No, for the next (attributing n and n+1 for the first and last half of NA values, so n=0) — shinama99, Oct 14 '21 at 17:30

G. Grothendieck · Accepted Answer · 2021-10-15T13:31:56.493

Define interp, which replaces each successive non-NA value with successive integers, applies na.approx, rounds, and then maps the resulting integers back to the original values.

library(zoo)

interp <- function(x) {
   x0 <- ifelse(is.na(x), NA, cumsum(!is.na(x)))
   xx <- na.approx(x0, rule = 2)
   na.omit(x)[round(xx)]
}
transform(DF, Value = interp(Value))

giving:

   ID Value X Y
1   1     A x y
2   1     A x y
3   1     A x y
4   1     B x y
5   1     B x y
6   1     B x y
7   1     B x y
8   2     C x y
9   2     C x y
10  2     C x y
11  2     D x y
12  2     D x y
13  2     D x y
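One caveat, as far as I can tell: R's round() rounds halves to the even digit, so an interpolated 2.5 becomes 2, and the middle cell of an odd gap can land on the previous value instead of the next one. Note also that interp above runs over the whole column rather than per ID. Below is a hedged base R sketch of the same indexing idea that rounds halves up and is applied within each ID via ave(). The name interp_half_up and the toy data are assumptions made for illustration; approx() from the stats package stands in for na.approx.

```r
# Base-R sketch of the same indexing idea; interp_half_up is a hypothetical name.
interp_half_up <- function(x) {
  pos <- which(!is.na(x))                 # positions of observed values
  if (length(pos) == 0L) return(x)        # nothing to fill from
  if (length(pos) == 1L) return(rep(x[pos], length(x)))
  # Interpolate the running index 1, 2, 3, ... of the observed values;
  # rule = 2 extends the first/last index over leading/trailing NAs.
  xx <- approx(pos, seq_along(pos), xout = seq_along(x), rule = 2)$y
  x[pos][floor(xx + 0.5)]                 # halves go up, i.e. to the next value
}

round(2.5)                                # 2: halves are rounded to even
interp_half_up(c("A", "B", NA, "C"))      # "A" "B" "C" "C"

# Applied within each ID (toy data, not the question's data frame)
toy <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                  Value = c("A", NA, "B", "C", NA, NA),
                  stringsAsFactors = FALSE)
toy$Value <- ave(toy$Value, toy$ID, FUN = interp_half_up)
toy$Value                                 # "A" "B" "B" "C" "C" "C"
```

Grouping with ave() keeps a value from one ID from leaking into the leading or trailing NAs of another ID.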

Note

It is assumed that the input is the following, shown in reproducible form.

Lines <- "ID     Value    X     Y
1          A      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1          B      x     y
2          C      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2          D      x     y"
DF <- read.table(text = Lines, header = TRUE)

An error I am facing with this is when A == B, the missing NA values would not be filled in since these are the same level. — shinama99, Oct 15 '21 at 10:51
The answer did state that it was assuming that there are no duplicates. That was based on the example in the question in which that was true. Anyways I have revised it to allow them. — G. Grothendieck, Oct 15 '21 at 12:49
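To see why repeated values are harmless in the code shown in the answer, it may help to look at the cumsum step on its own. The toy vector below is an assumption for illustration: every observed value gets its own running index, so two equal neighbours still map to different integers and the gap between them is interpolated like any other.

```r
x  <- c("A", NA, NA, "A", NA, "B")      # toy vector with a repeated value
x0 <- ifelse(is.na(x), NA, cumsum(!is.na(x)))
x0                                      # 1 NA NA 2 NA 3
```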

score 1 · Answer 2 · answered Oct 15 '21 at 08:46

I guess the simplest way is to use na.locf (Last Observation Carried Forward) from the zoo package, if the data is a zoo/time series.

see: https://www.rdocumentation.org/packages/zoo/versions/1.8-9/topics/na.locf
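On its own, na.locf only carries values in one direction, so it does not produce the split asked for in the question. A minimal sketch of one way to combine the forward and backward fills is below: each NA takes the previous value only when it is strictly closer to it, and the next value otherwise. The function name fill_split is hypothetical, and the sketch assumes Value is a character vector (the default for read.table in R 4.0 and later).

```r
library(zoo)

# Hypothetical helper: fill each NA from whichever neighbour is nearer,
# sending ties (the middle of an odd gap) to the next value.
fill_split <- function(x) {
  idx  <- seq_along(x)
  seen <- ifelse(is.na(x), NA, idx)                     # positions of observed values
  prev <- na.locf(seen, na.rm = FALSE)                  # position of last observed value
  nxt  <- na.locf(seen, na.rm = FALSE, fromLast = TRUE) # position of next observed value
  fwd  <- na.locf(x, na.rm = FALSE)                     # last observation carried forward
  bwd  <- na.locf(x, na.rm = FALSE, fromLast = TRUE)    # next observation carried backward
  use_prev <- !is.na(prev) & (is.na(nxt) | (idx - prev) < (nxt - idx))
  ifelse(use_prev, fwd, bwd)
}

x <- c("A", NA, NA, NA, NA, NA, "B", "C", NA, NA, NA, NA, "D")
fill_split(x)
# "A" "A" "A" "B" "B" "B" "B" "C" "C" "C" "D" "D" "D"
```

To keep IDs separate, the same function can be applied per group, for example with ave(DF$Value, DF$ID, FUN = fill_split).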

Herr Student
