I am attempting to carry non-missing observations forward and populate the next two missing observations (although I imagine a solution to this problem would be broadly applicable to carrying observations forward through n rows...).

In the example data frame below I would like to carry forward (propagate) the flag_a and flag_b values for each id for two rows. Here is an example of my data with the desired output included:

id <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
flag_a <- as.numeric(c(NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA))
flag_b <- as.numeric(c(NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA))
flag_a_desired_output <- as.numeric(c(NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA, NA, NA))
flag_b_desired_output <- as.numeric(c(NA, NA, NA, 1, 1, 1, NA, NA, NA, NA, NA, NA, NA, 1, 1, 1, NA, NA))
data <- data.frame(id, flag_a, flag_b, flag_a_desired_output, flag_b_desired_output)

I have attempted to use the following last observation carried forward (LOCF) function; however, as expected it populates all missing rows rather than just the next two.

library(zoo)  # provides na.locf()
na.locf.na <- function(x, na.rm = FALSE, ...) na.locf(x, na.rm = na.rm, ...)
data <- transform(data, flag_a_locf = ave(flag_a, id, FUN = na.locf.na))
data <- transform(data, flag_b_locf = ave(flag_b, id, FUN = na.locf.na))

Any thoughts on how to go about this would be greatly appreciated.

Entropy
  • What is `id` for? Is that relevant to your problem? – Rich Scriven Jun 06 '14 at 22:09
  • `id` is an identifier for each unique subject contained within the overall dataset. The reason I included it here is that if `flag_a` was to occur only one row before the end of the rows associated with that `id`, then I would not want the code to carry the observation from `id == 1` forward into the first row where `id == 2`. Does that make sense? – Entropy Jun 06 '14 at 22:16

2 Answers

This is not the prettiest thing, but here's how I handle problems like this. Note that it only picks up the first non-missing value within each id:

library(data.table)
data <- data.table(data)
data[, rowid:=1:.N, keyby = id]

## flag_a: find the first non-missing row per id, then fill it and the next two rows
data[, flag_a_min:=min(rowid[!is.na(flag_a)]), keyby = id]
data[, flag_a_max:=flag_a_min+2]
data[rowid <=flag_a_max & rowid >= flag_a_min, flag_a:=min(na.omit(flag_a)), by = id]

## flag_b: same steps; by = id keeps each id's value from leaking into another id
data[, flag_b_min:=min(rowid[!is.na(flag_b)]), keyby = id]
data[, flag_b_max:=flag_b_min+2]
data[rowid <=flag_b_max & rowid >= flag_b_min, flag_b:=min(na.omit(flag_b)), by = id]

## clean up
data[, c("rowid", "flag_a_min", "flag_a_max", "flag_b_min", "flag_b_max"):=NULL]

> data
    id flag_a flag_b flag_a_desired_output flag_b_desired_output
 1:  1     NA     NA                    NA                    NA
 2:  1     NA     NA                    NA                    NA
 3:  1      1     NA                     1                    NA
 4:  1      1      1                     1                     1
 5:  1      1      1                     1                     1
 6:  1     NA      1                    NA                     1
 7:  1     NA     NA                    NA                    NA
 8:  1     NA     NA                    NA                    NA
 9:  1     NA     NA                    NA                    NA
10:  1     NA     NA                    NA                    NA
11:  2     NA     NA                    NA                    NA
12:  2      1     NA                     1                    NA
13:  2      1     NA                     1                    NA
14:  2      1      1                     1                     1
15:  2     NA      1                    NA                     1
16:  2     NA      1                    NA                     1
17:  2     NA     NA                    NA                    NA
18:  2     NA     NA                    NA                    NA
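
The same windowing idea can be sketched in base R for any number of rows n and for more than one flagged value per id. This is only a minimal sketch: `carry_forward_n` is a hypothetical helper name, not a function from any package, and the toy vectors are made up to show that nothing is carried across an id boundary.

```r
# Hypothetical helper (not from any package): carry each non-missing value
# forward over at most `n` following NA rows.
carry_forward_n <- function(x, n = 2) {
  idx <- seq_along(x)
  # Position of the most recent non-NA value at or before each element
  # (0 where nothing has been observed yet).
  last_obs <- cummax(ifelse(is.na(x), 0L, idx))
  # Fill an NA only if an observation exists and lies at most n rows back.
  fill <- is.na(x) & last_obs > 0L & (idx - last_obs) <= n
  out <- x
  out[fill] <- x[last_obs[fill]]
  out
}

# Toy data: id 1 has a flag in its last row, which must not spill into id 2.
id   <- c(1, 1, 1, 1, 2, 2, 2, 2, 2)
flag <- c(NA, 1, NA, 1, NA, 1, NA, NA, NA)

# ave() applies the helper separately within each id.
filled <- ave(flag, id, FUN = function(x) carry_forward_n(x, n = 2))
print(filled)
# NA 1 1 1 NA 1 1 1 NA
```

Applied to the question's data, the call would be `ave(data$flag_a, data$id, FUN = carry_forward_n)`, which uses the default of two rows.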
Fojtasek

You can use the maxgap option of na_locf in the imputeTS package to fill only NA gaps up to a certain length. Note that this solution leaves longer NA gaps completely untouched rather than filling their first few rows.

e.g.

library(imputeTS)
na_locf(input, maxgap = 2)

would only apply last observation carried forward (LOCF) to NA gaps that are at most 2 consecutive NAs long.

2,3,NA,NA,NA,5,5 would just remain 2,3,NA,NA,NA,5,5

while

2,3,NA,5,5,5,5 would become 2,3,3,5,5,5,5
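
To make the gap rule concrete, here is a base-R sketch that mimics it with `rle()`. `locf_maxgap` is a hypothetical name used only for illustration. It follows the behaviour described above and is not the imputeTS implementation, and unlike the package defaults it leaves leading NAs alone.

```r
# Hypothetical illustration of the maxgap rule (not the imputeTS code):
# fill an NA run by LOCF only if the run is at most `maxgap` long.
locf_maxgap <- function(x, maxgap = 2) {
  runs <- rle(is.na(x))
  # Length of the run that each element belongs to.
  run_len <- rep(runs$lengths, runs$lengths)
  # Index of the latest non-NA value at or before each position (0 if none).
  last_obs <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  # Fill only NAs in short enough runs that have an earlier observation.
  fill <- is.na(x) & run_len <= maxgap & last_obs > 0L
  out <- x
  out[fill] <- x[last_obs[fill]]
  out
}

long_gap  <- locf_maxgap(c(2, 3, NA, NA, NA, 5, 5))  # gap of 3: left as is
short_gap <- locf_maxgap(c(2, 3, NA, 5, 5, 5, 5))    # gap of 1: filled with 3
print(long_gap)
print(short_gap)
```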

Steffen Moritz